1. 21 Dec, 2020 12 commits
    • modpost: change license incompatibility to error() from fatal() · d6d692fa
      Masahiro Yamada authored
      Change fatal() to error() to continue running to report more possible
      issues.
      
      There is no difference in the fact that modpost will fail anyway.
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    • modpost: turn missing MODULE_LICENSE() into error · 1d6cd392
      Masahiro Yamada authored
      Do not create modules with no license tag.
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    • modpost: refactor error handling and clarify error/fatal difference · 0fd3fbad
      Masahiro Yamada authored
      We have 3 log functions. fatal() is special because it lets modpost bail
      out immediately. The only difference between warn() and error() is the
      prefix ("WARNING:" vs "ERROR:").
      
      In my understanding, the expected handling of error() is to propagate
      the return code of the function to the exit code of modpost, as
      check_exports() etc. already do. This is good practice in general
      because we should display as many error messages as possible in a
      single run of modpost.
      
      What is annoying about fatal() is that it kills modpost at the first
      error. People would need to run Kbuild again and again until they fix
      all errors.
      
      But, unfortunately, people tend to do:
      "This case should not be allowed. Let's replace warn() with fatal()."
      
      One reason is probably that it is tedious to manually propagate the
      error code up to the main() function.
      
      This commit refactors error() so that any single call to it automatically
      makes modpost return the error code.
      
      I also added comments in modpost.h for warn(), error(), and fatal().
      
      Please use fatal() only when you have a strong reason to do so.
      For example:
      
        - Memory shortage (i.e. malloc() etc. has failed)
        - The ELF file is broken, and there is no point in continuing to parse it
        - Something really odd has happened
      
      For general coding errors, please use error().
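
The behaviour described above can be sketched as follows. This is a minimal illustration only, not the modpost source: the names sketch_log(), sketch_warn(), sketch_error(), sketch_fatal(), sketch_error_occurred and sketch_exit_code() are hypothetical, and the "FATAL:" prefix is an assumption. The point is that an error-level message only records that something went wrong, so every remaining check still runs and the exit code is decided once at the end.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical names; this is not the modpost implementation. */
enum sketch_level { SKETCH_WARN, SKETCH_ERROR, SKETCH_FATAL };

/* Set by any error-level message; consulted once when the tool exits. */
static bool sketch_error_occurred;

static void sketch_log(enum sketch_level level, const char *fmt, ...)
{
    va_list ap;

    assert(fmt != NULL);

    switch (level) {
    case SKETCH_WARN:
        fprintf(stderr, "WARNING: ");
        break;
    case SKETCH_ERROR:
        fprintf(stderr, "ERROR: ");
        sketch_error_occurred = true;   /* remember it, but keep going */
        break;
    case SKETCH_FATAL:
        fprintf(stderr, "FATAL: ");     /* prefix is an assumption */
        break;
    }

    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);

    /* Only the fatal level stops the run at the first problem. */
    if (level == SKETCH_FATAL)
        exit(1);
}

#define sketch_warn(...)  sketch_log(SKETCH_WARN, __VA_ARGS__)
#define sketch_error(...) sketch_log(SKETCH_ERROR, __VA_ARGS__)
#define sketch_fatal(...) sketch_log(SKETCH_FATAL, __VA_ARGS__)

/* Exit status that main() would return after all checks have run. */
static int sketch_exit_code(void)
{
    return sketch_error_occurred ? 1 : 0;
}
```

With this shape, callers do not need to pass a return code back up to main(); any number of sketch_error() calls yields a non-zero exit status, while warnings leave it at zero.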
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Tested-by: Quentin Perret <qperret@google.com>
    • modpost: rename merror() to error() · bc72d723
      Masahiro Yamada authored
      The log function names, warn(), merror(), fatal() are inconsistent.
      
      Commit 2a116659 ("kbuild: distinguish between errors and warnings
      in modpost") intentionally chose merror() to avoid the conflict with
      the library function error(). See man page of error(3).
      
      But, we are already causing the conflict with warn() because it is also
      a library function. See man page of warn(3). err() would be a problem
      for the same reason.
      
      The common technique to work around name conflicts is to use macros.
      For example:
      
          /* in a header */
          #define error(fmt, ...)  __error(fmt, ##__VA_ARGS__)
          #define warn(fmt, ...)   __warn(fmt, ##__VA_ARGS__)
      
          /* function definition */
          void __error(const char *fmt, ...)
          {
                  <our implementation>
          }
      
          void __warn(const char *fmt, ...)
          {
                  <our implementation>
          }
      
      This way, we can implement our own warn() and error() and still
      include <error.h> and <err.h> with no problem.
      
      And, commit 93c95e52 ("modpost: rework and consolidate logging
      interface") already did that.
      
      Since the log functions are all macros, we can use error() without
      causing "conflicting types" errors.
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    • kbuild: don't hardcode depmod path · 436e980e
      Dominique Martinet authored
      depmod is not guaranteed to be in /sbin; just let make look for
      it in the PATH like all the other invoked programs.
      Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    • kbuild: doc: document subdir-y syntax · c0ea806f
      Masahiro Yamada authored
      There is no explanation about subdir-y.
      
      Let's document it.
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
    • kbuild: doc: clarify the difference between extra-y and always-y · d0e628cd
      Masahiro Yamada authored
      The difference between extra-y and always-y is obscure.
      
      Basically, Kbuild builds targets listed in extra-y and always-y in
      visited Makefiles without relying on any dependency.
      
      The difference is that extra-y is used to list the targets needed for
      vmlinux whereas always-y is used to list the targets that must always be
      built irrespective of final targets.
      
      Kbuild skips extra-y when it is building only modules (i.e.
      'make modules'). This is the long-standing behavior since extra-y was
      introduced in 2003, and it is explained in that commit log [1].
      
      For clarification, this is the extra-y vs always-y table:
      
                        extra-y    always-y
        'make'             y          y
        'make vmlinux'     y          y
        'make modules'     n          y
      
      Kbuild skips extra-y also when building external modules since obviously
      it never builds vmlinux.
      
      Unfortunately, extra-y is wrongly used in many places of upstream code,
      and even in external modules.
      
      Using extra-y in external module Makefiles is wrong. What you should
      use is probably always-y or 'targets'.
      
      The current documentation for extra-y is misleading. I rewrote it and
      moved it to section 3.7.
      
      always-y is not documented anywhere. I added it.
      
      [1]: https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=f94e5fd7e5d09a56a60670a9bb211a791654bba8
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
    • kbuild: doc: split if_changed explanation to a separate section · 39bb232a
      Masahiro Yamada authored
      The if_changed macro is currently explained in the section
      "Commands useful for building a boot image", but the use of
      if_changed is not limited to the boot image.
      
      It is often used together with custom rules. Let's split it as a
      separate section, and insert it after the "Custom Rules" section.
      
      I slightly reworded the explanation, re-numbered to fill the <deleted>
      section, and also fixed the broken indentation of the Note: part.
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    • kbuild: doc: merge 'Special Rules' and 'Custom kbuild commands' sections · 41cac083
      Masahiro Yamada authored
      The two sections "3.10 Special Rules" and "7.8 Custom kbuild commands"
      are related because you must understand both of them when you write
      custom rules.
      
      Actually, I do not understand the policy about what goes into
      "3 The kbuild files" and what goes into "7 Architecture Makefile".
      
      This commit reworks the custom rule explanation as follows:
      
       - Merged "7.8 Custom kbuild commands" into "3.10 Special Rules".
      
       - Reword "Special Rules" to "Custom Rules" for consistency.
      
       - Update the example for kecho because the blackfin Makefile
         does not exist any more.
      
       - Replace the example for cmd_<command> with a simpler one.
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    • kbuild: doc: fix 'List directories to visit when descending' section · 23b53061
      Masahiro Yamada authored
      Fix stale information:
      
       - Fix the section number in the reference from 6.4 to 7.4.
      
       - Remove init-y and net-y. They were removed by commit 23febe37
         ("kbuild: merge init-y into core-y") and commit 95fb6317
         ("kbuild: merge net-y and virt-y into drivers-y"), respectively.
      
       - Update the example because arch/sparc64/Makefile does not exist.
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    • kbuild: doc: replace arch/$(ARCH)/ with arch/$(SRCARCH)/ · 8c4d9b14
      Masahiro Yamada authored
      Precisely speaking, the arch directory is specified by $(SRCARCH),
      not $(ARCH).
      
      In the old days, $(ARCH) actually matched the arch directory because
      32-bit and 64-bit were supported as separate architectures.
      
      Most architectures (except arm/arm64) were unified as follows:
      
          arch/i386, arch/x86_64    ->  arch/x86
          arch/sh, arch/sh64        ->  arch/sh
          arch/sparc, arch/sparc64  ->  arch/sparc
      
      To not break the user interface, commit 6752ed90 ("Kbuild: allow
      arch/xxx to use a different source path") introduced SRCARCH to point
      to the arch directory, still allowing to pass in the former ARCH=i386
      or ARCH=x86_64.
      
      Update the documents for preciseness, and add the explanation of SRCARCH.
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
    • kbuild: doc: update the description about kbuild Makefiles · b044a535
      Masahiro Yamada authored
      This line was written in 2003. Now we have many more Makefiles.
      
      The number of Makefiles is not important. The point is we have a
      Makefile in (almost) every directory.
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
  2. 08 Dec, 2020 2 commits
  3. 06 Dec, 2020 26 commits
    • Linux 5.10-rc7 · 0477e928
      Linus Torvalds authored
    • Merge tag 'char-misc-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · ab91292c
      Linus Torvalds authored
      Pull char/misc driver fixes from Greg KH:
       "Here are some small driver fixes, and one "large" revert, for
        5.10-rc7.
      
        They include:
      
         - revert mei patch from 5.10-rc1 that was using a reserved userspace
           value. It will be resubmitted once the proper id has been assigned
           by the virtio people.
      
         - habanalabs fixes found by the fall-through audit from Gustavo
      
         - speakup driver fixes for reported issues
      
         - fpga config build fix for reported issue.
      
        All of these except the revert have been in linux-next with no
        reported issues. The revert is "clean" and just removes a
        previously-added driver, so no real issue there"
      
      * tag 'char-misc-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        Revert "mei: virtio: virtualization frontend driver"
        fpga: Specify HAS_IOMEM dependency for FPGA_DFL
        habanalabs: put devices before driver removal
        habanalabs: free host huge va_range if not used
        speakup: Reject setting the speakup line discipline outside of speakup
    • Merge tag 'tty-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · d49248eb
      Linus Torvalds authored
      Pull tty fixes from Greg KH:
       "Here are two tty core fixes for 5.10-rc7.
      
        They resolve some reported locking issues in the tty core. While they
        have not been in a released linux-next yet, they have passed all of
        the 0-day bot testing as well as the submitter's testing"
      
      * tag 'tty-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        tty: Fix ->session locking
        tty: Fix ->pgrp locking in tiocspgrp()
    • Merge tag 'usb-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · f5226f1d
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are some small USB fixes for 5.10-rc7 that resolve a number of
        reported issues, and add some new device ids.
      
        Nothing major here, but these solve some problems that people were
        having with the 5.10-rc tree:
      
         - reverts for USB storage dma settings that broke working devices
      
         - thunderbolt use-after-free fix
      
         - cdns3 driver fixes
      
         - gadget driver userspace copy fix
      
         - new device ids
      
        All of these except for the reverts have been in linux-next with no
        reported issues. The reverts are "clean" and were tested by Hans, as
        well as passing the 0-day tests"
      
      * tag 'usb-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        usb: gadget: f_fs: Use local copy of descriptors for userspace copy
        usb: ohci-omap: Fix descriptor conversion
        Revert "usb-storage: fix sdev->host->dma_dev"
        Revert "uas: fix sdev->host->dma_dev"
        Revert "uas: bump hw_max_sectors to 2048 blocks for SS or faster drives"
        USB: serial: kl5kusb105: fix memleak on open
        USB: serial: ch341: sort device-id entries
        USB: serial: ch341: add new Product ID for CH341A
        USB: serial: option: fix Quectel BG96 matching
        usb: cdns3: core: fix goto label for error path
        usb: cdns3: gadget: clear trb->length as zero after preparing every trb
        usb: cdns3: Fix hardware based role switch
        USB: serial: option: add support for Thales Cinterion EXS82
        USB: serial: option: add Fibocom NL668 variants
        thunderbolt: Fix use-after-free in remove_unplugged_switch()
    • Merge tag 'x86-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 8100a580
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "A set of fixes for x86:
      
         - Make the AMD L3 QoS code and data prioritization enable/disable
           mechanism work correctly.
      
           The control bit was only set/cleared on one of the CPUs in a L3
           domain, but it has to be modified on all CPUs in the domain. The
           initial documentation was not clear about this, but the updated one
           from Oct 2020 spells it out.
      
         - Fix an off by one in the UV platform detection code which causes
           the UV hubs to be identified wrongly.
      
           The chip revisions start at 1 not at 0.
      
         - Fix a long standing bug in the evaluation of prefixes in the
           uprobes code which fails to handle repeated prefixes properly.
      
           The aggregate size of the prefixes can be larger than the bytes
           array but the code blindly iterated over the aggregate size beyond
           the array boundary. Add a macro to handle this case properly and
           use it at the affected places"
      
      * tag 'x86-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/sev-es: Use new for_each_insn_prefix() macro to loop over prefixes bytes
        x86/insn-eval: Use new for_each_insn_prefix() macro to loop over prefixes bytes
        x86/uprobes: Do not use prefixes.nbytes when looping over prefixes.bytes
        x86/platform/uv: Fix UV4 hub revision adjustment
        x86/resctrl: Fix AMD L3 QOS CDP enable/disable
    • Merge tag 'perf-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9f6b28d4
      Linus Torvalds authored
      Pull perf fixes from Thomas Gleixner:
       "Two fixes for performance monitoring on X86:
      
         - Add recursion protection to another callchain invoked from
           x86_pmu_stop() which can recurse back into x86_pmu_stop(). The
           first attempt to fix this missed this extra code path.
      
         - Use the already filtered status variable to check for PEBS counter
           overflow bits and not the unfiltered full status read from
           IA32_PERF_GLOBAL_STATUS which can have unrelated bits set which
           would be evaluated incorrectly"
      
      * tag 'perf-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86/intel: Check PEBS status correctly
        perf/x86/intel: Fix a warning on x86_pmu_stop() with large PEBS
    • Merge tag 'irq-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 592d9a08
      Linus Torvalds authored
      Pull irq fixes from Thomas Gleixner:
       "A set of updates for the interrupt subsystem:
      
         - Make multiqueue devices which use the managed interrupt affinity
           infrastructure work on PowerPC/Pseries. PowerPC does not use the
           generic infrastructure for setting up PCI/MSI interrupts and the
           multiqueue changes failed to update the legacy PCI/MSI
           infrastructure. Make this work by passing the affinity setup
           information down to the mapping and allocation functions.
      
         - Move Jason Cooper from MAINTAINERS to CREDITS as his mail is
           bouncing and he's not reachable. We hope all is well with him and
           say thanks for his work over the years"
      
      * tag 'irq-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        powerpc/pseries: Pass MSI affinity to irq_create_mapping()
        genirq/irqdomain: Add an irq_create_mapping_affinity() function
        MAINTAINERS: Move Jason Cooper to CREDITS
    • Merge tag 'locking-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ff615c98
      Linus Torvalds authored
      Pull intel_idle build fix from Thomas Gleixner:
       "A tiny build fix for a recent change in the intel_idle driver which
        missed a CONFIG dependency and broke the build for certain
        configurations"
      
      * tag 'locking-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        intel_idle: Build fix
    • Merge tag 'kbuild-fixes-v5.10-2' of... · e6585a49
      Linus Torvalds authored
      Merge tag 'kbuild-fixes-v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - Move -Wcast-align to W=3; it tends to produce false positives and
         there is no tree-wide solution.
      
       - Pass -fmacro-prefix-map to KBUILD_CPPFLAGS because it is a
         preprocessor option and makes sense for .S files as well.
      
       - Disable -gdwarf-2 for Clang's integrated assembler to avoid warnings.
      
       - Disable --orphan-handling=warn for LLD 10.0.1 to avoid warnings.
      
       - Fix undesirable line breaks in *.mod files.
      
      * tag 'kbuild-fixes-v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        kbuild: avoid split lines in .mod files
        kbuild: Disable CONFIG_LD_ORPHAN_WARN for ld.lld 10.0.1
        kbuild: Hoist '--orphan-handling' into Kconfig
        Kbuild: do not emit debug info for assembly with LLVM_IAS=1
        kbuild: use -fmacro-prefix-map for .S sources
        Makefile.extrawarn: move -Wcast-align to W=3
    • Merge branch 'akpm' (patches from Andrew) · 12c0ab66
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "12 patches.
      
        Subsystems affected by this patch series: mm (memcg, zsmalloc, swap,
        mailmap, selftests, pagecache, hugetlb, pagemap), lib, and coredump"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm/mmap.c: fix mmap return value when vma is merged after call_mmap()
        hugetlb_cgroup: fix offline of hugetlb cgroup with reservations
        mm/filemap: add static for function __add_to_page_cache_locked
        userfaultfd: selftests: fix SIGSEGV if huge mmap fails
        tools/testing/selftests/vm: fix build error
        mailmap: add two more addresses of Uwe Kleine-König
        mm/swapfile: do not sleep with a spin lock held
        mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPING
        mm: list_lru: set shrinker map bit when child nr_items is not zero
        mm: memcg/slab: fix obj_cgroup_charge() return value handling
        coredump: fix core_pattern parse error
        zlib: export S390 symbols for zlib modules
    • mm/mmap.c: fix mmap return value when vma is merged after call_mmap() · 309d08d9
      Liu Zixian authored
      On success, mmap should return the start address of the newly mapped
      area, but patch "mm: mmap: merge vma after call_mmap() if possible" set
      the return value addr to the vm_start of the newly merged vma.  Users of
      mmap will get the wrong address if the vma is merged after call_mmap().
      We fix this by moving the assignment to addr before merging the vma.
      
      We have a driver which changes vm_flags, and this bug was found by our
      testcases.
      
      Fixes: d70cec89 ("mm: mmap: merge vma after call_mmap() if possible")
      Signed-off-by: Liu Zixian <liuzixian4@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Hongxiang Lou <louhongxiang@huawei.com>
      Cc: Hu Shiyuan <hushiyuan@huawei.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Link: https://lkml.kernel.org/r/20201203085350.22624-1-liuzixian4@huawei.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb_cgroup: fix offline of hugetlb cgroup with reservations · 7a5bde37
      Mike Kravetz authored
      Adrian Moreno was running a kubernetes 1.19 + containerd/docker workload
      using hugetlbfs.  In this environment the issue is reproduced by:
      
       - Start a simple pod that uses the recently added HugePages medium
         feature (pod yaml attached)
      
       - Start a DPDK app. It doesn't need to run successfully (as in transfer
         packets) nor interact with real hardware. It seems just initializing
         the EAL layer (which handles hugepage reservation and locking) is
         enough to trigger the issue
      
       - Delete the Pod (or let it "Complete").
      
      This would result in a kworker thread going into a tight loop (top output):
      
         1425 root      20   0       0      0      0 R  99.7   0.0   5:22.45 kworker/28:7+cgroup_destroy
      
      'perf top -g' reports:
      
        -   63.28%     0.01%  [kernel]                    [k] worker_thread
           - 49.97% worker_thread
              - 52.64% process_one_work
                 - 62.08% css_killed_work_fn
                    - hugetlb_cgroup_css_offline
                         41.52% _raw_spin_lock
                       - 2.82% _cond_resched
                            rcu_all_qs
                         2.66% PageHuge
              - 0.57% schedule
                 - 0.57% __schedule
      
      We are spinning in the do-while loop in hugetlb_cgroup_css_offline.
      Worse yet, we are holding the master cgroup lock (cgroup_mutex) while
      infinitely spinning.  Little else can be done on the system as the
      cgroup_mutex can not be acquired.
      
      Do note that the issue can be reproduced by simply offlining a hugetlb
      cgroup containing pages with reservation counts.
      
      The loop in hugetlb_cgroup_css_offline is moving page counts from the
      cgroup being offlined to the parent cgroup.  This is done for each
      hstate, and is repeated until hugetlb_cgroup_have_usage returns false.
      The routine moving counts (hugetlb_cgroup_move_parent) is only moving
      'usage' counts.  The routine hugetlb_cgroup_have_usage is checking for
      both 'usage' and 'reservation' counts.  What to do with reservation
      counts when reparenting was discussed here:
      
      https://lore.kernel.org/linux-kselftest/CAHS8izMFAYTgxym-Hzb_JmkTK1N_S9tGN71uS6MFV+R7swYu5A@mail.gmail.com/
      
      The decision was made to leave a zombie cgroup for those with
      reservation counts.  Unfortunately, the code checking reservation counts
      was incorrectly added to hugetlb_cgroup_have_usage.
      
      To fix the issue, simply remove the check for reservation counts.  While
      fixing this issue, a related bug in hugetlb_cgroup_css_offline was
      noticed.  The hstate index is not reinitialized each time through the
      do-while loop.  Fix this as well.
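
The spin and both fixes can be modelled in a small user-space sketch. Everything here is hypothetical (struct toy_cgroup, toy_have_usage(), toy_offline(), TOY_NR_STATES) and is not the kernel code; in particular the pass limit exists only so the model terminates, whereas the loop described above has no such bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_STATES 2 /* stand-in for the number of hstates */

struct toy_cgroup {
    long usage[TOY_NR_STATES];       /* counts the offline loop can move */
    long reservation[TOY_NR_STATES]; /* counts the offline loop never moves */
};

/* With check_reservations set, this models the check before the fix. */
static bool toy_have_usage(const struct toy_cgroup *cg, bool check_reservations)
{
    for (size_t idx = 0; idx < TOY_NR_STATES; idx++) {
        if (cg->usage[idx] > 0)
            return true;
        if (check_reservations && cg->reservation[idx] > 0)
            return true;
    }
    return false;
}

/*
 * Move usage counts to the parent until the child reports no usage.
 * Returns the number of passes needed, or -1 if max_passes passes were
 * not enough (the model's way of reporting "would spin forever").
 */
static int toy_offline(struct toy_cgroup *child, struct toy_cgroup *parent,
                       bool check_reservations, int max_passes)
{
    assert(max_passes > 0);

    for (int pass = 1; pass <= max_passes; pass++) {
        /* The state index restarts at 0 on every pass. */
        for (size_t idx = 0; idx < TOY_NR_STATES; idx++) {
            parent->usage[idx] += child->usage[idx];
            child->usage[idx] = 0;
        }
        if (!toy_have_usage(child, check_reservations))
            return pass;
    }
    return -1;
}
```

With check_reservations set, a child that still holds a reservation count never satisfies the exit condition, because nothing in the loop moves reservations. With the check removed, one pass is enough and the reservation count simply stays with the child.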
      
      Fixes: 1adc4d41 ("hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations")
      Reported-by: Adrian Moreno <amorenoz@redhat.com>
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Adrian Moreno <amorenoz@redhat.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Sandipan Das <sandipan@linux.ibm.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20201203220242.158165-1-mike.kravetz@oracle.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/filemap: add static for function __add_to_page_cache_locked · 3351b16a
      Alex Shi authored
        mm/filemap.c:830:14: warning: no previous prototype for `__add_to_page_cache_locked' [-Wmissing-prototypes]
      Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Souptick Joarder <jrdr.linux@gmail.com>
      Link: https://lkml.kernel.org/r/1604661895-5495-1-git-send-email-alex.shi@linux.alibaba.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • userfaultfd: selftests: fix SIGSEGV if huge mmap fails · 573a2593
      Axel Rasmussen authored
      The error handling in hugetlb_allocate_area() was incorrect for the
      hugetlb_shared test case.
      
      Previously the behavior was:
      
      - mmap a hugetlb area
        - If this fails, set the pointer to NULL, and carry on
      - mmap an alias of the same hugetlb fd
        - If this fails, munmap the original area
      
      If the original mmap failed, it's likely the second one did too.  If
      both failed, we'd blindly try to munmap a NULL pointer, causing a
      SIGSEGV.  Instead, "goto fail" so we return before trying to mmap the
      alias.
      
      This issue can be hit "in real life" by forgetting to set
      /proc/sys/vm/nr_hugepages (leaving it at 0), and then trying to run the
      hugetlb_shared test.
      
      Another small improvement is, when the original mmap fails, don't just
      print "it failed": perror(), so we can see *why*.  :)
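
The corrected control flow can be sketched as below. This is an illustration under assumptions, not the selftest code: map_area_and_alias() and its parameters are hypothetical, and only the standard mmap(), munmap() and perror() calls are real. A failed first mapping reports why it failed and jumps straight to the failure path, so the alias is never attempted and nothing invalid is unmapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map the same fd twice (primary area plus alias).
 * Returns 0 on success; on failure returns -1 with both pointers NULL.
 */
static int map_area_and_alias(int fd, size_t len, void **area, void **alias)
{
    assert(area != NULL && alias != NULL);

    *area = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (*area == MAP_FAILED) {
        perror("mmap of primary area failed"); /* say why, not just that */
        goto fail;                             /* do not try the alias */
    }

    *alias = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (*alias == MAP_FAILED) {
        perror("mmap of alias failed");
        munmap(*area, len); /* safe: the primary mapping is known good */
        goto fail;
    }

    return 0;

fail:
    *area = NULL;
    *alias = NULL;
    return -1;
}
```

Passing an invalid descriptor such as -1 exercises the first failure path: the call returns -1 and leaves both pointers NULL without touching munmap().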
      Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Alan Gilbert <dgilbert@redhat.com>
      Link: https://lkml.kernel.org/r/20201204203443.2714693-1-axelrasmussen@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tools/testing/selftests/vm: fix build error · d8cbe8bf
      Xingxing Su authored
      Only x86 and PowerPC implement the pkey-xxx.h, and an error was reported
      when compiling protection_keys.c.
      
      Add an architecture check to the Makefile so that "protection_keys" is
      compiled only on architectures that support it.
      
      If another architecture implements this, add its name to the Makefile,
      e.g.:
          ifneq (,$(findstring $(ARCH),powerpc mips ... ))
      
      The build errors were:
      
          pkey-helpers.h:93:2: error: #error Architecture not supported
           #error Architecture not supported
          pkey-helpers.h:96:20: error: `PKEY_DISABLE_ACCESS' undeclared
           #define PKEY_MASK (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE)
                              ^
          protection_keys.c:218:45: error: `PKEY_DISABLE_WRITE' undeclared
           pkey_assert(flags & (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE));
                                                      ^
      Signed-off-by: Xingxing Su <suxingxing@loongson.cn>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Sandipan Das <sandipan@linux.ibm.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Brian Geffon <bgeffon@google.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Link: https://lkml.kernel.org/r/1606826876-30656-1-git-send-email-suxingxing@loongson.cn
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mailmap: add two more addresses of Uwe Kleine-König · 4e60340c
      Uwe Kleine-König authored
      This fixes attribution for the commits (among others)
      
       - d4097456 ("video/framebuffer: move the probe func into
         .devinit.text in Blackfin LCD driver")
      
       - 0312e024 ("mfd: mc13xxx: Add support for mc34708")
      Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Link: https://lkml.kernel.org/r/20201127213358.3440830-1-u.kleine-koenig@pengutronix.de
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/swapfile: do not sleep with a spin lock held · b11a76b3
      Qian Cai authored
      We can't call kvfree() with a spin lock held, so defer it.  Fixes a
      might_sleep() runtime warning.
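
The "defer it" pattern can be shown with a user-space analogy. This is only a sketch: struct toy_table, toy_lock(), toy_unlock() and toy_replace_slots() are hypothetical, a C11 atomic_flag stands in for the kernel spin lock, and free() stands in for kvfree(). The old buffer is detached while the lock is held and released only after the lock has been dropped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct toy_table {
    atomic_flag lock; /* minimal spin lock */
    int *slots;       /* protected by lock */
};

static void toy_lock(struct toy_table *t)
{
    /* Busy-wait until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
        ;
}

static void toy_unlock(struct toy_table *t)
{
    atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/* Install new_slots; the previous buffer is freed outside the lock. */
static void toy_replace_slots(struct toy_table *t, int *new_slots)
{
    int *old;

    assert(t != NULL);

    toy_lock(t);
    old = t->slots;       /* detach under the lock: a pointer swap only */
    t->slots = new_slots;
    toy_unlock(t);

    free(old);            /* deferred: runs with no lock held */
}
```

The critical section shrinks to two pointer assignments, and the call that may take a long time (or, in the kernel case, sleep) happens where that is allowed.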
      
      Fixes: 873d7bcf ("mm/swapfile.c: use kvzalloc for swap_info_struct allocation")
      Signed-off-by: Qian Cai <qcai@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20201202151549.10350-1-qcai@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
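
      The "defer it" pattern from the mm/swapfile commit above can be
      sketched in userspace as follows. This is only an illustration, not
      the kernel code: toy_lock, struct table, sleeping_free() and
      table_drop_slot() are hypothetical names, and the lock is a plain
      flag rather than a real spinlock.

      ```c
      #include <stdbool.h>
      #include <stdlib.h>

      /* Toy stand-in for a spinlock: it only tracks whether it is held. */
      struct toy_lock { bool held; };

      static void toy_lock_acquire(struct toy_lock *l) { l->held = true; }
      static void toy_lock_release(struct toy_lock *l) { l->held = false; }

      /* Hypothetical table whose slot pointer is protected by the lock. */
      struct table {
          struct toy_lock lock;
          int *slot;
      };

      /*
       * Stand-in for a free routine that may sleep. Returns false if it
       * was called while the lock was still held, which is the bug class
       * the commit describes.
       */
      static bool sleeping_free(struct table *t, int *p)
      {
          bool ok = !t->lock.held;

          free(p);
          return ok;
      }

      /* Detach the pointer under the lock, drop the lock, then free it. */
      static bool table_drop_slot(struct table *t)
      {
          int *deferred;

          toy_lock_acquire(&t->lock);
          deferred = t->slot;
          t->slot = NULL;
          toy_lock_release(&t->lock);

          return sleeping_free(t, deferred);
      }
      ```

      The point of the sketch is only the ordering: the pointer is
      unpublished while the lock is held, and the potentially sleeping
      free happens after the lock is released.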
    • mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPING · e91d8d78
      Minchan Kim authored
      While I was doing zram testing, I found that decompression sometimes
      failed because the compression buffer was corrupted.  On investigation,
      I found that the commit below calls cond_resched() unconditionally,
      which can cause a problem in atomic context if the task is rescheduled.
      
        BUG: sleeping function called from invalid context at mm/vmalloc.c:108
        in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 946, name: memhog
        3 locks held by memhog/946:
         #0: ffff9d01d4b193e8 (&mm->mmap_lock#2){++++}-{4:4}, at: __mm_populate+0x103/0x160
         #1: ffffffffa3d53de0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0xa98/0x1160
         #2: ffff9d01d56b8110 (&zspage->lock){.+.+}-{3:3}, at: zs_map_object+0x8e/0x1f0
        CPU: 0 PID: 946 Comm: memhog Not tainted 5.9.3-00011-gc5bfc0287345-dirty #316
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
        Call Trace:
          unmap_kernel_range_noflush+0x2eb/0x350
          unmap_kernel_range+0x14/0x30
          zs_unmap_object+0xd5/0xe0
          zram_bvec_rw.isra.0+0x38c/0x8e0
          zram_rw_page+0x90/0x101
          bdev_write_page+0x92/0xe0
          __swap_writepage+0x94/0x4a0
          pageout+0xe3/0x3a0
          shrink_page_list+0xb94/0xd60
          shrink_inactive_list+0x158/0x460
      
      We can fix this by removing the ZSMALLOC_PGTABLE_MAPPING feature (which
      contains the offending calling code) from zsmalloc.
      
      Even though this option showed some improvement (e.g., 30%) on some
      arm32 platforms, it has been a headache to maintain since it has
      abused APIs[1] (e.g., unmap_kernel_range in atomic context).
      
      Since we are moving towards deprecating 32-bit machines, the config
      option has been available only for builtin builds since v5.8, and it
      has not been the default option in zsmalloc, it's time to drop the
      option for better maintenance.
      
      [1] http://lore.kernel.org/linux-mm/20201105170249.387069-1-minchan@kernel.org
      
      Fixes: e47110e9 ("mm/vunmap: add cond_resched() in vunmap_pmd_range")
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Tony Lindgren <tony@atomide.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Harish Sriram <harish@linux.ibm.com>
      Cc: Uladzislau Rezki <urezki@gmail.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20201117202916.GA3856507@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: list_lru: set shrinker map bit when child nr_items is not zero · 8199be00
      Yang Shi authored
      When investigating a slab cache bloat problem, a significant amount of
      negative dentry cache was seen, but confusingly the dentries were
      neither shrunk by the reclaimer (the host has very tight memory) nor by
      dropping caches.  The vmcore shows there are over 14M negative dentry
      objects on the lru, but the tracing result shows they were not even
      scanned.
      
      Further investigation shows the memcg's vfs shrinker_map bit is not set,
      so the reclaimer and dropping caches simply skip calling the vfs
      shrinker.  We have to reboot the hosts to get the memory back.
      
      I didn't manage to come up with a reproducer in a test environment, and
      the problem can't be reproduced after rebooting.  But code inspection
      suggests there is a race between clearing the shrinker map bit and
      reparenting.  The hypothesis is elaborated below.
      
      The memcg hierarchy on our production environment looks like:
      
                      root
                     /    \
                system   user
      
      The main workloads run under the user slice's children, and they
      create and remove memcgs frequently.  So reparenting happens very often
      under the user slice, but no task is under the user slice directly.
      
      So with the frequent reparenting and tight memory pressure, the
      following hypothetical race condition may happen:
      
             CPU A                            CPU B
      reparent
          dst->nr_items == 0
                                       shrinker:
                                           total_objects == 0
          add src->nr_items to dst
          set_bit
                                           return SHRINK_EMPTY
                                           clear_bit
      child memcg offline
          replace child's kmemcg_id with
          parent's (in memcg_offline_kmem())
                                        list_lru_del() between shrinker runs
                                           see parent's kmemcg_id
                                           dec dst->nr_items
      reparent again
          dst->nr_items may go negative
          due to concurrent list_lru_del()
      
                                       The second run of shrinker:
                                           read nr_items without any
                                           synchronization, so it may
                                           see intermediate negative
                                           nr_items then total_objects
                                           may return 0 coincidently
      
                                           keep the bit cleared
          dst->nr_items != 0
          skip set_bit
          add src->nr_items to dst
      
      After this point dst->nr_items may never go to zero, so reparenting
      will not set the shrinker_map bit anymore.  And since there is no task
      under the user slice directly, no new object will be added to its lru
      to set the shrinker map bit either.  That bit stays cleared forever.
      
      How does list_lru_del() race with reparenting?  Reparenting replaces
      the children's kmemcg_id with the parent's without holding nlru->lock,
      so list_lru_del() may see the parent's kmemcg_id while actually
      deleting items from the child's lru, and so decrements the parent's
      nr_items.  The parent's nr_items may therefore go negative, as commit
      2788cf0c ("memcg: reparent list_lrus and free kmemcg_id on css
      offline") says.
      
      Since it is impossible that dst->nr_items goes negative and
      src->nr_items goes to zero at the same time, it seems we could set the
      shrinker map bit iff src->nr_items != 0.  We could synchronize
      list_lru_count_one() and reparenting with nlru->lock, but checking
      src->nr_items in reparenting seems the simplest approach and avoids
      lock contention.
      
      Fixes: fae91d6d ("mm/list_lru.c: set bit in memcg shrinker bitmap on first list_lru item appearance")
      Suggested-by: Roman Gushchin <guro@fb.com>
      Signed-off-by: Yang Shi <shy828301@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Roman Gushchin <guro@fb.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: <stable@vger.kernel.org>	[4.19]
      Link: https://lkml.kernel.org/r/20201202171749.264354-1-shy828301@gmail.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
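
      The change in the reparenting rule described in the list_lru commit
      above can be modelled with a small userspace sketch. It is a
      simplification that follows the race diagram, not the kernel source:
      struct lru_model, reparent_old() and reparent_new() are hypothetical
      names, and the returned bool stands for "set the shrinker map bit".

      ```c
      #include <stdbool.h>

      /* Hypothetical per-memcg LRU; only the item counter is modelled. */
      struct lru_model {
          long nr_items;
      };

      /* Rule from the diagram: set the bit only if the parent looked empty. */
      static bool reparent_old(struct lru_model *dst, struct lru_model *src)
      {
          bool set_bit = (dst->nr_items == 0);

          dst->nr_items += src->nr_items;
          src->nr_items = 0;
          return set_bit;
      }

      /* Proposed rule: set the bit whenever the child contributes items. */
      static bool reparent_new(struct lru_model *dst, struct lru_model *src)
      {
          bool set_bit = (src->nr_items != 0);

          dst->nr_items += src->nr_items;
          src->nr_items = 0;
          return set_bit;
      }
      ```

      With a parent counter that has transiently gone negative (say -1)
      and a child holding 5 items, the old rule skips the bit while the
      new rule sets it, which is the case the hypothesis is about.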
    • mm: memcg/slab: fix obj_cgroup_charge() return value handling · becaba65
      Roman Gushchin authored
      Commit 10befea9 ("mm: memcg/slab: use a single set of kmem_caches
      for all allocations") introduced a regression into the handling of the
      obj_cgroup_charge() return value.  If a non-zero value is returned
      (indicating that one of the memory.max limits was exceeded), the
      allocation should fail, instead of falling back to non-accounted mode.
      
      To make the code more readable, move memcg_slab_pre_alloc_hook() and
      memcg_slab_post_alloc_hook() calling conditions into bodies of these
      hooks.
      
      Fixes: 10befea9 ("mm: memcg/slab: use a single set of kmem_caches for all allocations")
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20201127161828.GD840171@carbon.dhcp.thefacebook.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
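
      The intended handling of a failed charge, as described in the
      memcg/slab commit above, might look like the following toy model.
      Everything here is hypothetical (struct budget, toy_charge(),
      toy_pre_alloc_hook()); it only illustrates that a non-zero charge
      result fails the allocation instead of silently skipping accounting.

      ```c
      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical budget standing in for a memory.max limit. */
      struct budget {
          size_t used;
          size_t max;     /* assumed to satisfy used <= max */
      };

      /* Returns 0 on success, non-zero if the limit would be exceeded. */
      static int toy_charge(struct budget *b, size_t size)
      {
          if (size > b->max - b->used)
              return -1;
          b->used += size;
          return 0;
      }

      /* Pre-allocation hook: returns false when the allocation must fail. */
      static bool toy_pre_alloc_hook(struct budget *b, size_t size)
      {
          if (!b)
              return true;    /* accounting not requested for this caller */
          if (toy_charge(b, size))
              return false;   /* over limit: no unaccounted fallback */
          return true;
      }
      ```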
    • coredump: fix core_pattern parse error · 2bf509d9
      Menglong Dong authored
      'format_corename()' will split 'core_pattern' on spaces when it is in
      pipe mode, and take helper_argv[0] as the path to the usermode
      executable.  It works fine in most cases.
      
      However, if there is a space between '|' and '/file/path', such as
      '| /usr/lib/systemd/systemd-coredump %P %u %g', then helper_argv[0] will
      be parsed as '', and users will get a 'Core dump to | disabled'.
      
      It is not friendly to users, as the pattern above was valid previously.
      Fix this by ignoring the spaces between '|' and '/file/path'.
      
      Fixes: 315c6926 ("coredump: split pipe command whitespace before expanding template")
      Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul Wise <pabs3@bonedaddy.net>
      Cc: Jakub Wilk <jwilk@jwilk.net> [https://bugs.debian.org/924398]
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/5fb62870.1c69fb81.8ef5d.af76@mx.google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
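
      The parsing behaviour described in the coredump commit above can be
      illustrated with a minimal userspace sketch. pipe_helper_path() is a
      hypothetical helper, not the kernel's format_corename(); it only
      shows the idea of skipping the spaces between '|' and the helper
      path before taking the first token.

      ```c
      #include <ctype.h>
      #include <stddef.h>

      /*
       * Copy the first whitespace-delimited token after a leading '|'
       * into out (truncating if needed). Returns the number of bytes
       * copied, or 0 if this is not a pipe pattern or no helper is named.
       */
      static size_t pipe_helper_path(const char *pattern, char *out,
                                     size_t outlen)
      {
          size_t n = 0;

          if (outlen == 0)
              return 0;
          out[0] = '\0';
          if (pattern[0] != '|')
              return 0;
          pattern++;

          /* Ignore the spaces between '|' and the helper path. */
          while (isspace((unsigned char)*pattern))
              pattern++;

          /* Take everything up to the next space as the helper path. */
          while (*pattern && !isspace((unsigned char)*pattern) &&
                 n + 1 < outlen)
              out[n++] = *pattern++;
          out[n] = '\0';
          return n;
      }
      ```

      Without the skip loop, a pattern such as
      '| /usr/lib/systemd/systemd-coredump %P %u %g' would yield an empty
      first token, which matches the failure the commit reports.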
    • zlib: export S390 symbols for zlib modules · 11fb479f
      Randy Dunlap authored
      Fix build errors when ZLIB_INFLATE=m and ZLIB_DEFLATE=m and ZLIB_DFLTCC=y
      by exporting the 2 needed symbols in dfltcc_inflate.c.
      
      Fixes these build errors:
      
        ERROR: modpost: "dfltcc_inflate" [lib/zlib_inflate/zlib_inflate.ko] undefined!
        ERROR: modpost: "dfltcc_can_inflate" [lib/zlib_inflate/zlib_inflate.ko] undefined!
      
      Fixes: 12619610 ("lib/zlib: add s390 hardware support for kernel zlib_inflate")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Cc: Mikhail Zaslonko <zaslonko@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Link: https://lkml.kernel.org/r/20201123191712.4882-1-rdunlap@infradead.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kbuild: avoid split lines in .mod files · 7d32358b
      Masahiro Yamada authored
      "xargs echo" is not a safe way to remove line breaks because the input
      may exceed the command line limit and xargs may break it up into
      multiple invocations of echo. This must never happen because
      scripts/gen_autoksyms.sh expects all undefined symbols to be placed on
      the second line of .mod files.
      
      One possible way is to replace "xargs echo" with
      "sed ':x;N;$!bx;s/\n/ /g'" or something, but I rewrote the code by
      using awk because it is more readable.
      
      This issue was reported by Sami Tolvanen; in his Clang LTO patch set,
      $(multi-used-m) is no longer an ELF object, but a thin archive that
      contains LLVM bitcode files. llvm-nm prints out symbols for each
      archive member separately, which results in a lot of duplications, in
      some places beyond the system-defined limit.
      
      This problem must be fixed irrespective of LTO, and we must ensure
      zero possibility of having this issue.
      
      Link: https://lkml.org/lkml/2020/12/1/1658
      Reported-by: Sami Tolvanen <samitolvanen@google.com>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
    • Revert "mei: virtio: virtualization frontend driver" · 264f53b4
      Michael S. Tsirkin authored
      This reverts commit d162219c.
      
      The device uses a VIRTIO device ID out of a not-for-production range.
      Releasing Linux using an ID out of this range will make it conflict with
      development setups. An official request to reserve an ID for an MEI
      device is yet to be submitted to the virtio TC, thus there's no chance
      it will be reserved and fixed in time before the next release.
      
      Once requested it usually takes 2-3 weeks to land in the spec, which
      means the device can be supported with the official ID in the next Linux
      version if contributors act quickly.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Cc: Tomas Winkler <tomas.winkler@intel.com>
      Cc: Alexander Usyskin <alexander.usyskin@intel.com>
      Cc: Wang Yu <yu1.wang@intel.com>
      Cc: Liu Shuo <shuo.a.liu@intel.com>
      Link: https://lore.kernel.org/r/20201205193625.469773-1-mst@redhat.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/sev-es: Use new for_each_insn_prefix() macro to loop over prefixes bytes · 84da009f
      Masami Hiramatsu authored
      Since insn.prefixes.nbytes can be bigger than the size of
      insn.prefixes.bytes[] when a prefix is repeated, the proper
      check must be:
      
        insn.prefixes.bytes[i] != 0 and i < 4
      
      instead of using insn.prefixes.nbytes. Use the new
      for_each_insn_prefix() macro which does it correctly.
      
      Debugged by Kees Cook <keescook@chromium.org>.
      
       [ bp: Massage commit message. ]
      
      Fixes: 25189d08 ("x86/sev-es: Add support for handling IOIO exceptions")
      Reported-by: syzbot+9b64b619f10f19d19a7c@syzkaller.appspotmail.com
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: https://lkml.kernel.org/r/160697106089.3146288.2052422845039649176.stgit@devnote2
    • x86/insn-eval: Use new for_each_insn_prefix() macro to loop over prefixes bytes · 12cb908a
      Masami Hiramatsu authored
      Since insn.prefixes.nbytes can be bigger than the size of
      insn.prefixes.bytes[] when a prefix is repeated, the proper check must
      be
      
        insn.prefixes.bytes[i] != 0 and i < 4
      
      instead of using insn.prefixes.nbytes. Use the new
      for_each_insn_prefix() macro which does it correctly.
      
      Debugged by Kees Cook <keescook@chromium.org>.
      
       [ bp: Massage commit message. ]
      
      Fixes: 32d0b953 ("x86/insn-eval: Add utility functions to get segment selector")
      Reported-by: syzbot+9b64b619f10f19d19a7c@syzkaller.appspotmail.com
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/160697104969.3146288.16329307586428270032.stgit@devnote2
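
      The bounds check stated in the two x86 commits above
      (insn.prefixes.bytes[i] != 0 and i < 4) can be sketched in userspace
      as below. struct toy_prefixes, PREFIX_SLOTS and count_prefixes() are
      hypothetical stand-ins for illustration; this is not the definition
      of for_each_insn_prefix() itself.

      ```c
      #include <stddef.h>

      #define PREFIX_SLOTS 4  /* size of bytes[], per the check above */

      /* Hypothetical, simplified stand-in for the decoder's prefix field. */
      struct toy_prefixes {
          unsigned char bytes[PREFIX_SLOTS];
          unsigned char nbytes;   /* can exceed PREFIX_SLOTS on repeats */
      };

      /*
       * Count stored prefixes by stopping at the first zero byte or at
       * the end of the array, never trusting nbytes as a loop bound.
       */
      static size_t count_prefixes(const struct toy_prefixes *p)
      {
          size_t i, n = 0;

          for (i = 0; i < PREFIX_SLOTS && p->bytes[i] != 0; i++)
              n++;
          return n;
      }
      ```

      Looping up to nbytes instead would read past bytes[] whenever a
      repeated prefix pushes nbytes above the array size.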