1. 12 Apr, 2018 9 commits
    • Vivek Goyal's avatar
      ovl: set d->is_dir and d->opaque for last path element · 102b0d11
      Vivek Goyal authored
      Certain properties in ovl_lookup_data should be set only for the last
      element of the path. IOW, if we are calling ovl_lookup_single() for an
      absolute redirect, then d->is_dir and d->opaque do not make much sense
      for intermediate path elements. Instead set them only if dentry being
      lookup is last path element.
      
      As of now we do not seem to be making use of d->opaque if it is set for
      a path/dentry in lower. But just define the semantics so that future code
      can make use of this assumption.
      Signed-off-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      102b0d11
    • Vivek Goyal's avatar
      ovl: Do not check for redirect if this is last layer · e9b77f90
      Vivek Goyal authored
      If we are looking in last layer, then there should not be any need to
      process redirect. redirect information is used only for lookup in next
      lower layer and there is no more lower layer to look into. So no need
      to process redirects.
      
      IOW, ignore redirects on lowest layer.
      Signed-off-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Reviewed-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      e9b77f90
    • Amir Goldstein's avatar
      ovl: lookup in inode cache first when decoding lower file handle · 8b58924a
      Amir Goldstein authored
      When decoding a lower file handle, we need to check if lower file was
      copied up and indexed and if it has a whiteout index, we need to check
      if this is an unlinked but open non-dir before returning -ESTALE.
      
      To find out if this is an unlinked but open non-dir we need to lookup
      an overlay inode in inode cache by lower inode and that requires decoding
      the lower file handle before looking in inode cache.
      
      Before this change, if the lower inode turned out to be a directory, we
      may have paid an expensive cost to reconnect that lower directory for
      nothing.
      
      After this change, we start by decoding a disconnected lower dentry and
      using the lower inode for looking up an overlay inode in inode cache.
      If we find overlay inode and dentry in cache, we avoid the index lookup
      overhead. If we don't find an overlay inode and dentry in cache, then we
      only need to decode a connected lower dentry in case the lower dentry is
      a non-indexed directory.
      
      The xfstests group overlay/exportfs tests decoding overlayfs file
      handles after drop_caches with different states of the file at encode
      and decode time. Overall the tests in the group call ovl_lower_fh_to_d()
      89 times to decode a lower file handle.
      
      Before this change, the tests called ovl_get_index_fh() 75 times and
      reconnect_one() 61 times.
      After this change, the tests call ovl_get_index_fh() 70 times and
      reconnect_one() 59 times. The 2 cases where reconnect_one() was avoided
      are cases where a non-upper directory file handle was encoded, then the
      directory removed and then file handle was decoded.
      
      To demonstrate the affect on decoding file handles with hot inode/dentry
      cache, the drop_caches call in the tests was disabled. Without
      drop_caches, there are no reconnect_one() calls at all before or after
      the change. Before the change, there are 75 calls to ovl_get_index_fh(),
      exactly as the case with drop_caches. After the change, there are only
      10 calls to ovl_get_index_fh().
      Signed-off-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      8b58924a
    • Amir Goldstein's avatar
      ovl: do not try to reconnect a disconnected origin dentry · 8a22efa1
      Amir Goldstein authored
      On lookup of non directory, we try to decode the origin file handle
      stored in upper inode. The origin file handle is supposed to be decoded
      to a disconnected non-dir dentry, which is fine, because we only need
      the lower inode of a copy up origin.
      
      However, if the origin file handle somehow turns out to be a directory
      we pay the expensive cost of reconnecting the directory dentry, only to
      get a mismatch file type and drop the dentry.
      
      Optimize this case by explicitly opting out of reconnecting the dentry.
      Opting-out of reconnect is done by passing a NULL acceptable callback
      to exportfs_decode_fh().
      
      While the case described above is a strange corner case that does not
      really need to be optimized, the API added for this optimization will
      be used by a following patch to optimize a more common case of decoding
      an overlayfs file handle.
      Signed-off-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      8a22efa1
    • Amir Goldstein's avatar
      ovl: disambiguate ovl_encode_fh() · 5b2cccd3
      Amir Goldstein authored
      Rename ovl_encode_fh() to ovl_encode_real_fh() to differentiate from the
      exportfs function ovl_encode_inode_fh() and change the latter to
      ovl_encode_fh() to match the exportfs method name.
      
      Rename ovl_decode_fh() to ovl_decode_real_fh() for consistency.
      Signed-off-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      5b2cccd3
    • Amir Goldstein's avatar
      ovl: set lower layer st_dev only if setting lower st_ino · 9f99e50d
      Amir Goldstein authored
      For broken hardlinks, we do not return lower st_ino, so we should
      also not return lower pseudo st_dev.
      
      Fixes: a0c5ad30 ("ovl: relax same fs constraint for constant st_ino")
      Cc: <stable@vger.kernel.org> #v4.15
      Signed-off-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      9f99e50d
    • Amir Goldstein's avatar
      ovl: fix lookup with middle layer opaque dir and absolute path redirects · 3ec9b3fa
      Amir Goldstein authored
      As of now if we encounter an opaque dir while looking for a dentry, we set
      d->last=true. This means that there is no need to look further in any of
      the lower layers. This works fine as long as there are no redirets or
      relative redircts. But what if there is an absolute redirect on the
      children dentry of opaque directory. We still need to continue to look into
      next lower layer. This patch fixes it.
      
      Here is an example to demonstrate the issue. Say you have following setup.
      
      upper:  /redirect (redirect=/a/b/c)
      lower1: /a/[b]/c       ([b] is opaque) (c has absolute redirect=/a/b/d/)
      lower0: /a/b/d/foo
      
      Now "redirect" dir should merge with lower1:/a/b/c/ and lower0:/a/b/d.
      Note, despite the fact lower1:/a/[b] is opaque, we need to continue to look
      into lower0 because children c has an absolute redirect.
      
      Following is a reproducer.
      
      Watch me make foo disappear:
      
       $ mkdir lower middle upper work work2 merged
       $ mkdir lower/origin
       $ touch lower/origin/foo
       $ mount -t overlay none merged/ \
               -olowerdir=lower,upperdir=middle,workdir=work2
       $ mkdir merged/pure
       $ mv merged/origin merged/pure/redirect
       $ umount merged
       $ mount -t overlay none merged/ \
               -olowerdir=middle:lower,upperdir=upper,workdir=work
       $ mv merged/pure/redirect merged/redirect
      
      Now you see foo inside a twice redirected merged dir:
      
       $ ls merged/redirect
       foo
       $ umount merged
       $ mount -t overlay none merged/ \
               -olowerdir=middle:lower,upperdir=upper,workdir=work
      
      After mount cycle you don't see foo inside the same dir:
      
       $ ls merged/redirect
      
      During middle layer lookup, the opaqueness of middle/pure is left in
      the lookup state and then middle/pure/redirect is wrongly treated as
      opaque.
      
      Fixes: 02b69b28 ("ovl: lookup redirects")
      Cc: <stable@vger.kernel.org> #v4.10
      Signed-off-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      3ec9b3fa
    • Vivek Goyal's avatar
      ovl: Set d->last properly during lookup · 452061fd
      Vivek Goyal authored
      d->last signifies that this is the last layer we are looking into and there
      is no more. And that means this allows for some optimzation opportunities
      during lookup. For example, in ovl_lookup_single() we don't have to check
      for opaque xattr of a directory is this is the last layer we are looking
      into (d->last = true).
      
      But knowing for sure whether we are looking into last layer can be very
      tricky. If redirects are not enabled, then we can look at poe->numlower and
      figure out if the lookup we are about to is last layer or not. But if
      redircts are enabled then it is possible poe->numlower suggests that we are
      looking in last layer, but there is an absolute redirect present in found
      element and that redirects us to a layer in root and that means lookup will
      continue in lower layers further.
      
      For example, consider following.
      
      /upperdir/pure (opaque=y)
      /upperdir/pure/foo (opaque=y,redirect=/bar)
      /lowerdir/bar
      
      In this case pure is "pure upper". When we look for "foo", that time
      poe->numlower=0. But that alone does not mean that we will not search for a
      merge candidate in /lowerdir. Absolute redirect changes that.
      
      IOW, d->last should not be set just based on poe->numlower if redirects are
      enabled. That can lead to setting d->last while it should not have and that
      means we will not check for opaque xattr while we should have.
      
      So do this.
      
       - If redirects are not enabled, then continue to rely on poe->numlower
         information to determine if it is last layer or not.
      
       - If redirects are enabled, then set d->last = true only if this is the
         last layer in root ovl_entry (roe).
      Suggested-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Reviewed-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Fixes: 02b69b28 ("ovl: lookup redirects")
      Cc: <stable@vger.kernel.org> #v4.10
      452061fd
    • Amir Goldstein's avatar
      ovl: set i_ino to the value of st_ino for NFS export · 695b46e7
      Amir Goldstein authored
      Eddie Horng reported that readdir of an overlayfs directory that
      was exported via NFSv3 returns entries with d_type set to DT_UNKNOWN.
      The reason is that while preparing the response for readdirplus, nfsd
      checks inside encode_entryplus_baggage() that a child dentry's inode
      number matches the value of d_ino returns by overlayfs readdir iterator.
      
      Because the overlayfs inodes use arbitrary inode numbers that are not
      correlated with the values of st_ino/d_ino, NFSv3 falls back to not
      encoding d_type. Although this is an allowed behavior, we can fix it for
      the case of all overlayfs layers on the same underlying filesystem.
      
      When NFS export is enabled and d_ino is consistent with st_ino
      (samefs), set the same value also to i_ino in ovl_fill_inode() for all
      overlayfs inodes, nfsd readdirplus sanity checks will pass.
      ovl_fill_inode() may be called from ovl_new_inode(), before real inode
      was created with ino arg 0. In that case, i_ino will be updated to real
      upper inode i_ino on ovl_inode_init() or ovl_inode_update().
      Reported-by: default avatarEddie Horng <eddiehorng.tw@gmail.com>
      Tested-by: default avatarEddie Horng <eddiehorng.tw@gmail.com>
      Signed-off-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Fixes: 8383f174 ("ovl: wire up NFS export operations")
      Cc: <stable@vger.kernel.org> #v4.16
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      695b46e7
  2. 25 Mar, 2018 10 commits
  3. 24 Mar, 2018 1 commit
  4. 23 Mar, 2018 20 commits
    • Linus Torvalds's avatar
      Merge tag 'trace-v4.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 99fec39e
      Linus Torvalds authored
      Pull kprobe fixes from Steven Rostedt:
       "The documentation for kprobe events says that symbol offets can take
        both a + and - sign to get to befor and after the symbol address.
      
        But in actuality, the code does not support the minus. This fixes that
        issue, and adds a few more selftests to kprobe events"
      
      * tag 'trace-v4.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        selftests: ftrace: Add a testcase for probepoint
        selftests: ftrace: Add a testcase for string type with kprobe_event
        selftests: ftrace: Add probe event argument syntax testcase
        tracing: probeevent: Fix to support minus offset from symbol
      99fec39e
    • Andy Lutomirski's avatar
      x86/entry/64: Don't use IST entry for #BP stack · d8ba61ba
      Andy Lutomirski authored
      There's nothing IST-worthy about #BP/int3.  We don't allow kprobes
      in the small handful of places in the kernel that run at CPL0 with
      an invalid stack, and 32-bit kernels have used normal interrupt
      gates for #BP forever.
      
      Furthermore, we don't allow kprobes in places that have usergs while
      in kernel mode, so "paranoid" is also unnecessary.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      d8ba61ba
    • Waiman Long's avatar
      x86/efi: Free efi_pgd with free_pages() · 06ace26f
      Waiman Long authored
      The efi_pgd is allocated as PGD_ALLOCATION_ORDER pages and therefore must
      also be freed as PGD_ALLOCATION_ORDER pages with free_pages().
      
      Fixes: d9e9a641 ("x86/mm/pti: Allocate a separate user PGD")
      Signed-off-by: default avatarWaiman Long <longman@redhat.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1521746333-19593-1-git-send-email-longman@redhat.com
      06ace26f
    • Linus Torvalds's avatar
      Merge tag 'mips_fixes_4.16_5' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips · 86d043d4
      Linus Torvalds authored
      Pull MIPS fixes from James Hogan:
       "Another miscellaneous pile of MIPS fixes for 4.16:
      
         - lantiq: fixes for clocks and Amazon SE (4.14)
      
         - ralink: fix booting on MT7621 (4.5)
      
         - ralink: fix halt (3.9)"
      
      * tag 'mips_fixes_4.16_5' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips:
        MIPS: ralink: Fix booting on MT7621
        MIPS: ralink: Remove ralink_halt()
        MIPS: lantiq: ase: Enable MFD_SYSCON
        MIPS: lantiq: Enable AHB Bus for USB
        MIPS: lantiq: Fix Danube USB clock
      86d043d4
    • Linus Torvalds's avatar
      Merge tag 'vfio-v4.16-rc7' of git://github.com/awilliam/linux-vfio · 095fe49f
      Linus Torvalds authored
      Pull VFIO fix from Alex Williamson:
       "Revert masking INTx where it cannot be enabled - it plays poorly with
        SR-IOV VFs and presumes DisINTx support"
      
      * tag 'vfio-v4.16-rc7' of git://github.com/awilliam/linux-vfio:
        Revert: "vfio-pci: Mask INTx if a device is not capabable of enabling it"
      095fe49f
    • Linus Torvalds's avatar
      Merge tag 'mtd/fixes-for-4.16-rc7' of git://git.infradead.org/linux-mtd · a580657a
      Linus Torvalds authored
      Pull MTD fixes from Boris Brezillon:
      
       - Fix several problems in the fsl_ifc NAND controller driver
      
       - Fix misuse of mtd_ooblayout_ecc() in mtdchar.c
      
      * tag 'mtd/fixes-for-4.16-rc7' of git://git.infradead.org/linux-mtd:
        mtd: nand: fsl_ifc: Read ECCSTAT0 and ECCSTAT1 registers for IFC 2.0
        mtd: nand: fsl_ifc: Fix eccstat array overflow for IFC ver >= 2.0.0
        mtd: nand: fsl_ifc: Fix nand waitfunc return value
        mtdchar: fix usage of mtd_ooblayout_ecc()
      a580657a
    • Linus Torvalds's avatar
      Merge tag 'staging-4.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · 935c200a
      Linus Torvalds authored
      Pull staging/IIO fixes from Greg KH:
       "Here are a few small staging and IIO fixes for various reported
        issues.
      
        All of them are tiny, the majority being iio driver fixes for small
        issues, and one staging driver fix for a memory corruption issue.
      
        All have been in linux-next with no reported issues"
      
      * tag 'staging-4.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        staging: ncpfs: memory corruption in ncp_read_kernel()
        iio: st_pressure: st_accel: pass correct platform data to init
        Revert "iio: accel: st_accel: remove redundant pointer pdata"
        iio: adc: meson-saradc: unlock on error in meson_sar_adc_lock()
        dt-bindings: iio: adc: sd-modulator: fix io-channel-cells
        iio: adc: stm32-dfsdm: fix multiple channel initialization
        iio: adc: stm32-dfsdm: fix clock source selection
        iio: adc: stm32-dfsdm: fix call to stop channel
        iio: adc: stm32-dfsdm: fix compatible data use
        iio: chemical: ccs811: Corrected firmware boot/application mode transition
      935c200a
    • Linus Torvalds's avatar
      Merge tag 'char-misc-4.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 97235e74
      Linus Torvalds authored
      Pull hyperv fix from Greg KH:
       "This is a single hyperv bugfix for 4.16-rc7.
      
        It resolves an issue with the ring-buffer signaling to resolve
        reported problems.
      
        It's been in linux-next for a while now with no reported issues"
      
      * tag 'char-misc-4.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        Drivers: hv: vmbus: Fix ring buffer signaling
      97235e74
    • Linus Torvalds's avatar
      Merge tag 'media/v4.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · cde00d21
      Linus Torvalds authored
      Pull media fixes from Mauro Carvalho Chehab:
       "Three fixes:
      
         - dvb: fix a Kconfig typo on a help text
      
         - tegra-cec: reset rx_buf_cnt when start bit detected
      
         - rc: lirc does not use LIRC_CAN_SEND_SCANCODE feature"
      
      * tag 'media/v4.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
        media: dvb: fix a Kconfig typo
        media: tegra-cec: reset rx_buf_cnt when start bit detected
        media: rc: lirc does not use LIRC_CAN_SEND_SCANCODE feature
      cde00d21
    • Linus Torvalds's avatar
      Merge tag 'sound-4.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 8ce72017
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "Things look calming down, but people were still busy to plaster over
        small holes:
      
         - Two fixes to harden against races in aloop driver
      
         - A correction of a long-standing bug in USB-audio UAC2 processing
           unit parser
      
         - As usual suspects, HD-audio: a workaround for Coffee Lake
           controller and a few other device-specific fixes
      
        All small and for stable"
      
      * tag 'sound-4.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: aloop: Fix access to not-yet-ready substream via cable
        ALSA: aloop: Sync stale timer before release
        ALSA: hda/realtek - Fix speaker no sound after system resume
        ALSA: hda/realtek - Fix Dell headset Mic can't record
        ALSA: hda - Force polling mode on CFL for fixing codec communication
        ALSA: usb-audio: Fix parsing descriptor of UAC2 processing unit
        ALSA: hda/realtek - Always immediately update mute LED with pin VREF
      8ce72017
    • Masami Hiramatsu's avatar
      selftests: ftrace: Add a testcase for probepoint · dfa453bc
      Masami Hiramatsu authored
      Add a testcase for probe point definition. This tests
      symbol, address and symbol+offset syntax. The offset
      must be positive and smaller than UINT_MAX.
      
      Link: http://lkml.kernel.org/r/152129043097.31874.14273580606301767394.stgit@devboxSigned-off-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      dfa453bc
    • Masami Hiramatsu's avatar
      selftests: ftrace: Add a testcase for string type with kprobe_event · 5fbdbed7
      Masami Hiramatsu authored
      Add a testcase for string type with kprobe event.
      This tests good/bad syntax combinations and also
      the traced data is correct in several way.
      
      Link: http://lkml.kernel.org/r/152129038381.31874.9201387794548737554.stgit@devboxSigned-off-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      5fbdbed7
    • Masami Hiramatsu's avatar
      selftests: ftrace: Add probe event argument syntax testcase · 871bef20
      Masami Hiramatsu authored
      Add a testcase for probe event argument syntax which
      ensures the kprobe_events interface correctly parses
      given event arguments.
      
      Link: http://lkml.kernel.org/r/152129033679.31874.12705519603869152799.stgit@devboxSigned-off-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      871bef20
    • Masami Hiramatsu's avatar
      tracing: probeevent: Fix to support minus offset from symbol · c5d343b6
      Masami Hiramatsu authored
      In Documentation/trace/kprobetrace.txt, it says
      
       @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol)
      
      However, the parser doesn't parse minus offset correctly, since
      commit 2fba0c88 ("tracing/kprobes: Fix probe offset to be
      unsigned") drops minus ("-") offset support for kprobe probe
      address usage.
      
      This fixes the traceprobe_split_symbol_offset() to parse minus
      offset again with checking the offset range, and add a minus
      offset check in kprobe probe address usage.
      
      Link: http://lkml.kernel.org/r/152129028983.31874.13419301530285775521.stgit@devbox
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org
      Fixes: 2fba0c88 ("tracing/kprobes: Fix probe offset to be unsigned")
      Acked-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      c5d343b6
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · f36b7534
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "13 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm, thp: do not cause memcg oom for thp
        mm/vmscan: wake up flushers for legacy cgroups too
        Revert "mm: page_alloc: skip over regions of invalid pfns where possible"
        mm/shmem: do not wait for lock_page() in shmem_unused_huge_shrink()
        mm/thp: do not wait for lock_page() in deferred_split_scan()
        mm/khugepaged.c: convert VM_BUG_ON() to collapse fail
        x86/mm: implement free pmd/pte page interfaces
        mm/vmalloc: add interfaces to free unmapped page table
        h8300: remove extraneous __BIG_ENDIAN definition
        hugetlbfs: check for pgoff value overflow
        lockdep: fix fs_reclaim warning
        MAINTAINERS: update Mark Fasheh's e-mail
        mm/mempolicy.c: avoid use uninitialized preferred_node
      f36b7534
    • Linus Torvalds's avatar
      Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 8401c72c
      Linus Torvalds authored
      Pull libnvdimm fixes from Dan Williams:
       "Two regression fixes, two bug fixes for older issues, two fixes for
        new functionality added this cycle that have userspace ABI concerns,
        and a small cleanup. These have appeared in a linux-next release and
        have a build success report from the 0day robot.
      
         * The 4.16 rework of altmap handling led to some configurations
           leaking page table allocations due to freeing from the altmap
           reservation rather than the page allocator.
      
           The impact without the fix is leaked memory and a WARN() message
           when tearing down libnvdimm namespaces. The rework also missed a
           place where error handling code needed to be removed that can lead
           to a crash if devm_memremap_pages() fails.
      
         * acpi_map_pxm_to_node() had a latent bug whereby it could
           misidentify the closest online node to a given proximity domain.
      
         * Block integrity handling was reworked several kernels back to allow
           calling add_disk() after setting up the integrity profile.
      
           The nd_btt and nd_blk drivers are just now catching up to fix
           automatic partition detection at driver load time.
      
         * The new peristence_domain attribute, a platform indicator of
           whether cpu caches are powerfail protected for example, is meant to
           be a single value enum and not a set of flags.
      
           This oversight was caught while reviewing new userspace code in
           libndctl to communicate the attribute.
      
           Fix this new enabling up so that we are not stuck with an unwanted
           userspace ABI"
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        libnvdimm, nfit: fix persistence domain reporting
        libnvdimm, region: hide persistence_domain when unknown
        acpi, numa: fix pxm to online numa node associations
        x86, memremap: fix altmap accounting at free
        libnvdimm: remove redundant assignment to pointer 'dev'
        libnvdimm, {btt, blk}: do integrity setup before add_disk()
        kernel/memremap: Remove stale devres_free() call
      8401c72c
    • Linus Torvalds's avatar
      Merge tag 'drm-fixes-for-v4.16-rc7' of git://people.freedesktop.org/~airlied/linux · 9ec7ccc8
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "A bunch of fixes all over the place (core, i915, amdgpu, imx, sun4i,
        ast, tegra, vmwgfx), nothing too serious or worrying at this stage.
      
         - one uapi fix to stop multi-planar images with getfb
      
         - Sun4i error path and clock fixes
      
         - udl driver mmap offset fix
      
         - i915 DP MST and GPU reset fixes
      
         - vmwgfx mutex and black screen fixes
      
         - imx array underflow fix and vblank fix
      
         - amdgpu: display fixes
      
         - exynos devicetree fix
      
         - ast mode fix"
      
      * tag 'drm-fixes-for-v4.16-rc7' of git://people.freedesktop.org/~airlied/linux: (29 commits)
        drm/ast: Fixed 1280x800 Display Issue
        drm: udl: Properly check framebuffer mmap offsets
        drm/i915: Specify which engines to reset following semaphore/event lockups
        drm/vmwgfx: Fix a destoy-while-held mutex problem.
        drm/vmwgfx: Fix black screen and device errors when running without fbdev
        drm: Reject getfb for multi-plane framebuffers
        drm/amd/display: Add one to EDID's audio channel count when passing to DC
        drm/amd/display: We shouldn't set format_default on plane as atomic driver
        drm/amd/display: Fix FMT truncation programming
        drm/amd/display: Allow truncation to 10 bits
        drm/sun4i: hdmi: Fix another error handling path in 'sun4i_hdmi_bind()'
        drm/sun4i: hdmi: Fix an error handling path in 'sun4i_hdmi_bind()'
        drm/i915/dp: Write to SET_POWER dpcd to enable MST hub.
        drm/amd/display: fix dereferencing possible ERR_PTR()
        drm/amd/display: Refine disable VGA
        drm/tegra: Shutdown on driver unbind
        drm/tegra: dsi: Don't disable regulator on ->exit()
        drm/tegra: dc: Detach IOMMU group from domain only once
        dt-bindings: exynos: Document #sound-dai-cells property of the HDMI node
        drm/imx: move arming of the vblank event to atomic_flush
        ...
      9ec7ccc8
    • David Rientjes's avatar
      mm, thp: do not cause memcg oom for thp · 9d3c3354
      David Rientjes authored
      Commit 25160354 ("mm, thp: remove __GFP_NORETRY from khugepaged and
      madvised allocations") changed the page allocator to no longer detect
      thp allocations based on __GFP_NORETRY.
      
      It did not, however, modify the mem cgroup try_charge() path to avoid
      oom kill for either khugepaged collapsing or thp faulting.  It is never
      expected to oom kill a process to allocate a hugepage for thp; reclaim
      is governed by the thp defrag mode and MADV_HUGEPAGE, but allocations
      (and charging) should fallback instead of oom killing processes.
      
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1803191409420.124411@chino.kir.corp.google.com
      Fixes: 25160354 ("mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations")
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9d3c3354
    • Andrey Ryabinin's avatar
      mm/vmscan: wake up flushers for legacy cgroups too · 1c610d5f
      Andrey Ryabinin authored
      Commit 726d061f ("mm: vmscan: kick flushers when we encounter dirty
      pages on the LRU") added flusher invocation to shrink_inactive_list()
      when many dirty pages on the LRU are encountered.
      
      However, shrink_inactive_list() doesn't wake up flushers for legacy
      cgroup reclaim, so the next commit bbef9384 ("mm: vmscan: remove old
      flusher wakeup from direct reclaim path") removed the only source of
      flusher's wake up in legacy mem cgroup reclaim path.
      
      This leads to premature OOM if there is too many dirty pages in cgroup:
          # mkdir /sys/fs/cgroup/memory/test
          # echo $$ > /sys/fs/cgroup/memory/test/tasks
          # echo 50M > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
          # dd if=/dev/zero of=tmp_file bs=1M count=100
          Killed
      
          dd invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
      
          Call Trace:
           dump_stack+0x46/0x65
           dump_header+0x6b/0x2ac
           oom_kill_process+0x21c/0x4a0
           out_of_memory+0x2a5/0x4b0
           mem_cgroup_out_of_memory+0x3b/0x60
           mem_cgroup_oom_synchronize+0x2ed/0x330
           pagefault_out_of_memory+0x24/0x54
           __do_page_fault+0x521/0x540
           page_fault+0x45/0x50
      
          Task in /test killed as a result of limit of /test
          memory: usage 51200kB, limit 51200kB, failcnt 73
          memory+swap: usage 51200kB, limit 9007199254740988kB, failcnt 0
          kmem: usage 296kB, limit 9007199254740988kB, failcnt 0
          Memory cgroup stats for /test: cache:49632KB rss:1056KB rss_huge:0KB shmem:0KB
                  mapped_file:0KB dirty:49500KB writeback:0KB swap:0KB inactive_anon:0KB
      	    active_anon:1168KB inactive_file:24760KB active_file:24960KB unevictable:0KB
          Memory cgroup out of memory: Kill process 3861 (bash) score 88 or sacrifice child
          Killed process 3876 (dd) total-vm:8484kB, anon-rss:1052kB, file-rss:1720kB, shmem-rss:0kB
          oom_reaper: reaped process 3876 (dd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
      
      Wake up flushers in legacy cgroup reclaim too.
      
      Link: http://lkml.kernel.org/r/20180315164553.17856-1-aryabinin@virtuozzo.com
      Fixes: bbef9384 ("mm: vmscan: remove old flusher wakeup from direct reclaim path")
      Signed-off-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Tested-by: default avatarShakeel Butt <shakeelb@google.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1c610d5f
    • Daniel Vacek's avatar
      Revert "mm: page_alloc: skip over regions of invalid pfns where possible" · f59f1caf
      Daniel Vacek authored
      This reverts commit b92df1de ("mm: page_alloc: skip over regions of
      invalid pfns where possible").  The commit is meant to be a boot init
      speed up skipping the loop in memmap_init_zone() for invalid pfns.
      
      But given some specific memory mapping on x86_64 (or more generally
      theoretically anywhere but on arm with CONFIG_HAVE_ARCH_PFN_VALID) the
      implementation also skips valid pfns which is plain wrong and causes
      'kernel BUG at mm/page_alloc.c:1389!'
      
        crash> log | grep -e BUG -e RIP -e Call.Trace -e move_freepages_block -e rmqueue -e freelist -A1
        kernel BUG at mm/page_alloc.c:1389!
        invalid opcode: 0000 [#1] SMP
        --
        RIP: 0010: move_freepages+0x15e/0x160
        --
        Call Trace:
          move_freepages_block+0x73/0x80
          __rmqueue+0x263/0x460
          get_page_from_freelist+0x7e1/0x9e0
          __alloc_pages_nodemask+0x176/0x420
        --
      
        crash> page_init_bug -v | grep RAM
        <struct resource 0xffff88067fffd2f8>          1000 -        9bfff       System RAM (620.00 KiB)
        <struct resource 0xffff88067fffd3a0>        100000 -     430bffff       System RAM (  1.05 GiB = 1071.75 MiB = 1097472.00 KiB)
        <struct resource 0xffff88067fffd410>      4b0c8000 -     4bf9cfff       System RAM ( 14.83 MiB = 15188.00 KiB)
        <struct resource 0xffff88067fffd480>      4bfac000 -     646b1fff       System RAM (391.02 MiB = 400408.00 KiB)
        <struct resource 0xffff88067fffd560>      7b788000 -     7b7fffff       System RAM (480.00 KiB)
        <struct resource 0xffff88067fffd640>     100000000 -    67fffffff       System RAM ( 22.00 GiB)
      
        crash> page_init_bug | head -6
        <struct resource 0xffff88067fffd560>      7b788000 -     7b7fffff       System RAM (480.00 KiB)
        <struct page 0xffffea0001ede200>   1fffff00000000  0 <struct pglist_data 0xffff88047ffd9000> 1 <struct zone 0xffff88047ffd9800> DMA32          4096    1048575
        <struct page 0xffffea0001ede200>       505736 505344 <struct page 0xffffea0001ed8000> 505855 <struct page 0xffffea0001edffc0>
        <struct page 0xffffea0001ed8000>                0  0 <struct pglist_data 0xffff88047ffd9000> 0 <struct zone 0xffff88047ffd9000> DMA               1       4095
        <struct page 0xffffea0001edffc0>   1fffff00000400  0 <struct pglist_data 0xffff88047ffd9000> 1 <struct zone 0xffff88047ffd9800> DMA32          4096    1048575
        BUG, zones differ!
      
        crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b787000 7b788000
              PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
        ffffea0001e00000  78000000                0        0  0 0
        ffffea0001ed7fc0  7b5ff000                0        0  0 0
        ffffea0001ed8000  7b600000                0        0  0 0       <<<<
        ffffea0001ede1c0  7b787000                0        0  0 0
        ffffea0001ede200  7b788000                0        0  1 1fffff00000000
      
      Link: http://lkml.kernel.org/r/20180316143855.29838-1-neelx@redhat.com
      Fixes: b92df1de ("mm: page_alloc: skip over regions of invalid pfns where possible")
      Signed-off-by: default avatarDaniel Vacek <neelx@redhat.com>
      Acked-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f59f1caf