1. 12 Jul, 2019 3 commits
    • mm/memcontrol: fix wrong statistics in memory.stat · dd923990
      Yafang Shao authored
      When we calculate total statistics for memcg1_stats and memcg1_events,
      we use the index 'i' in the for loop as the events index.  Actually
      we should use memcg1_stats[i] and memcg1_events[i] as the events index.
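
      The indexing mistake described above can be illustrated with a small
      userspace sketch.  The names stat_ids, sum_wrong and sum_right are
      hypothetical and are not the kernel's code; the sketch only shows that a
      loop over a lookup table has to index the counters through the table
      entry, not through the loop variable.

```c
#include <stddef.h>

/* Hypothetical table of counter IDs, analogous in spirit to memcg1_stats[]. */
static const int stat_ids[] = { 4, 1, 6 };
#define NR_IDS (sizeof(stat_ids) / sizeof(stat_ids[0]))

/* Buggy pattern: the loop index is used directly as the counter index. */
static long sum_wrong(const long *counters)
{
	long total = 0;

	for (size_t i = 0; i < NR_IDS; i++)
		total += counters[i];
	return total;
}

/* Fixed pattern: the table entry selects which counter to read. */
static long sum_right(const long *counters)
{
	long total = 0;

	for (size_t i = 0; i < NR_IDS; i++)
		total += counters[stat_ids[i]];
	return total;
}
```

      With a counter array that holds distinct values, the two functions
      return different totals, which is the kind of wrong statistic the
      commit message describes.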
      
      Link: http://lkml.kernel.org/r/1562116978-19539-1-git-send-email-laoar.shao@gmail.com
      Fixes: 42a30035 ("mm: memcontrol: fix recursive statistics correctness & scalabilty").
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Yafang Shao <shaoyafang@didiglobal.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dd923990
    • mm/nvdimm: add is_ioremap_addr and use that to check ioremap address · 9bd3bb67
      Aneesh Kumar K.V authored
      Architectures like powerpc use different address ranges to map the
      ioremap and vmalloc ranges.  The memunmap() check used by the nvdimm
      layer was wrongly using is_vmalloc_addr() to check for the ioremap
      range, which fails for ppc64.  This results in ppc64 not freeing the
      ioremap mapping.  The side effect of this is an unbind failure during
      module unload with the papr_scm nvdimm driver.
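
      A minimal userspace model of the range check, assuming made-up address
      ranges: the constants and the helpers in_vmalloc_range(),
      in_ioremap_range() and should_iounmap() are hypothetical and only
      illustrate why a vmalloc-range test misses addresses that live in a
      separate ioremap range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, made-up address ranges for illustration only. */
#define MODEL_VMALLOC_START 0x1000u
#define MODEL_VMALLOC_END   0x2000u
#define MODEL_IOREMAP_START 0x8000u
#define MODEL_IOREMAP_END   0x9000u

static bool in_vmalloc_range(uintptr_t addr)
{
	return addr >= MODEL_VMALLOC_START && addr < MODEL_VMALLOC_END;
}

static bool in_ioremap_range(uintptr_t addr)
{
	return addr >= MODEL_IOREMAP_START && addr < MODEL_IOREMAP_END;
}

/* Unmap decision based on the ioremap range rather than the vmalloc range. */
static bool should_iounmap(uintptr_t addr)
{
	return in_ioremap_range(addr);
}
```

      On an architecture where both kinds of mapping share one range, the two
      predicates would agree; the sketch models the case where they do not.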
      
      Link: http://lkml.kernel.org/r/20190701134038.14165-1-aneesh.kumar@linux.ibm.com
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Fixes: b5beae5e ("powerpc/pseries: Add driver for PAPR SCM regions")
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9bd3bb67
    • mm: vmscan: scan anonymous pages on file refaults · 2c012a4a
      Kuo-Hsin Yang authored
      When file refaults are detected and there are many inactive file pages,
      the system never reclaims anonymous pages.  The file pages are dropped
      aggressively even though there are still a lot of cold anonymous pages,
      and the system thrashes.  This issue impacts the performance of
      applications with large executables, e.g. chrome.
      
      With this patch, when a file refault is detected, inactive_list_is_low()
      always returns true for file pages in get_scan_count() to enable
      scanning anonymous pages.
      
      The problem can be reproduced by the following test program.
      
      ---8<---
      #include <fcntl.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <sys/mman.h>
      #include <sys/stat.h>
      #include <sys/time.h>
      #include <unistd.h>

      void fallocate_file(const char *filename, off_t size)
      {
      	struct stat st;
      	int fd;
      
      	if (!stat(filename, &st) && st.st_size >= size)
      		return;
      
      	fd = open(filename, O_WRONLY | O_CREAT, 0600);
      	if (fd < 0) {
      		perror("create file");
      		exit(1);
      	}
      	if (posix_fallocate(fd, 0, size)) {
      		perror("fallocate");
      		exit(1);
      	}
      	close(fd);
      }
      
      long *alloc_anon(long size)
      {
      	long *start = malloc(size);
      	memset(start, 1, size);
      	return start;
      }
      
      long access_file(const char *filename, long size, long rounds)
      {
      	int fd, i;
      	volatile char *start1, *end1, *start2;
      	const int page_size = getpagesize();
      	long sum = 0;
      
      	fd = open(filename, O_RDONLY);
      	if (fd == -1) {
      		perror("open");
      		exit(1);
      	}
      
      	/*
      	 * Some applications, e.g. chrome, use a lot of executable file
      	 * pages, map some of the pages with PROT_EXEC flag to simulate
      	 * the behavior.
      	 */
      	start1 = mmap(NULL, size / 2, PROT_READ | PROT_EXEC, MAP_SHARED,
      		      fd, 0);
      	if (start1 == MAP_FAILED) {
      		perror("mmap");
      		exit(1);
      	}
      	end1 = start1 + size / 2;
      
      	start2 = mmap(NULL, size / 2, PROT_READ, MAP_SHARED, fd, size / 2);
      	if (start2 == MAP_FAILED) {
      		perror("mmap");
      		exit(1);
      	}
      
      	for (i = 0; i < rounds; ++i) {
      		struct timeval before, after;
      		volatile char *ptr1 = start1, *ptr2 = start2;
      		gettimeofday(&before, NULL);
      		for (; ptr1 < end1; ptr1 += page_size, ptr2 += page_size)
      			sum += *ptr1 + *ptr2;
      		gettimeofday(&after, NULL);
      		printf("File access time, round %d: %f (sec)\n", i,
      		       (after.tv_sec - before.tv_sec) +
      		       (after.tv_usec - before.tv_usec) / 1000000.0);
      	}
      	return sum;
      }
      
      int main(int argc, char *argv[])
      {
      	const long MB = 1024 * 1024;
      	long anon_mb, file_mb, file_rounds;
      	const char filename[] = "large";
      	long *ret1;
      	long ret2;
      
      	if (argc != 4) {
      		printf("usage: thrash ANON_MB FILE_MB FILE_ROUNDS\n");
      		exit(0);
      	}
      	anon_mb = atoi(argv[1]);
      	file_mb = atoi(argv[2]);
      	file_rounds = atoi(argv[3]);
      
      	fallocate_file(filename, file_mb * MB);
      	printf("Allocate %ld MB anonymous pages\n", anon_mb);
      	ret1 = alloc_anon(anon_mb * MB);
      	printf("Access %ld MB file pages\n", file_mb);
      	ret2 = access_file(filename, file_mb * MB, file_rounds);
      	printf("Print result to prevent optimization: %ld\n",
      	       *ret1 + ret2);
      	return 0;
      }
      ---8<---
      
      Running the test program on a 2GB RAM VM with kernel 5.2.0-rc5, the
      program fills RAM with 2048 MB of anonymous memory, then accesses a
      200 MB file 10 times.  Without this patch, the file cache is dropped
      aggressively and every access to the file is from disk.
      
        $ ./thrash 2048 200 10
        Allocate 2048 MB anonymous pages
        Access 200 MB file pages
        File access time, round 0: 2.489316 (sec)
        File access time, round 1: 2.581277 (sec)
        File access time, round 2: 2.487624 (sec)
        File access time, round 3: 2.449100 (sec)
        File access time, round 4: 2.420423 (sec)
        File access time, round 5: 2.343411 (sec)
        File access time, round 6: 2.454833 (sec)
        File access time, round 7: 2.483398 (sec)
        File access time, round 8: 2.572701 (sec)
        File access time, round 9: 2.493014 (sec)
      
      With this patch, these file pages can be cached.
      
        $ ./thrash 2048 200 10
        Allocate 2048 MB anonymous pages
        Access 200 MB file pages
        File access time, round 0: 2.475189 (sec)
        File access time, round 1: 2.440777 (sec)
        File access time, round 2: 2.411671 (sec)
        File access time, round 3: 1.955267 (sec)
        File access time, round 4: 0.029924 (sec)
        File access time, round 5: 0.000808 (sec)
        File access time, round 6: 0.000771 (sec)
        File access time, round 7: 0.000746 (sec)
        File access time, round 8: 0.000738 (sec)
        File access time, round 9: 0.000747 (sec)
      
      Checked the swap out stats during the test [1]: 19006 pages were swapped
      out with this patch, 3418 pages without it.  There is more swap out, but
      I think it's within a reasonable range when the file-backed data set
      doesn't fit into memory.
      
        $ ./thrash 2000 100 2100 5 1 # ANON_MB FILE_EXEC FILE_NOEXEC ROUNDS PROCESSES
        Allocate 2000 MB anonymous pages
        active_anon: 1613644, inactive_anon: 348656, active_file: 892, inactive_file: 1384 (kB)
        pswpout: 7972443, pgpgin: 478615246
        Access 100 MB executable file pages
        Access 2100 MB regular file pages
        File access time, round 0: 12.165, (sec)
        active_anon: 1433788, inactive_anon: 478116, active_file: 17896, inactive_file: 24328 (kB)
        File access time, round 1: 11.493, (sec)
        active_anon: 1430576, inactive_anon: 477144, active_file: 25440, inactive_file: 26172 (kB)
        File access time, round 2: 11.455, (sec)
        active_anon: 1427436, inactive_anon: 476060, active_file: 21112, inactive_file: 28808 (kB)
        File access time, round 3: 11.454, (sec)
        active_anon: 1420444, inactive_anon: 473632, active_file: 23216, inactive_file: 35036 (kB)
        File access time, round 4: 11.479, (sec)
        active_anon: 1413964, inactive_anon: 471460, active_file: 31728, inactive_file: 32224 (kB)
        pswpout: 7991449 (+ 19006), pgpgin: 489924366 (+ 11309120)
      
      With 4 processes accessing non-overlapping parts of a large file, 30316
      pages swapped out with this patch, 5152 pages swapped out without this
      patch.  The swapout number is small compared to pgpgin.
      
      [1]: https://github.com/vovo/testing/blob/master/mem_thrash.c
      
      Link: http://lkml.kernel.org/r/20190701081038.GA83398@google.com
      Fixes: e9868505 ("mm,vmscan: only evict file pages when we have plenty")
      Fixes: 7c5bd705 ("mm: memcg: only evict file pages when we have plenty")
      Signed-off-by: Kuo-Hsin Yang <vovoy@chromium.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Sonny Rao <sonnyrao@chromium.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: <stable@vger.kernel.org>	[4.12+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2c012a4a
  2. 11 Jul, 2019 22 commits
    • Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 753c8d9b
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "A collection of assorted fixes:
      
         - Fix for the pinned cr0/4 fallout which escaped all testing efforts
           because the kvm-intel module was never loaded when the kernel was
           compiled with CONFIG_PARAVIRT=n. The cr0/4 accessors are moved out
           of line and static key is now solely used in the core code and
           therefore can stay in the RO after init section. So the kvm-intel
           and other modules no longer reference the (read only) static
           key which the module loader tried to update.
      
         - Prevent an infinite loop in arch_stack_walk_user() by breaking out
           of the loop once the return address is detected to be 0.
      
         - Prevent the int3_emulate_call() selftest from corrupting the stack
           when KASAN is enabled. KASAN clobbers more registers than covered
           by the emulated call implementation. Convert the int3_magic()
           selftest to an ASM function so the compiler cannot KASANify it.
      
         - Unbreak the build with old GCC versions and with the Gold linker by
           reverting the 'Move of _etext to the actual end of .text'. In both
           cases the build fails with 'Invalid absolute R_X86_64_32S
           relocation: _etext'
      
         - Initialize the context lock for init_mm, which was never an issue
           until the alternatives code started to use a temporary mm for
           patching.
      
         - Fix a build warning vs. the LOWMEM_PAGES constant where clang
           complains rightfully about a signed integer overflow in the shift
           operation by converting the operand to a ULL.
      
         - Adjust the misnamed ENDPROC() of common_spurious in the 32bit entry
           code"
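
      The LOWMEM_PAGES item above comes down to a shift that does not fit in a
      signed int.  A minimal sketch, assuming a 3G/1G split and 4 KiB pages
      (MODEL_PAGE_OFFSET, MODEL_PAGE_SHIFT and lowmem_pages() are illustrative
      names, not the kernel's definitions): writing the operand as an unsigned
      long long keeps 2 << 31 well defined.

```c
#include <stdint.h>

#define MODEL_PAGE_OFFSET 0xC0000000ULL /* assumed 3G/1G address split */
#define MODEL_PAGE_SHIFT  12            /* assumed 4 KiB pages */

/*
 * With a plain int operand, 2 << 31 overflows a 32-bit signed int.
 * A ULL operand makes the shift well defined (the result is 2^32).
 */
static uint64_t lowmem_pages(void)
{
	return ((2ULL << 31) - MODEL_PAGE_OFFSET) >> MODEL_PAGE_SHIFT;
}
```

      Under these assumptions the result is 1 GiB worth of 4 KiB pages.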
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/stacktrace: Prevent infinite loop in arch_stack_walk_user()
        x86/asm: Move native_write_cr0/4() out of line
        x86/pgtable/32: Fix LOWMEM_PAGES constant
        x86/alternatives: Fix int3_emulate_call() selftest stack corruption
        x86/entry/32: Fix ENDPROC of common_spurious
        Revert "x86/build: Move _etext to actual end of .text"
        x86/ldt: Initialize the context lock for init_mm
      753c8d9b
    • Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d7fe42a6
      Linus Torvalds authored
      Pull timer fixes from Thomas Gleixner:
       "Two small fixes from the timer department:
      
         - Prevent the compiler from converting the nanoseconds adjustment
           loop in the VDSO update function to a division (__udivdi3) by using
           the __iter_div_u64_rem() inline function which exists to prevent
           exactly that problem.
      
         - Fix the wrong argument order of the GENMASK macro in the NPCM timer
           driver"
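
      The first item relies on dividing by repeated subtraction so that no
      64-bit division helper is needed.  The following is a userspace sketch of
      that idea, not the kernel's code: iter_div_u64_rem() is a local
      illustrative function, the empty asm statement is a GCC/Clang extension
      assumed here to keep the optimizer from turning the loop back into a
      division, and the divisor must be non-zero.

```c
#include <stdint.h>

/*
 * Divide by repeated subtraction.  Suitable only when the quotient is
 * known to be small, e.g. carrying nanoseconds over into seconds.
 */
static uint32_t iter_div_u64_rem(uint64_t dividend, uint32_t divisor,
				 uint64_t *remainder)
{
	uint32_t quotient = 0;

	while (dividend >= divisor) {
		/* Opaque to the optimizer: hides the value of 'dividend'. */
		__asm__("" : "+rm"(dividend));
		dividend -= divisor;
		quotient++;
	}
	*remainder = dividend;
	return quotient;
}
```

      For example, 2.5 billion nanoseconds divided by one billion takes two
      loop iterations and leaves half a billion as the remainder.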
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        timekeeping/vsyscall: Use __iter_div_u64_rem()
        clocksource/drivers/npcm: Fix misuse of GENMASK macro
      d7fe42a6
    • Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 02150fab
      Linus Torvalds authored
      Pull stacktrace fix from Thomas Gleixner:
       "Fix yet another instance of kernel thread check which ignores that
        kernel threads can call use_mm()"
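
      Why an mm-based check is unreliable can be modelled in a few lines.
      This is a sketch with a hypothetical struct task_model; the
      MODEL_PF_KTHREAD value is meant to mirror the kernel's PF_KTHREAD bit
      and should be treated as illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PF_KTHREAD 0x00200000u /* illustrative copy of the flag bit */

/* Hypothetical stand-in for the two task fields that matter here. */
struct task_model {
	unsigned int flags;
	void *mm;
};

/* Unreliable: a kernel thread that borrowed an mm looks like a user task. */
static bool is_user_task_by_mm(const struct task_model *t)
{
	return t->mm != NULL;
}

/* Reliable in this model: the flag stays set regardless of the mm pointer. */
static bool is_user_task_by_flag(const struct task_model *t)
{
	return !(t->flags & MODEL_PF_KTHREAD);
}
```

      A modelled kernel thread with a non-NULL mm is misclassified by the
      first predicate and classified correctly by the second.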
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        stacktrace: Use PF_KTHREAD to check for kernel threads
      02150fab
    • Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3a83f575
      Linus Torvalds authored
      Pull irq fixes from Thomas Gleixner:
       "Two small fixes for interrupt chip drivers:
      
         - Prevent UAF in the new RZA1 chip driver
      
         - Fix the wrong argument order of the GENMASK macro in the GIC code"
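
      The GENMASK fixes concern the order of the (high, low) arguments.  A
      small sketch of a 64-bit mask helper, written as a function named
      genmask64() for illustration (it is not the kernel macro), shows what
      swapping the arguments does.

```c
#include <stdint.h>

/* Mask with bits h..l (inclusive) set; both arguments must be in 0..63. */
static uint64_t genmask64(unsigned int h, unsigned int l)
{
	return (~0ULL << l) & (~0ULL >> (63 - h));
}
```

      With the arguments in the right order, genmask64(7, 4) selects bits 7
      down to 4.  With the arguments swapped, the two partial masks do not
      overlap and the result is an empty mask.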
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/gic-v3-its: Fix misuse of GENMASK macro
        irqchip/renesas-rza1: Prevent use-after-free in rza1_irqc_probe()
      3a83f575
    • Merge tag 'acpi-5.3-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · a131c2bf
      Linus Torvalds authored
      Pull ACPI fix from Rafael Wysocki:
       "Revert a recent ACPICA commit causing systems to hang at boot time"
      
      * tag 'acpi-5.3-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        Revert "ACPICA: Update table load object initialization"
      a131c2bf
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next · 237f83df
      Linus Torvalds authored
      Pull networking updates from David Miller:
       "Some highlights from this development cycle:
      
         1) Big refactoring of ipv6 route and neigh handling to support
            nexthop objects configurable as units from userspace. From David
            Ahern.
      
         2) Convert explored_states in BPF verifier into a hash table,
            significantly decreased state held for programs with bpf2bpf
            calls, from Alexei Starovoitov.
      
         3) Implement bpf_send_signal() helper, from Yonghong Song.
      
         4) Various classifier enhancements to mvpp2 driver, from Maxime
            Chevallier.
      
         5) Add aRFS support to hns3 driver, from Jian Shen.
      
         6) Fix use after free in inet frags by allocating fqdirs dynamically
            and reworking how rhashtable dismantle occurs, from Eric Dumazet.
      
         7) Add act_ctinfo packet classifier action, from Kevin
            Darbyshire-Bryant.
      
         8) Add TFO key backup infrastructure, from Jason Baron.
      
         9) Remove several old and unused ISDN drivers, from Arnd Bergmann.
      
        10) Add devlink notifications for flash update status to mlxsw driver,
            from Jiri Pirko.
      
        11) Lots of kTLS offload infrastructure fixes, from Jakub Kicinski.
      
        12) Add support for mv88e6250 DSA chips, from Rasmus Villemoes.
      
        13) Various enhancements to ipv6 flow label handling, from Eric
            Dumazet and Willem de Bruijn.
      
        14) Support TLS offload in nfp driver, from Jakub Kicinski, Dirk van
            der Merwe, and others.
      
        15) Various improvements to axienet driver including converting it to
            phylink, from Robert Hancock.
      
        16) Add PTP support to sja1105 DSA driver, from Vladimir Oltean.
      
        17) Add mqprio qdisc offload support to dpaa2-eth, from Ioana
            Radulescu.
      
        18) Add devlink health reporting to mlx5, from Moshe Shemesh.
      
        19) Convert stmmac over to phylink, from Jose Abreu.
      
        20) Add PTP PHC (Physical Hardware Clock) support to mlxsw, from
            Shalom Toledo.
      
        21) Add nftables SYNPROXY support, from Fernando Fernandez Mancera.
      
        22) Convert tcp_fastopen over to use SipHash, from Ard Biesheuvel.
      
        23) Track spill/fill of constants in BPF verifier, from Alexei
            Starovoitov.
      
        24) Support bounded loops in BPF, from Alexei Starovoitov.
      
        25) Various page_pool API fixes and improvements, from Jesper Dangaard
            Brouer.
      
        26) Just like ipv4, support ref-countless ipv6 route handling. From
            Wei Wang.
      
        27) Support VLAN offloading in aquantia driver, from Igor Russkikh.
      
        28) Add AF_XDP zero-copy support to mlx5, from Maxim Mikityanskiy.
      
        29) Add flower GRE encap/decap support to nfp driver, from Pieter
            Jansen van Vuuren.
      
        30) Protect against stack overflow when using act_mirred, from John
            Hurley.
      
        31) Allow devmap map lookups from eBPF, from Toke Høiland-Jørgensen.
      
        32) Use page_pool API in netsec driver, Ilias Apalodimas.
      
        33) Add Google gve network driver, from Catherine Sullivan.
      
        34) More indirect call avoidance, from Paolo Abeni.
      
        35) Add kTLS TX HW offload support to mlx5, from Tariq Toukan.
      
        36) Add XDP_REDIRECT support to bnxt_en, from Andy Gospodarek.
      
        37) Add MPLS manipulation actions to TC, from John Hurley.
      
        38) Add sending a packet to connection tracking from TC actions, and
            then allow flower classifier matching on conntrack state. From
            Paul Blakey.
      
        39) Netfilter hw offload support, from Pablo Neira Ayuso"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2080 commits)
        net/mlx5e: Return in default case statement in tx_post_resync_params
        mlx5: Return -EINVAL when WARN_ON_ONCE triggers in mlx5e_tls_resync().
        net: dsa: add support for BRIDGE_MROUTER attribute
        pkt_sched: Include const.h
        net: netsec: remove static declaration for netsec_set_tx_de()
        net: netsec: remove superfluous if statement
        netfilter: nf_tables: add hardware offload support
        net: flow_offload: rename tc_cls_flower_offload to flow_cls_offload
        net: flow_offload: add flow_block_cb_is_busy() and use it
        net: sched: remove tcf block API
        drivers: net: use flow block API
        net: sched: use flow block API
        net: flow_offload: add flow_block_cb_{priv, incref, decref}()
        net: flow_offload: add list handling functions
        net: flow_offload: add flow_block_cb_alloc() and flow_block_cb_free()
        net: flow_offload: rename TCF_BLOCK_BINDER_TYPE_* to FLOW_BLOCK_BINDER_TYPE_*
        net: flow_offload: rename TC_BLOCK_{UN}BIND to FLOW_BLOCK_{UN}BIND
        net: flow_offload: add flow_block_cb_setup_simple()
        net: hisilicon: Add an tx_desc to adapt HI13X1_GMAC
        net: hisilicon: Add an rx_desc to adapt HI13X1_GMAC
        ...
      237f83df
    • Merge tag 'clone3-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux · 8f6ccf61
      Linus Torvalds authored
      Pull clone3 system call from Christian Brauner:
       "This adds the clone3 syscall which is an extensible successor to clone
        after we snagged the last flag with CLONE_PIDFD during the 5.2 merge
        window for clone(). It cleanly supports all of the flags from clone()
        and thus all legacy workloads.
      
        There are a few user visible differences between clone3 and clone.
        First, CLONE_DETACHED will cause EINVAL with clone3 so we can reuse
        this flag. Second, the CSIGNAL flag is deprecated and will cause
        EINVAL to be reported. It is superseded by a dedicated "exit_signal"
        argument in struct clone_args thus freeing up even more flags. And
        third, clone3 gives CLONE_PIDFD a dedicated return argument in struct
        clone_args instead of abusing CLONE_PARENT_SETTID's parent_tidptr
        argument.
      
        The clone3 uapi is designed to be easy to handle on 32- and 64-bit:
      
          /* uapi */
          struct clone_args {
                  __aligned_u64 flags;
                  __aligned_u64 pidfd;
                  __aligned_u64 child_tid;
                  __aligned_u64 parent_tid;
                  __aligned_u64 exit_signal;
                  __aligned_u64 stack;
                  __aligned_u64 stack_size;
                  __aligned_u64 tls;
          };
      
        and a separate kernel struct is used that uses proper kernel typing:
      
          /* kernel internal */
          struct kernel_clone_args {
                  u64 flags;
                  int __user *pidfd;
                  int __user *child_tid;
                  int __user *parent_tid;
                  int exit_signal;
                  unsigned long stack;
                  unsigned long stack_size;
                  unsigned long tls;
          };
      
        The system call comes with a size argument which enables the kernel to
        detect what version of clone_args userspace is passing in. clone3
        validates that any additional bytes a given kernel does not know about
        are set to zero and that the size never exceeds a page.
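
        The size handling described above can be modelled in userspace.  This
        is a sketch under stated assumptions, not the kernel's code: the
        function check_clone_args_size(), the 4096-byte page size, the 64-byte
        known size (eight 64-bit fields, as in the uapi struct above) and the
        choice of error codes are all illustrative.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE  4096u /* assumed page size */
#define MODEL_KNOWN_SIZE 64u   /* eight 64-bit fields */

/*
 * Accept a caller-supplied struct of 'usize' bytes if it is not larger
 * than a page, not smaller than the first known layout, and every byte
 * beyond the known layout is zero.
 */
static int check_clone_args_size(const unsigned char *buf, size_t usize)
{
	if (usize > MODEL_PAGE_SIZE)
		return -E2BIG;
	if (usize < MODEL_KNOWN_SIZE)
		return -EINVAL;
	for (size_t i = MODEL_KNOWN_SIZE; i < usize; i++) {
		if (buf[i] != 0)
			return -E2BIG; /* unknown field is in use */
	}
	return 0;
}
```

        The design lets newer userspace pass a larger struct to an older
        kernel as long as the fields that kernel does not know are left zero.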
      
        A nice feature is that this patchset allowed us to clean up and
        simplify various core kernel codepaths in kernel/fork.c by making the
        internal _do_fork() function take struct kernel_clone_args even for
        legacy clone().
      
        This patch also unblocks the time namespace patchset which wants to
        introduce a new CLONE_TIMENS flag.
      
        Note that clone3 has only been wired up for x86{_32,64}, arm{64}, and
        xtensa. These were the architectures that did not require special
        massaging.
      
        Other architectures treat fork-like system calls individually and
        after some back and forth neither Arnd nor I felt confident that we
        dared to add clone3 unconditionally to all architectures. We agreed to
        leave this up to individual architecture maintainers. This is why
        there's an additional patch that introduces __ARCH_WANT_SYS_CLONE3
        which any architecture can set once it has implemented support for
        clone3. The patch also adds a cond_syscall(clone3) for architectures
        such as nios2 or h8300 that generate their syscall table by simply
        including asm-generic/unistd.h. The hope is to get rid of
        __ARCH_WANT_SYS_CLONE3 and cond_syscall() rather soon"
      
      * tag 'clone3-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
        arch: handle arches who do not yet define clone3
        arch: wire-up clone3() syscall
        fork: add clone3
      8f6ccf61
    • x86/stacktrace: Prevent infinite loop in arch_stack_walk_user() · cbf5b73d
      Eiichi Tsukata authored
      arch_stack_walk_user() checks `if (fp == frame.next_fp)` to prevent an
      infinite loop by self reference but it's not enough for circular reference.
      
      Once a lack of return address is found, there is no point to continue the
      loop, so break out.
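
      The difference between the two checks can be shown with a userspace
      model of a frame chain.  This is a sketch, not the kernel function:
      struct frame_model and walk_frames() are hypothetical, frames are
      addressed by array index instead of pointer, and the max_entries bound
      is an extra safety net added for the model.

```c
#include <stddef.h>

/* Hypothetical frame record: index of the next frame and a return address. */
struct frame_model {
	size_t next;
	unsigned long ret_addr;
};

/*
 * Walk the chain starting at 'start', storing return addresses in 'out'.
 * Stops when a frame has no return address, when a frame points to
 * itself, or when 'max_entries' addresses have been stored.
 */
static size_t walk_frames(const struct frame_model *frames, size_t start,
			  unsigned long *out, size_t max_entries)
{
	size_t n = 0;
	size_t fp = start;

	while (n < max_entries) {
		const struct frame_model *f = &frames[fp];

		if (f->ret_addr == 0)
			break;	/* no return address: nothing more to record */
		out[n++] = f->ret_addr;
		if (f->next == fp)
			break;	/* self reference */
		fp = f->next;
	}
	return n;
}
```

      For two frames that point at each other, the self-reference test never
      fires; in this model the walk ends because the second frame has a zero
      return address.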
      
      Fixes: 02b67518 ("tracing: add support for userspace stacktraces in tracing/iter_ctrl")
      Signed-off-by: Eiichi Tsukata <devel@etsukata.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Link: https://lkml.kernel.org/r/20190711023501.963-1-devel@etsukata.com
      cbf5b73d
    • Merge tag 'pidfd-updates-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux · 5450e8a3
      Linus Torvalds authored
      Pull pidfd updates from Christian Brauner:
       "This adds two main features.
      
         - First, it adds polling support for pidfds. This allows process
           managers to know when a (non-parent) process dies in a race-free
           way.
      
           The notification mechanism used follows the same logic that is
           currently used when the parent of a task is notified of a child's
           death. With this patchset it is possible to put pidfds in an
           {e}poll loop and get reliable notifications for process (i.e.
           thread-group) exit.
      
         - The second feature complements the first one by making it possible
           to retrieve pollable pidfds for processes that were not created
           using CLONE_PIDFD.
      
           A lot of processes get created with traditional PID-based calls
           such as fork() or clone() (without CLONE_PIDFD). For these
           processes a caller can currently not create a pollable pidfd. This
           is a problem for Android's low memory killer (LMK) and service
           managers such as systemd.
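
        The two features combine as follows: obtain a pidfd for an existing
        PID, then poll it.  This is a minimal sketch, assuming a kernel that
        provides pidfd_open() and pidfd polling; the helper wait_for_exit() is
        hypothetical, and the fallback syscall number 434 is an assumption
        that holds for the generic syscall table but not for every
        architecture.

```c
#include <errno.h>
#include <poll.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434 /* assumed number; differs on some architectures */
#endif

/*
 * Returns 1 if the process exited within timeout_ms, 0 on timeout,
 * and -1 on error with errno set (e.g. ENOSYS on older kernels).
 */
static int wait_for_exit(pid_t pid, int timeout_ms)
{
	struct pollfd pfd;
	int saved_errno;
	int ret;
	int fd = (int)syscall(SYS_pidfd_open, pid, 0);

	if (fd < 0)
		return -1;

	pfd.fd = fd;
	pfd.events = POLLIN;	/* the pidfd becomes readable on exit */
	pfd.revents = 0;

	ret = poll(&pfd, 1, timeout_ms);
	saved_errno = errno;
	close(fd);
	errno = saved_errno;

	if (ret < 0)
		return -1;
	return (ret > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

        Unlike waiting on a child, this works for a process the caller did
        not create, which is the non-parent case described above.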
      
        Both patchsets are accompanied by selftests.
      
        It's perhaps worth noting that the work done so far and the work done
        in this branch for pidfd_open() and polling support do already see
        some adoption:
      
         - Android is in the process of backporting this work to all their LTS
           kernels [1]
      
         - Service managers make use of pidfd_send_signal but will need to
           wait until we enable waiting on pidfds for full adoption.
      
         - And projects I maintain make use of both pidfd_send_signal and
           CLONE_PIDFD [2] and will use polling support and pidfd_open() too"
      
      [1] https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.9+backport%22
          https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.14+backport%22
          https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.19+backport%22
      
      [2] https://github.com/lxc/lxc/blob/aab6e3eb73c343231cdde775db938994fc6f2803/src/lxc/start.c#L1753
      
      * tag 'pidfd-updates-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
        tests: add pidfd_open() tests
        arch: wire-up pidfd_open()
        pid: add pidfd_open()
        pidfd: add polling selftests
        pidfd: add polling support
      5450e8a3
    • Merge tag 'm68k-for-v5.3-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k · 29cd581b
      Linus Torvalds authored
      Pull m68k fix from Geert Uytterhoeven:
       "Don't select ARCH_HAS_DMA_PREP_COHERENT for nommu or coldfire.
      
        This is a fix for an issue detected in next, to avoid introducing
        build failures when merging Christoph's dma-mapping tree later"
      
      * tag 'm68k-for-v5.3-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
        m68k: Don't select ARCH_HAS_DMA_PREP_COHERENT for nommu or coldfire
      29cd581b
    • Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu · 398364a3
      Linus Torvalds authored
      Pull m68knommu updates from Greg Ungerer:
       "A series of cleanups for the FLAT format binary loader, binfmt_flat,
        from Christoph.
      
        The end goal is to support no-MMU on RISC-V, and the last patch
        enables that"
      
      * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
        riscv: add binfmt_flat support
        binfmt_flat: don't offset the data start
        binfmt_flat: move the MAX_SHARED_LIBS definition to binfmt_flat.c
        binfmt_flat: remove the persistent argument from flat_get_addr_from_rp
        binfmt_flat: provide an asm-generic/flat.h
        binfmt_flat: make support for old format binaries optional
        binfmt_flat: add a ARCH_HAS_BINFMT_FLAT option
        binfmt_flat: add endianess annotations
        binfmt_flat: use fixed size type for the on-disk format
        binfmt_flat: consolidate two version of flat_v2_reloc_t
        binfmt_flat: remove the unused OLD_FLAT_FLAG_RAM definition
        binfmt_flat: remove the uapi <linux/flat.h> header
        binfmt_flat: replace flat_argvp_envp_on_stack with a Kconfig variable
        binfmt_flat: remove flat_old_ram_flag
        binfmt_flat: provide a default version of flat_get_relocate_addr
        binfmt_flat: remove flat_set_persistent
        binfmt_flat: remove flat_reloc_valid
      398364a3
    • Merge tag 'nfsd-5.3' of git://linux-nfs.org/~bfields/linux · d2b6b4c8
      Linus Torvalds authored
      Pull nfsd updates from Bruce Fields:
       "Highlights:
      
         - Add a new /proc/fs/nfsd/clients/ directory which exposes some
           long-requested information about NFSv4 clients (like open files)
           and allows forced revocation of client state.
      
         - Replace the global duplicate reply cache by a cache per network
           namespace; previously, a request in one network namespace could
           incorrectly match an entry from another, though we haven't seen
           this in production. This is the last remaining container bug that
           I'm aware of; at this point you should be able to run separate
           nfsd's in each network namespace, each with their own set of
           exports, and everything should work.
      
         - Clean up and modify lock code to show the pid of lockd as the owner
           of NLM locks. This is the correct version of the bugfix originally
           attempted in b8eee0e9 ("lockd: Show pid of lockd for remote
           locks")"
      
      * tag 'nfsd-5.3' of git://linux-nfs.org/~bfields/linux: (34 commits)
        nfsd: Make __get_nfsdfs_client() static
        nfsd: Make two functions static
        nfsd: Fix misuse of strlcpy
        sunrpc/cache: remove the exporting of cache_seq_next
        nfsd: decode implementation id
        nfsd: create xdr_netobj_dup helper
        nfsd: allow forced expiration of NFSv4 clients
        nfsd: create get_nfsdfs_clp helper
        nfsd4: show layout stateids
        nfsd: show lock and deleg stateids
        nfsd4: add file to display list of client's opens
        nfsd: add more information to client info file
        nfsd: escape high characters in binary data
        nfsd: copy client's address including port number to cl_addr
        nfsd4: add a client info file
        nfsd: make client/ directory names small ints
        nfsd: add nfsd/clients directory
        nfsd4: use reference count to free client
        nfsd: rename cl_refcount
        nfsd: persist nfsd filesystem across mounts
        ...
      d2b6b4c8
    • Merge tag 'gfs2-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 · 0248a8be
      Linus Torvalds authored
      Pull gfs2 updates from Andreas Gruenbacher:
       "Some relatively minor changes for gfs2:
      
         - An initial batch of obvious cleanups and fixes from Bob's recovery
           patch queue.
      
         - Two iomap conversion patches and some cleanups from Christoph
           Hellwig.
      
         - A cosmetic cleanup from Kefeng Wang (Huawei).
      
         - Another minor fix and cleanup by me"
      
      * tag 'gfs2-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
        gfs2: Remove unused gfs2_iomap_alloc argument
        gfs2: don't use buffer_heads in gfs2_allocate_page_backing
        gfs2: use iomap_bmap instead of generic_block_bmap
        gfs2: mark stuffed_readpage static
        gfs2: merge gfs2_writepage_common into gfs2_writepage
        gfs2: merge gfs2_writeback_aops and gfs2_ordered_aops
        gfs2: remove the unused gfs2_stuffed_write_end function
        gfs2: use page_offset in gfs2_page_mkwrite
        gfs2: replace more printk with calls to fs_info and friends
        gfs2: dump fsid when dumping glock problems
        gfs2: simplify gfs2_freeze by removing case
        gfs2: Rename SDF_SHUTDOWN to SDF_WITHDRAWN
        gfs2: Warn when a journal replay overwrites a rgrp with buffers
        gfs2: log which portion of the journal is replayed
        gfs2: eliminate tr_num_revoke_rm
        gfs2: kthread and remount improvements
        gfs2: Use IS_ERR_OR_NULL
        gfs2: Clean up freeing struct gfs2_sbd
      0248a8be
    • Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 2e756758
      Linus Torvalds authored
      Pull ext4 updates from Ted Ts'o:
       "Many bug fixes and cleanups, and an optimization for case-insensitive
        lookups"
      
      * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: fix coverity warning on error path of filename setup
        ext4: replace ktype default_attrs with default_groups
        ext4: rename htree_inline_dir_to_tree() to ext4_inlinedir_to_tree()
        ext4: refactor initialize_dirent_tail()
        ext4: rename "dirent_csum" functions to use "dirblock"
        ext4: allow directory holes
        jbd2: drop declaration of journal_sync_buffer()
        ext4: use jbd2_inode dirty range scoping
        jbd2: introduce jbd2_inode dirty range scoping
        mm: add filemap_fdatawait_range_keep_errors()
        ext4: remove redundant assignment to node
        ext4: optimize case-insensitive lookups
        ext4: make __ext4_get_inode_loc plug
        ext4: clean up kerneldoc warnigns when building with W=1
        ext4: only set project inherit bit for directory
        ext4: enforce the immutable flag on open files
        ext4: don't allow any modifications to an immutable file
        jbd2: fix typo in comment of journal_submit_inode_data_buffers
        jbd2: fix some print format mistakes
        ext4: gracefully handle ext4_break_layouts() failure during truncate
      2e756758
    • Merge tag 'afs-next-20190628' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 8dda9957
      Linus Torvalds authored
      Pull afs updates from David Howells:
       "A set of minor changes for AFS:
      
         - Remove an unnecessary check in afs_unlink()
      
         - Add a tracepoint for tracking callback management
      
         - Add a tracepoint for afs_server object usage
      
         - Use struct_size()
      
         - Add mappings for AFS UAE abort codes to Linux error codes, using
           symbolic names rather than hex numbers in the .c file"
      
      * tag 'afs-next-20190628' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        afs: Add support for the UAE error table
        fs/afs: use struct_size() in kzalloc()
        afs: Trace afs_server usage
        afs: Add some callback management tracepoints
        afs: afs_unlink() doesn't need to check dentry->d_inode
      8dda9957
    • Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt · 25cd6f35
      Linus Torvalds authored
      Pull fscrypt updates from Eric Biggers:
      
       - Preparations for supporting encryption on ext4 filesystems where the
         filesystem block size is smaller than PAGE_SIZE.
      
       - Don't allow setting encryption policies on dead directories.
      
       - Various cleanups.
      
      * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
        fscrypt: document testing with xfstests
        fscrypt: remove selection of CONFIG_CRYPTO_SHA256
        fscrypt: remove unnecessary includes of ratelimit.h
        fscrypt: don't set policy for a dead directory
        ext4: encrypt only up to last block in ext4_bio_write_page()
        ext4: decrypt only the needed block in __ext4_block_zero_page_range()
        ext4: decrypt only the needed blocks in ext4_block_write_begin()
        ext4: clear BH_Uptodate flag on decryption error
        fscrypt: decrypt only the needed blocks in __fscrypt_decrypt_bio()
        fscrypt: support decrypting multiple filesystem blocks per page
        fscrypt: introduce fscrypt_decrypt_block_inplace()
        fscrypt: handle blocksize < PAGE_SIZE in fscrypt_zeroout_range()
        fscrypt: support encrypting multiple filesystem blocks per page
        fscrypt: introduce fscrypt_encrypt_block_inplace()
        fscrypt: clean up some BUG_ON()s in block encryption/decryption
        fscrypt: rename fscrypt_do_page_crypto() to fscrypt_crypt_block()
        fscrypt: remove the "write" part of struct fscrypt_ctx
        fscrypt: simplify bounce page handling
      25cd6f35
    • Merge tag 'copy-file-range-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 40f06c79
      Linus Torvalds authored
      Pull copy_file_range updates from Darrick Wong:
       "This fixes numerous parameter checking problems and inconsistent
        behaviors in the new(ish) copy_file_range system call.
      
        Now the system call will actually check its range parameters
        correctly; refuse to copy into files for which the caller does not
        have sufficient privileges; update mtime and strip setuid like file
        writes are supposed to do; and allow copying up to the EOF of the
        source file instead of failing the call like we used to.
      
        Summary:
      
         - Create a generic copy_file_range handler and make individual
           filesystems responsible for calling it (i.e. no more assuming that
           do_splice_direct will work or is appropriate)
      
         - Refactor copy_file_range and remap_range parameter checking where
           they are the same
      
         - Install missing copy_file_range parameter checking(!)
      
         - Remove suid/sgid and update mtime like any other file write
      
         - Change the behavior so that a copy range crossing the source file's
           eof will result in a short copy to the source file's eof instead of
           EINVAL
      
         - Permit filesystems to decide if they want to handle
           cross-superblock copy_file_range in their local handlers"
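
      The EOF change in the summary above can be modelled in a few lines.
      This is a sketch of the described semantics only, not the kernel
      code: the function names are hypothetical, and returning 0 when the
      start offset is at or past EOF is an assumption.

```python
import errno


def old_copy_len(size_in, pos_in, count):
    """Model of the old behaviour: a range crossing source EOF fails."""
    if pos_in + count > size_in:
        raise OSError(errno.EINVAL, "copy range crosses source EOF")
    return count


def new_copy_len(size_in, pos_in, count):
    """Model of the new behaviour: clamp the copy to the source EOF."""
    if pos_in >= size_in:
        # Assumption: nothing left to copy, so the copy is empty.
        return 0
    return min(count, size_in - pos_in)


# A 100-byte source, copying 50 bytes starting at offset 90.
short_copy = new_copy_len(100, 90, 50)
```

      Under this model the caller sees a short copy and has to loop until
      the call returns 0, as with any other short read or write.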
      
      * tag 'copy-file-range-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        fuse: copy_file_range needs to strip setuid bits and update timestamps
        vfs: allow copy_file_range to copy across devices
        xfs: use file_modified() helper
        vfs: introduce file_modified() helper
        vfs: add missing checks to copy_file_range
        vfs: remove redundant checks from generic_remap_checks()
        vfs: introduce generic_file_rw_checks()
        vfs: no fallback for ->copy_file_range
        vfs: introduce generic_copy_file_range()
      40f06c79
    • Merge tag 'iomap-5.3-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · a47f5c56
      Linus Torvalds authored
      Pull iomap updates from Darrick Wong:
       "There are a few fixes for gfs2 but otherwise it's pretty quiet so far.
      
         - Only mark inode dirty at the end of writing to a file (instead of
           once for every page written).
      
         - Fix for an accounting error in the page_done callback"
      
      * tag 'iomap-5.3-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        iomap: fix page_done callback for short writes
        fs: fold __generic_write_end back into generic_write_end
        iomap: don't mark the inode dirty in iomap_write_end
      a47f5c56
    • Merge tag 'for_v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · 682f7c5c
      Linus Torvalds authored
      Pull ext2, udf and quota updates from Jan Kara:
      
       - some ext2 fixes and cleanups
      
       - a fix for a udf bug when extending files
      
       - a fix for quota Q_XGETQSTAT[V] handling
      
      * tag 'for_v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
        udf: Fix incorrect final NOT_ALLOCATED (hole) extent length
        ext2: Use kmemdup rather than duplicating its implementation
        quota: honor quota type in Q_XGETQSTAT[V] calls
        ext2: Always brelse bh on failure in ext2_iget()
        ext2: add missing brelse() in ext2_iget()
        ext2: Fix a typo in ext2_getattr argument
        ext2: fix a typo in comment
        ext2: add missing brelse() in ext2_new_inode()
        ext2: optimize ext2_xattr_get()
        ext2: introduce new helper for xattr entry comparison
        ext2: merge xattr next entry check to ext2_xattr_entry_valid()
        ext2: code cleanup for ext2_preread_inode()
        ext2: code cleanup by using test_opt() and clear_opt()
        doc: ext2: update description of quota options for ext2
        ext2: Strengthen xattr block checks
        ext2: Merge loops in ext2_xattr_set()
        ext2: introduce helper for xattr entry validation
        ext2: introduce helper for xattr header validation
        quota: add dqi_dirty_list description to comment of Dquot List Management
      682f7c5c
    • Merge tag 'fsnotify_for_v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · e6983afd
      Linus Torvalds authored
      Pull fsnotify updates from Jan Kara:
       "This contains cleanups of the fsnotify name removal hook and also a
        patch to disable fanotify permission events for 'proc' filesystem"
      
      * tag 'fsnotify_for_v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
        fsnotify: get rid of fsnotify_nameremove()
        fsnotify: move fsnotify_nameremove() hook out of d_delete()
        configfs: call fsnotify_rmdir() hook
        debugfs: call fsnotify_{unlink,rmdir}() hooks
        debugfs: simplify __debugfs_remove_file()
        devpts: call fsnotify_unlink() hook
        tracefs: call fsnotify_{unlink,rmdir}() hooks
        rpc_pipefs: call fsnotify_{unlink,rmdir}() hooks
        btrfs: call fsnotify_rmdir() hook
        fsnotify: add empty fsnotify_{unlink,rmdir}() hooks
        fanotify: Disallow permission events for proc filesystem
      e6983afd
    • Merge tag 'locks-v5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux · 988052f4
      Linus Torvalds authored
      Pull file locking updates from Jeff Layton:
       "Just a couple of small lease-related patches this cycle.
      
        One from Ira to add a new tracepoint that fires during lease conflict
        checks, and another patch from Amir to reduce false positives when
        checking for lease conflicts"
      
      * tag 'locks-v5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
        locks: eliminate false positive conflicts for write lease
        locks: Add trace_leases_conflict
      988052f4
    • Revert "Merge tag 'keys-acl-20190703' of... · 028db3e2
      Linus Torvalds authored
      Revert "Merge tag 'keys-acl-20190703' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs"
      
      This reverts merge 0f75ef6a (and thus
      effectively commits
      
         7a1ade84 ("keys: Provide KEYCTL_GRANT_PERMISSION")
         2e12256b ("keys: Replace uid/gid/perm permissions checking with an ACL")
      
      that the merge brought in).
      
      It turns out that it breaks booting with an encrypted volume, and Eric
      Biggers reports that it also breaks the fscrypt tests [1] and loading of
      in-kernel X.509 certificates [2].
      
      The root cause of all the breakage is likely the same, but David Howells
      is off email so rather than try to work it out it's getting reverted in
      order to not impact the rest of the merge window.
      
       [1] https://lore.kernel.org/lkml/20190710011559.GA7973@sol.localdomain/
       [2] https://lore.kernel.org/lkml/20190710013225.GB7973@sol.localdomain/
      
      Link: https://lore.kernel.org/lkml/CAHk-=wjxoeMJfeBahnWH=9zShKp2bsVy527vo3_y8HfOdhwAAw@mail.gmail.com/
      Reported-by: Eric Biggers <ebiggers@kernel.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      028db3e2
  3. 10 Jul, 2019 9 commits
  4. 09 Jul, 2019 6 commits