1. 15 Jan, 2016 40 commits
    • Merge tag 'vfio-v4.5-rc1' of git://github.com/awilliam/linux-vfio · 37cea93b
      Linus Torvalds authored
      Pull VFIO updates from Alex Williamson:
      
       - Fixes in AMD xgbe reset, spapr structure padding, type 1 flags (Dan
         Carpenter, Alexey Kardashevskiy, Pierre Morel)
      
       - Re-introduce no-iommu mode, with a user this time (Alex Williamson)
      
      * tag 'vfio-v4.5-rc1' of git://github.com/awilliam/linux-vfio:
        vfio/iommu_type1: make use of info.flags
        vfio: Include No-IOMMU mode
        vfio: Add explicit alignments in vfio_iommu_spapr_tce_create
        VFIO: platform: reset: fix a warning message condition
    • Merge tag 'nfsd-4.5' of git://linux-nfs.org/~bfields/linux · cc80fe0e
      Linus Torvalds authored
      Pull nfsd updates from Bruce Fields:
       "Smaller bugfixes and cleanup, including a fix for failures of
        kerberized NFSv4.1 mounts, and Scott Mayhew's work addressing ACK
        storms that can affect some high-availability NFS setups"
      
      * tag 'nfsd-4.5' of git://linux-nfs.org/~bfields/linux:
        nfsd: add new io class tracepoint
        nfsd: give up on CB_LAYOUTRECALLs after two lease periods
        nfsd: Fix nfsd leaks sunrpc module references
        lockd: constify nlmsvc_binding structure
        lockd: use to_delayed_work
        nfsd: use to_delayed_work
        Revert "svcrdma: Do not send XDR roundup bytes for a write chunk"
        lockd: Register callbacks on the inetaddr_chain and inet6addr_chain
        nfsd: Register callbacks on the inetaddr_chain and inet6addr_chain
        sunrpc: Add a function to close temporary transports immediately
        nfsd: don't base cl_cb_status on stale information
        nfsd4: fix gss-proxy 4.1 mounts for some AD principals
        nfsd: fix unlikely NULL deref in mach_creds_match
        nfsd: minor consolidation of mach_cred handling code
        nfsd: helper for dup of possibly NULL string
        svcrpc: move some initialization to common code
        nfsd: fix a warning message
        nfsd: constify nfsd4_callback_ops structure
        nfsd: recover: constify nfsd4_client_tracking_ops structures
        svcrdma: Do not send XDR roundup bytes for a write chunk
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · c7b6c5fe
      Linus Torvalds authored
      Pull vfs regression fix from Al Viro:
       "Fix for braino introduced in vfs.git#work.misc"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        amdkfd: Copy from the proper user command pointer
    • Merge tag 'md/4.5' of git://neil.brown.name/md · 3c28c9cc
      Linus Torvalds authored
      Pull md updates from Neil Brown:
       "Mostly clustered-raid1 and raid5 journal updates.  One Y2038 fix and
        other minor stuff.
      
        One patch removes me from the MAINTAINERS file and adds a record of my
        md maintainership to Credits"
      
      Many thanks to Neil, who has been around for a _looong_ time.
      
      * tag 'md/4.5' of git://neil.brown.name/md: (26 commits)
        md/raid: only permit hot-add of compatible integrity profiles
        Remove myself as MD Maintainer, and add to Credits.
        raid5-cache: handle journal hotadd in quiesce
        MD: add journal with array suspended
        md: set MD_HAS_JOURNAL in correct places
        md: Remove 'ready' field from mddev.
        md: remove unnecesary md_new_event_inintr
        raid5: allow r5l_io_unit allocations to fail
        raid5-cache: use a mempool for the metadata block
        raid5-cache: use a bio_set
        raid5-cache: add journal hot add/remove support
        drivers: md: use ktime_get_real_seconds()
        md: avoid warning for 32-bit sector_t
        raid5-cache: free meta_page earlier
        raid5-cache: simplify r5l_move_io_unit_list
        md: update comment for md_allow_write
        md-cluster: update comments for MD_CLUSTER_SEND_LOCKED_ALREADY
        md-cluster: Protect communication with mutexes
        md-cluster: Defer MD reloading to mddev->thread
        md-cluster: update the documentation
        ...
    • Merge tag 'regulator-v4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator · 4b43ea2a
      Linus Torvalds authored
      Pull regulator updates from Mark Brown:
       "Aside from a fix for a spurious warning (which caused more problems
        than it fixed in the fixing really) this is all driver updates,
        including new drivers for Dialog PV88060/90 and TI LM363x and TPS65086
        devices.  The qcom_smd driver has had PM8916 and PMA8084 support
        added"
      
      * tag 'regulator-v4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (36 commits)
        regulator: core: remove some dead code
        regulator: core: use dev_to_rdev
        regulator: lp872x: Get rid of duplicate reference to DVS GPIO
        regulator: lp872x: Add missing of_match in regulators descriptions
        regulator: axp20x: Fix GPIO LDO enable value for AXP22x
        regulator: lp8788: constify regulator_ops structures
        regulator: wm8*: constify regulator_ops structures
        regulator: da9*: constify regulator_ops structures
        regulator: mt6311: Use REGCACHE_RBTREE
        regulator: tps65917/palmas: Add bypass ops for LDOs with bypass capability
        regulator: qcom-smd: Add support for PMA8084
        regulator: qcom-smd: Add PM8916 support
        soc: qcom: documentation: Update SMD/RPM Docs
        regulator: pv88090: logical vs bitwise AND typo
        regulator: pv88090: Fix irq leak
        regulator: pv88090: new regulator driver
        regulator: wm831x-ldo: Use platform_register/unregister_drivers()
        regulator: wm831x-dcdc: Use platform_register/unregister_drivers()
        regulator: lp8788-ldo: Use platform_register/unregister_drivers()
        regulator: core: Fix nested locking of supplies
        ...
    • amdkfd: Copy from the proper user command pointer · 39c01bf9
      Borislav Petkov authored
      8f1d57c1 ("amdkfd: don't open-code memdup_user()") mistakenly uses
      an uninitialized local pointer, gcc complains:
      
        drivers/gpu/drm/amd/amdkfd/kfd_chardev.c: In function ‘kfd_ioctl_dbg_address_watch’:
        drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:562:12: warning: ‘args_buff’ may be used uninitialized in this function [-Wmaybe-uninitialized]
          args_buff = memdup_user(args_buff,
                      ^
      
      Fix it.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
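      The bug class is easy to reproduce outside the kernel.  The following is a userspace analogue, assuming a hypothetical memdup() helper that behaves like memdup_user() (allocate, then copy from a source pointer); struct watch_args and its content_ptr field are illustrative stand-ins, not the actual amdkfd ioctl layout.  The buggy form passed the still-uninitialized destination as the copy source; the fix copies from the buffer the caller supplied.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace analogue of memdup_user(): allocate len bytes
 * and copy them from src. Returns NULL on allocation failure. */
static void *memdup(const void *src, size_t len)
{
    void *p = malloc(len);
    if (p)
        memcpy(p, src, len);
    return p;
}

/* Illustrative argument block: the caller's payload lives behind
 * content_ptr (this field name is an assumption for the sketch). */
struct watch_args {
    const void *content_ptr;
    size_t buf_size;
};

static void *copy_watch_payload(const struct watch_args *args)
{
    void *args_buff;

    /* Buggy form:  args_buff = memdup(args_buff, args->buf_size);
     * which reads from an uninitialized pointer.
     * Correct form: copy from the pointer the caller actually passed. */
    args_buff = memdup(args->content_ptr, args->buf_size);
    return args_buff;
}
```

      The caller owns the returned copy and must free() it; in the kernel, memdup_user() additionally reports failures as an ERR_PTR() value rather than NULL.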
    • Merge branch 'mailbox-for-next' of git://git.linaro.org/landing-teams/working/fujitsu/integration · 7aca74e7
      Linus Torvalds authored
      Pull mailbox fixlet from Jassi Brar.
      
      * 'mailbox-for-next' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
        mailbox: constify mbox_chan_ops structure
    • Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · 1d3671df
      Linus Torvalds authored
      Pull UDF fixes and quota cleanups from Jan Kara:
       "Several UDF fixes and some minor quota cleanups"
      
      * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
        udf: Check output buffer length when converting name to CS0
        udf: Prevent buffer overrun with multi-byte characters
        quota: constify qtree_fmt_operations structures
        udf: avoid uninitialized variable use
        udf: Fix lost indirect extent block
        udf: Factor out code for creating indirect extent
        udf: limit the maximum number of indirect extents in a row
        udf: limit the maximum number of TD redirections
        fs: make quota/dquot.c explicitly non-modular
        fs: make quota/netlink.c explicitly non-modular
    • Merge branch 'akpm' (patches from Andrew) · 875fc4f5
      Linus Torvalds authored
      Merge first patch-bomb from Andrew Morton:
      
       - A few hotfixes which missed 4.4 because I was asleep.  cc'ed to
         -stable
      
       - A few misc fixes
      
       - OCFS2 updates
      
       - Part of MM.  Including pretty large changes to page-flags handling
         and to thp management which have been buffered up for 2-3 cycles now.
      
        I have a lot of MM material this time.
      
      [ It turns out the THP part wasn't quite ready, so that got dropped from
        this series  - Linus ]
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (117 commits)
        zsmalloc: reorganize struct size_class to pack 4 bytes hole
        mm/zbud.c: use list_last_entry() instead of list_tail_entry()
        zram/zcomp: do not zero out zcomp private pages
        zram: pass gfp from zcomp frontend to backend
        zram: try vmalloc() after kmalloc()
        zram/zcomp: use GFP_NOIO to allocate streams
        mm: add tracepoint for scanning pages
        drivers/base/memory.c: fix kernel warning during memory hotplug on ppc64
        mm/page_isolation: use macro to judge the alignment
        mm: fix noisy sparse warning in LIBCFS_ALLOC_PRE()
        mm: rework virtual memory accounting
        include/linux/memblock.h: fix ordering of 'flags' argument in comments
        mm: move lru_to_page to mm_inline.h
        Documentation/filesystems: describe the shared memory usage/accounting
        memory-hotplug: don't BUG() in register_memory_resource()
        hugetlb: make mm and fs code explicitly non-modular
        mm/swapfile.c: use list_for_each_entry_safe in free_swap_count_continuations
        mm: /proc/pid/clear_refs: no need to clear VM_SOFTDIRTY in clear_soft_dirty_pmd()
        mm: make sure isolate_lru_page() is never called for tail page
        vmstat: make vmstat_updater deferrable again and shut down on idle
        ...
    • zsmalloc: reorganize struct size_class to pack 4 bytes hole · 7dfa4612
      Weijie Yang authored
      Reorder the pages_per_zspage field in struct size_class to eliminate
      the 4-byte hole between it and the stats field.
      Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
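      A userspace illustration of the padding effect, using a hypothetical struct rather than the real struct size_class (whose full field list is not shown here).  On an LP64 target such as x86_64, an int placed directly before an 8-byte-aligned field leaves a 4-byte hole; moving the int after that field lets it share an 8-byte slot with the trailing bool.  Tools such as pahole report holes like this.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout, NOT the real struct size_class. On LP64 the int
 * just before the 8-byte-aligned stats array leaves a 4-byte hole. */
struct class_before {
    int size;                  /* offset 0 */
    unsigned int index;        /* offset 4 */
    int pages_per_zspage;      /* offset 8; bytes 12..15 are padding */
    unsigned long stats[4];    /* offset 16 */
    bool huge;                 /* offset 48, then 7 bytes tail padding */
};

/* Same fields with pages_per_zspage moved after stats, so it shares an
 * 8-byte slot with the trailing bool instead of leaving a hole. */
struct class_after {
    int size;                  /* offset 0 */
    unsigned int index;        /* offset 4 */
    unsigned long stats[4];    /* offset 8 */
    int pages_per_zspage;      /* offset 40 */
    bool huge;                 /* offset 44, then 3 bytes tail padding */
};
```

      With 8-byte longs, class_before is 56 bytes and class_after is 48; on targets with 4-byte longs the hole does not appear and both are 32 bytes.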
    • mm/zbud.c: use list_last_entry() instead of list_tail_entry() · f58fb5e7
      Geliang Tang authored
      list_last_entry() has been defined in list.h, so replace
      list_tail_entry() with it.
      Signed-off-by: Geliang Tang <geliangtang@163.com>
      Cc: Seth Jennings <sjennings@variantweb.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
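      For reference, a minimal userspace re-creation of the kernel's circular list helpers.  It mirrors the semantics of list_entry() and list_last_entry() but is a sketch, not the kernel's list.h (which, for example, uses a type-checked container_of()); struct buddy is a hypothetical element type.  Because the list is circular, the head's prev pointer is the tail, so list_last_entry() needs no traversal.

```c
#include <stddef.h>

/* Minimal userspace version of the kernel's circular doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

/* Recover the enclosing structure from a pointer to its embedded member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)
#define list_first_entry(ptr, type, member) \
    list_entry((ptr)->next, type, member)
/* The head's prev pointer is the tail, so this is O(1). */
#define list_last_entry(ptr, type, member) \
    list_entry((ptr)->prev, type, member)

static void INIT_LIST_HEAD(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert new_node just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *new_node, struct list_head *head)
{
    struct list_head *prev = head->prev;

    new_node->next = head;
    new_node->prev = prev;
    prev->next = new_node;
    head->prev = new_node;
}

struct buddy {              /* hypothetical element type */
    int id;
    struct list_head lru;
};
```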
    • zram/zcomp: do not zero out zcomp private pages · e02d238c
      Sergey Senozhatsky authored
      Do not __GFP_ZERO allocated zcomp ->private pages.  We keep allocated
      streams around and use them for read/write requests, so we supply a
      zeroed out ->private to compression algorithm as a scratch buffer only
      once -- the first time we use that stream.  For the rest of IO requests
      served by this stream ->private usually contains some temporary data
      from the previous requests.
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: pass gfp from zcomp frontend to backend · 75d8947a
      Minchan Kim authored
      Each zcomp backend uses its own gfp flags, but that is pointless because
      the context in which they are called is driven by the upper layer (i.e.,
      the zcomp frontend).  Moreover, the zcomp frontend may call them in
      different contexts: in one (i.e., zram initialization) the allocation
      should be made to succeed, while in the other (i.e., further stream
      allocation to accelerate I/O) it is merely optional.  So pass gfp down
      from the driver (i.e., the zcomp frontend), following the normal MM
      convention.

      [sergey.senozhatsky@gmail.com: add missing __vmalloc zero and highmem gfps]
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
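      The convention described in this commit, where the caller that knows the context chooses the allocation flags and the backend simply forwards them, can be sketched in userspace as follows.  The SK_* flag bits, struct backend_ops and the init/non-init policy are hypothetical illustrations, not zram's actual gfp_t values or zcomp API.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical flag bits for this sketch only; NOT the kernel's gfp_t. */
#define SK_NOWARN  0x1u   /* failure is acceptable, do not complain */
#define SK_NORETRY 0x2u   /* give up quickly instead of retrying hard */

/* Backend interface: the backend does not pick flags itself, it uses
 * whatever the caller passes in. */
struct backend_ops {
    void *(*create)(unsigned int flags);
};

static unsigned int last_flags;   /* recorded so the choice is observable */

static void *demo_create(unsigned int flags)
{
    last_flags = flags;
    return malloc(64);            /* stand-in for the backend's work memory */
}

/* Frontend: it knows the calling context, so it chooses the flags. */
static void *frontend_alloc(const struct backend_ops *ops, bool at_init)
{
    /* At init the allocation should succeed, so use full effort; later,
     * extra streams only speed up I/O, so a quiet, quick failure is fine. */
    unsigned int flags = at_init ? 0u : (SK_NOWARN | SK_NORETRY);

    return ops->create(flags);
}
```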
    • zram: try vmalloc() after kmalloc() · d913897a
      Kyeongdon Kim authored
      When using multiple LZ4 compression streams for zram swap, we saw page
      allocation failure messages while running system tests.  This happened
      not just once but several times (2-5 times per test).  Also, some of
      the failures occurred repeatedly when trying order-3 allocations.

      To allocate the private data for parallel compression, we call
      kzalloc() with order 2/3 at runtime (lzo/lz4).  But if no order-2/3
      memory is available at that time, the page allocation fails.  This
      patch uses vmalloc() as a fallback for kmalloc(), which prevents the
      page allocation failure warning.

      With this patch, we never saw the warning message in the running test,
      and it could also reduce process startup latency by about 60-120ms in
      each case.
      
      For reference a call trace :
      
          Binder_1: page allocation failure: order:3, mode:0x10c0d0
          CPU: 0 PID: 424 Comm: Binder_1 Tainted: GW 3.10.49-perf-g991d02b-dirty #20
          Call trace:
            dump_backtrace+0x0/0x270
            show_stack+0x10/0x1c
            dump_stack+0x1c/0x28
            warn_alloc_failed+0xfc/0x11c
            __alloc_pages_nodemask+0x724/0x7f0
            __get_free_pages+0x14/0x5c
            kmalloc_order_trace+0x38/0xd8
            zcomp_lz4_create+0x2c/0x38
            zcomp_strm_alloc+0x34/0x78
            zcomp_strm_multi_find+0x124/0x1ec
            zcomp_strm_find+0xc/0x18
            zram_bvec_rw+0x2fc/0x780
            zram_make_request+0x25c/0x2d4
            generic_make_request+0x80/0xbc
            submit_bio+0xa4/0x15c
            __swap_writepage+0x218/0x230
            swap_writepage+0x3c/0x4c
            shrink_page_list+0x51c/0x8d0
            shrink_inactive_list+0x3f8/0x60c
            shrink_lruvec+0x33c/0x4cc
            shrink_zone+0x3c/0x100
            try_to_free_pages+0x2b8/0x54c
            __alloc_pages_nodemask+0x514/0x7f0
            __get_free_pages+0x14/0x5c
            proc_info_read+0x50/0xe4
            vfs_read+0xa0/0x12c
            SyS_read+0x44/0x74
          DMA: 3397*4kB (MC) 26*8kB (RC) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
               0*512kB 0*1024kB 0*2048kB 0*4096kB = 13796kB
      
      [minchan@kernel.org: change vmalloc gfp and adding comment about gfp]
      [sergey.senozhatsky@gmail.com: tweak comments and styles]
      Signed-off-by: Kyeongdon Kim <kyeongdon.kim@lge.com>
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
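      A userspace sketch of the fallback pattern, under stated assumptions: contig_alloc() is a hypothetical stand-in for kmalloc() that fails above a size limit to model a missing order-2/3 block, and plain malloc() stands in for vmalloc().  The buffer records which path succeeded; in the kernel, kvfree() frees memory from either allocator, so the tag here only makes the choice observable.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a physically contiguous allocator such as
 * kmalloc(): requests above this limit fail, modelling a high-order
 * allocation that cannot be satisfied. */
static size_t contig_limit = 16 * 1024;

static void *contig_alloc(size_t size)
{
    return size <= contig_limit ? malloc(size) : NULL;
}

struct buf {
    void *mem;
    bool fallback;   /* true if the fallback allocator was used */
};

/* Try the contiguous allocator first, then fall back (vmalloc analogue). */
static bool buf_alloc(struct buf *b, size_t size)
{
    b->mem = contig_alloc(size);
    b->fallback = false;
    if (!b->mem) {
        b->mem = malloc(size);   /* no contiguity requirement here */
        b->fallback = true;
    }
    return b->mem != NULL;
}

static void buf_free(struct buf *b)
{
    /* Both paths use malloc() in this sketch; the kernel uses kvfree(). */
    free(b->mem);
    b->mem = NULL;
}
```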
    • zram/zcomp: use GFP_NOIO to allocate streams · 3d5fe03a
      Sergey Senozhatsky authored
      We can end up allocating a new compression stream with GFP_KERNEL from
      within the IO path, which may result in nested (recursive) IO
      operations.  That can introduce problems if the IO path in question is a
      reclaimer, holding some locks that will deadlock nested IOs.
      
      Allocate streams and working memory using GFP_NOIO flag, forbidding
      recursive IO and FS operations.
      
      An example:
      
        inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
        git/20158 [HC0[0]:SC0[0]:HE1:SE1] takes:
         (jbd2_handle){+.+.?.}, at:  start_this_handle+0x4ca/0x555
        {IN-RECLAIM_FS-W} state was registered at:
           __lock_acquire+0x8da/0x117b
           lock_acquire+0x10c/0x1a7
           start_this_handle+0x52d/0x555
           jbd2__journal_start+0xb4/0x237
           __ext4_journal_start_sb+0x108/0x17e
           ext4_dirty_inode+0x32/0x61
           __mark_inode_dirty+0x16b/0x60c
           iput+0x11e/0x274
           __dentry_kill+0x148/0x1b8
           shrink_dentry_list+0x274/0x44a
           prune_dcache_sb+0x4a/0x55
           super_cache_scan+0xfc/0x176
           shrink_slab.part.14.constprop.25+0x2a2/0x4d3
           shrink_zone+0x74/0x140
           kswapd+0x6b7/0x930
           kthread+0x107/0x10f
           ret_from_fork+0x3f/0x70
        irq event stamp: 138297
        hardirqs last  enabled at (138297):  debug_check_no_locks_freed+0x113/0x12f
        hardirqs last disabled at (138296):  debug_check_no_locks_freed+0x33/0x12f
        softirqs last  enabled at (137818):  __do_softirq+0x2d3/0x3e9
        softirqs last disabled at (137813):  irq_exit+0x41/0x95
      
                     other info that might help us debug this:
         Possible unsafe locking scenario:
               CPU0
               ----
          lock(jbd2_handle);
          <Interrupt>
            lock(jbd2_handle);
      
                      *** DEADLOCK ***
        5 locks held by git/20158:
         #0:  (sb_writers#7){.+.+.+}, at: [<ffffffff81155411>] mnt_want_write+0x24/0x4b
         #1:  (&type->i_mutex_dir_key#2/1){+.+.+.}, at: [<ffffffff81145087>] lock_rename+0xd9/0xe3
         #2:  (&sb->s_type->i_mutex_key#11){+.+.+.}, at: [<ffffffff8114f8e2>] lock_two_nondirectories+0x3f/0x6b
         #3:  (&sb->s_type->i_mutex_key#11/4){+.+.+.}, at: [<ffffffff8114f909>] lock_two_nondirectories+0x66/0x6b
         #4:  (jbd2_handle){+.+.?.}, at: [<ffffffff811e31db>] start_this_handle+0x4ca/0x555
      
                     stack backtrace:
        CPU: 2 PID: 20158 Comm: git Not tainted 4.1.0-rc7-next-20150615-dbg-00016-g8bdf555-dirty #211
        Call Trace:
          dump_stack+0x4c/0x6e
          mark_lock+0x384/0x56d
          mark_held_locks+0x5f/0x76
          lockdep_trace_alloc+0xb2/0xb5
          kmem_cache_alloc_trace+0x32/0x1e2
          zcomp_strm_alloc+0x25/0x73 [zram]
          zcomp_strm_multi_find+0xe7/0x173 [zram]
          zcomp_strm_find+0xc/0xe [zram]
          zram_bvec_rw+0x2ca/0x7e0 [zram]
          zram_make_request+0x1fa/0x301 [zram]
          generic_make_request+0x9c/0xdb
          submit_bio+0xf7/0x120
          ext4_io_submit+0x2e/0x43
          ext4_bio_write_page+0x1b7/0x300
          mpage_submit_page+0x60/0x77
          mpage_map_and_submit_buffers+0x10f/0x21d
          ext4_writepages+0xc8c/0xe1b
          do_writepages+0x23/0x2c
          __filemap_fdatawrite_range+0x84/0x8b
          filemap_flush+0x1c/0x1e
          ext4_alloc_da_blocks+0xb8/0x117
          ext4_rename+0x132/0x6dc
          ? mark_held_locks+0x5f/0x76
          ext4_rename2+0x29/0x2b
          vfs_rename+0x540/0x636
          SyS_renameat2+0x359/0x44d
          SyS_rename+0x1e/0x20
          entry_SYSCALL_64_fastpath+0x12/0x6f
      
      [minchan@kernel.org: add stable mark]
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Kyeongdon Kim <kyeongdon.kim@lge.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
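      To make the flag semantics concrete: in kernels of this era the gfp masks are composed from bits, with GFP_KERNEL carrying reclaim, IO and FS permissions and GFP_NOIO carrying reclaim only.  Reclaim triggered by a GFP_NOIO allocation therefore may neither start I/O nor call into filesystems (such as the dentry shrinker reaching ext4's jbd2 journal in the lockdep report above).  This sketch models that composition with hypothetical SK_* bits; the numeric values are illustrative, not the kernel's.

```c
#include <stdbool.h>

/* Sketch-only bits modelled on the kernel's gfp composition;
 * the values are illustrative, not the kernel's. */
#define SK_RECLAIM 0x1u   /* allocation may trigger reclaim */
#define SK_IO      0x2u   /* reclaim may start block I/O */
#define SK_FS      0x4u   /* reclaim may call into filesystems */

#define SK_GFP_KERNEL (SK_RECLAIM | SK_IO | SK_FS)
#define SK_GFP_NOFS   (SK_RECLAIM | SK_IO)
#define SK_GFP_NOIO   (SK_RECLAIM)

/* May reclaim run filesystem shrinkers (dentry/inode pruning, which can
 * open a journal handle)? */
static bool reclaim_may_enter_fs(unsigned int gfp)
{
    return (gfp & SK_RECLAIM) && (gfp & SK_FS);
}

/* May reclaim issue new I/O, such as swap or page writeback? */
static bool reclaim_may_do_io(unsigned int gfp)
{
    return (gfp & SK_RECLAIM) && (gfp & SK_IO);
}
```

      Under this model, an allocation made from the zram write path with the GFP_NOIO analogue can still reclaim clean memory but cannot recurse into the I/O or filesystem code that may already hold the locks shown in the report.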
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial · 7d1fc01a
      Linus Torvalds authored
      Pull trivial tree updates from Jiri Kosina.
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
        floppy: make local variable non-static
        exynos: fixes an incorrect header guard
        dt-bindings: fixes some incorrect header guards
        cpufreq-dt: correct dead link in documentation
        cpufreq: ARM big LITTLE: correct dead link in documentation
        treewide: Fix typos in printk
        Documentation: filesystem: Fix typo in fs/eventfd.c
        fs/super.c: use && instead of & for warn_on condition
        Documentation: fix sysfs-ptp
        lib: scatterlist: fix Kconfig description
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching · 0f0836b7
      Linus Torvalds authored
      Pull livepatching updates from Jiri Kosina:
      
       - RO/NX attribute fixes for patch module relocations from Josh
         Poimboeuf.  As part of this effort, module.c has been cleaned up as
         well and livepatching is piggy-backing on this cleanup.  Rusty is OK
         with this whole lot going through livepatching tree.
      
       - symbol disambiguation support from Chris J Arges.  That series is
         also Reviewed-by: Miroslav Benes <mbenes@suse.cz>, but this came in
         only after I had already pushed it out.  I didn't want to rebase
         because of that, hence I am mentioning it here.
      
       - symbol lookup fix from Miroslav Benes
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
        livepatch: Cleanup module page permission changes
        module: keep percpu symbols in module's symtab
        module: clean up RO/NX handling.
        module: use a structure to encapsulate layout.
        gcov: use within_module() helper.
        module: Use the same logic for setting and unsetting RO/NX
        livepatch: function,sympos scheme in livepatch sysfs directory
        livepatch: add sympos as disambiguator field to klp_reloc
        livepatch: add old_sympos as disambiguator field to klp_func
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid · c2848f2e
      Linus Torvalds authored
      Pull HID updates from Jiri Kosina:
      
       - appoint Benjamin Tissoires as co-maintainer / designated reviewer
      
       - sysfs report_descriptor visibility fix for unclaimed devices, from
         Andy Lutomirski
      
       - suspend/resume fixes for Sony driver from Frank Praznik
      
       - IRQ deadlock fix from Ioan-Adrian Ratiu
      
       - hid-i2c fixes affecting (at least) Yoga 900 from Mika Westerberg and
         Srinivas Pandruvada
      
       - a lot of new device support (especially, but not limited to, Wacom)
         and assorted small misc fixes
      
       - almost complete G920 support; the only bit that is missing is
         switching the device to HID mode automatically; Simon Wood and Michal
         Maly are working on it.
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (46 commits)
        Revert "INPUT: xpad: switch Logitech G920 Wheel into HID mode"
        HID: sensor-hub: Add quirk for Lenovo Yoga 900 with ITE Chips
        HID: Add new PID for Microchip Pick16F1454
        HID: wacom: Use correct report to query pen ID from INTUOSHT2 devices
        HID: i2c-hid: Prevent sending reports from racing with device reset
        HID: use kobj_to_dev()
        HID: wiimote: use dev_to_wii()
        HID: add a new helper to_hid_driver()
        HID: use to_hid_device()
        HID: move to_hid_device() to hid.h
        HID: usbhid: use to_usb_device
        HID: corsair: Convert to use module_hid_driver
        HID: input: ignore the battery in OKLICK Laser BTmouse
        HID: wacom: Fix pad button range for CINTIQ_COMPANION_2
        HID: wacom: Fix touchring value reporting
        HID: wacom: Report 'strip2' values in ABS_RY
        HID: wacom: Limit touchstrip data to 13 bits
        HID: wacom: bitwise vs logical ORs
        HID: wacom: Apply lowres quirk to BAMBOO_TOUCH devices
        HID: enable hid device to suspend/resume asynchronously
        ...
    • Merge tag 'nfs-for-4.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · 75f26df6
      Linus Torvalds authored
      Pull NFS client updates from Trond Myklebust:
       "Highlights include:
      
        Stable fixes:
         - Fix a regression in the SunRPC socket polling code
         - Fix the attribute cache revalidation code
         - Fix race in __update_open_stateid()
         - Fix an lo->plh_block_lgets imbalance in layoutreturn
         - Fix an Oopsable typo in ff_mirror_match_fh()
      
        Features:
         - pNFS layout recall performance improvements.
         - pNFS/flexfiles: Support server-supplied layoutstats sampling period
      
        Bugfixes + cleanups:
         - NFSv4: Don't perform cached access checks before we've OPENed the
           file
         - Fix starvation issues with background flushes
         - Reclaim writes should be flushed as unstable writes if there are
           already entries in the commit lists
         - Various bugfixes from Chuck to fix NFS/RDMA send queue ordering
           problems
         - Ensure that we propagate fatal layoutget errors back to the
           application
         - Fixes for sundry flexfiles layoutstats bugs
         - Fix files/flexfiles to not cache invalidated layouts in the DS
           commit buckets"
      
      * tag 'nfs-for-4.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (68 commits)
        NFS: Fix a compile warning about unused variable in nfs_generic_pg_pgios()
        NFSv4: Fix a compile warning about no prototype for nfs4_ioctl()
        NFS: Use wait_on_atomic_t() for unlock after readahead
        SUNRPC: Fixup socket wait for memory
        NFSv4.1/pNFS: Cleanup constify struct pnfs_layout_range arguments
        NFSv4.1/pnfs: Cleanup copying of pnfs_layout_range structures
        NFSv4.1/pNFS: Cleanup pnfs_mark_matching_lsegs_invalid()
        NFSv4.1/pNFS: Fix a race in initiate_file_draining()
        NFSv4.1/pNFS: pnfs_error_mark_layout_for_return() must always return layout
        NFSv4.1/pNFS: pnfs_mark_matching_lsegs_return() should set the iomode
        NFSv4.1/pNFS: Use nfs4_stateid_copy for copying stateids
        NFSv4.1/pNFS: Don't pass stateids by value to pnfs_send_layoutreturn()
        NFS: Relax requirements in nfs_flush_incompatible
        NFSv4.1/pNFS: Don't queue up a new commit if the layout segment is invalid
        NFS: Allow multiple commit requests in flight per file
        NFS/pNFS: Fix up pNFS write reschedule layering violations and bugs
        SUNRPC: Fix a missing break in rpc_anyaddr()
        pNFS/flexfiles: Fix an Oopsable typo in ff_mirror_match_fh()
        NFS: Fix attribute cache revalidation
        NFS: Ensure we revalidate attributes before using execute_ok()
        ...
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 63f729cb
      Linus Torvalds authored
      Pull vfs fix from Al Viro:
       "Don't put symlink bodies in pagecache into highmem"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        Make sure that highmem pages are not added to symlink page cache
    • mm: add tracepoint for scanning pages · 7d2eba05
      Ebru Akagunduz authored
      This patch series makes swapin readahead up to a certain number of pages
      to gain more THP performance, and adds tracepoints for
      khugepaged_scan_pmd, collapse_huge_page and __collapse_huge_page_isolate.
      
      This patch series was written to deal with programs that access most,
      but not all, of their memory after they get swapped out.  Currently
      these programs do not get their memory collapsed into THPs after the
      system swapped their memory out, while they would get THPs before
      swapping happened.
      
      This patch series was tested with a test program that allocates 400MB
      of memory, writes to it, and then sleeps.  I force the system to swap
      out all of it.  Afterwards, the test program touches the area by
      writing, leaving a piece of it unwritten.  This shows how much swapin
      readahead the patch performs.
      
      Test results:
      
                              After swapped out
      -------------------------------------------------------------------
                    | Anonymous | AnonHugePages | Swap      | Fraction  |
      -------------------------------------------------------------------
      With patch    | 90076 kB  | 88064 kB      | 309928 kB |    99%    |
      -------------------------------------------------------------------
      Without patch | 194068 kB | 192512 kB     | 205936 kB |    99%    |
      -------------------------------------------------------------------

                              After swapped in
      -------------------------------------------------------------------
                    | Anonymous | AnonHugePages | Swap      | Fraction  |
      -------------------------------------------------------------------
      With patch    | 201408 kB | 198656 kB     | 198596 kB |    98%    |
      -------------------------------------------------------------------
      Without patch | 292624 kB | 192512 kB     | 107380 kB |    65%    |
      -------------------------------------------------------------------
      
      This patch (of 3):
      
      Using static tracepoints, data from these functions is recorded.  This
      helps automate debugging without making many changes to the source code.

      This patch adds tracepoints for khugepaged_scan_pmd, collapse_huge_page
      and __collapse_huge_page_isolate.
      
      [dan.carpenter@oracle.com: add a missing tab]
      Signed-off-by: Ebru Akagunduz <ebru.akagunduz@gmail.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Xie XiuQi <xiexiuqi@huawei.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7d2eba05
    • John Allen's avatar
      drivers/base/memory.c: fix kernel warning during memory hotplug on ppc64 · cb5490a5
      John Allen authored
      Fix a bug where a kernel warning is triggered when performing a memory
      hotplug on ppc64.  This warning may also occur on any architecture that
      uses the memory_probe_store interface.
      
        WARNING: at drivers/base/memory.c:200
        CPU: 9 PID: 13042 Comm: systemd-udevd Not tainted 4.4.0-rc4-00113-g0bd0f1e6-dirty #7
        NIP [c00000000055e034] pages_correctly_reserved+0x134/0x1b0
        LR [c00000000055e7f8] memory_subsys_online+0x68/0x140
        Call Trace:
          memory_subsys_online+0x68/0x140
          device_online+0xb4/0x120
          store_mem_state+0xb0/0x180
          dev_attr_store+0x34/0x60
          sysfs_kf_write+0x64/0xa0
          kernfs_fop_write+0x17c/0x1e0
          __vfs_write+0x40/0x160
          vfs_write+0xb8/0x200
          SyS_write+0x60/0x110
          system_call+0x38/0xd0
      
      The warning is triggered because there is a udev rule that automatically
      tries to online memory after it has been added.  The udev rule varies
      from distro to distro, but will generally look something like:
      
        SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online"
      
      On any architecture that uses memory_probe_store to reserve memory, the
      udev rule will be triggered after the first section of the block is
      reserved and will subsequently attempt to online the entire block,
      interrupting the memory reservation process and causing the warning.
      
      This patch modifies memory_probe_store to add a block of memory with a
      single call to add_memory instead of looping through the block and
      adding each section individually.  A single call to add_memory is
      protected by the mem_hotplug mutex, which prevents the udev rule from
      onlining memory until the reservation of the entire block is complete.
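      The race can be illustrated with a small user-space model. This is a
      sketch, not the drivers/base/memory.c code: struct block_model,
      add_memory_model(), udev_online_block() and SECTIONS_PER_BLOCK are
      hypothetical, and the udev rule is modelled as a single queued "add"
      event that runs as soon as the hotplug lock is dropped. Adding one
      section per call lets that event see a partially reserved block; one
      call for the whole block does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a memory block made of several sections. */
#define SECTIONS_PER_BLOCK 4

struct block_model {
	int reserved;        /* sections reserved so far */
	bool event_pending;  /* "add" uevent queued for the block */
	int online_attempts; /* times the udev rule tried to online it */
	int warnings;        /* attempts that saw a partially reserved block */
};

/* Stand-in for the udev rule: try to online the whole block. */
static void udev_online_block(struct block_model *b)
{
	b->online_attempts++;
	if (b->reserved < SECTIONS_PER_BLOCK)
		b->warnings++;	/* corresponds to the warning above */
}

/* One add_memory() call reserving nr sections under the hotplug lock. */
static void add_memory_model(struct block_model *b, int nr)
{
	/* Lock held: the udev rule cannot online anything here. */
	if (b->reserved == 0)
		b->event_pending = true;
	b->reserved += nr;

	/* Lock released: a queued udev event may run now. */
	if (b->event_pending) {
		b->event_pending = false;
		udev_online_block(b);
	}
}

/* Old behaviour: one call per section. */
static void probe_per_section(struct block_model *b)
{
	for (int i = 0; i < SECTIONS_PER_BLOCK; i++)
		add_memory_model(b, 1);
}

/* New behaviour: a single call for the whole block. */
static void probe_whole_block(struct block_model *b)
{
	add_memory_model(b, SECTIONS_PER_BLOCK);
}
```

      In this model both paths online the block exactly once; only the
      per-section path does so while the block is still incomplete.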
      Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
      Acked-by: Dave Hansen <dave.hansen@intel.com>
      Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Naoya Horiguchi
    • Joshua Clayton
      mm: fix noisy sparse warning in LIBCFS_ALLOC_PRE() · 543dfb2d
      Joshua Clayton authored
      Running sparse on drivers/staging/lustre results in dozens of warnings:
        include/linux/gfp.h:281:41: warning: odd constant _Bool cast (400000 becomes 1)
      
      Use "!!" to explicitly convert to bool and get rid of the warning.
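      A minimal sketch of the conversion, assuming a plain unsigned int in
      place of gfp_t; EXAMPLE_FLAG is a hypothetical constant whose value only
      mirrors the one in the warning. The first "!" maps any non-zero value
      to 0 and zero to 1, and the second negates again, so the expression is
      already 0 or 1 before it is converted to bool.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit; value chosen to match the sparse message only. */
#define EXAMPLE_FLAG 0x400000u

/* Implicit conversion: a constant argument makes sparse report an
 * "odd constant _Bool cast". */
static bool has_flag_implicit(unsigned int flags)
{
	return flags & EXAMPLE_FLAG;
}

/* Explicit conversion with "!!": the value is normalised to 0 or 1. */
static bool has_flag_explicit(unsigned int flags)
{
	return !!(flags & EXAMPLE_FLAG);
}
```

      Both functions return the same truth value; the "!!" form only makes
      the normalisation explicit, which is what silences the warning.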
      Signed-off-by: Joshua Clayton <stillcompiling@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Konstantin Khlebnikov
      mm: rework virtual memory accounting · 84638335
      Konstantin Khlebnikov authored
      While inspecting some vague code inside the prctl(PR_SET_MM_MEM) call
      (which tests the RLIMIT_DATA value to figure out whether we're allowed
      to assign new values to the mm_struct fields @start_brk, @brk,
      @start_data and @end_data), it became clear that RLIMIT_DATA, in the
      form it is implemented now, doesn't do anything useful, because most
      user-space libraries use the mmap() syscall for dynamic memory
      allocations.
      
      Linus suggested converting the RLIMIT_DATA rlimit into something
      suitable for anonymous memory accounting.  But in this patch we go
      further, and the changes are bundled together as:
      
       * keep vma counting if CONFIG_PROC_FS=n, will be used for limits
       * replace mm->shared_vm with better defined mm->data_vm
       * account anonymous executable areas as executable
       * account file-backed growsdown/up areas as stack
       * drop struct file* argument from vm_stat_account
       * enforce RLIMIT_DATA for size of data areas
      
      This makes the code cleaner: code/stack/data classification now
      depends only on the vm_flags state:
      
       VM_EXEC & ~VM_WRITE            -> code  (VmExe + VmLib in proc)
       VM_GROWSUP | VM_GROWSDOWN      -> stack (VmStk)
       VM_WRITE & ~VM_SHARED & !stack -> data  (VmData)
      
      The rest (VmSize - VmData - VmStk - VmExe - VmLib) could be called
      "shared", but it may also contain strange beasts such as read-only
      private or VM_IO areas.
      
       - RLIMIT_AS            limits whole address space "VmSize"
       - RLIMIT_STACK         limits stack "VmStk" (but each vma individually)
       - RLIMIT_DATA          now limits "VmData"
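      The classification above can be sketched as a pure function of the
      flags. The flag values below are illustrative and not copied from
      <linux/mm.h> (on many architectures VM_GROWSUP is defined as 0), and
      may_expand_data() is a hypothetical check showing how RLIMIT_DATA would
      apply only to data mappings; it is not the kernel's accounting code.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values for this sketch only. */
#define VM_WRITE     0x00000002UL
#define VM_EXEC      0x00000004UL
#define VM_SHARED    0x00000008UL
#define VM_GROWSDOWN 0x00000100UL
#define VM_GROWSUP   0x00000200UL

enum vm_class { VM_CLASS_CODE, VM_CLASS_STACK, VM_CLASS_DATA, VM_CLASS_OTHER };

/* Classify a mapping purely from its flags, following the table above. */
static enum vm_class classify_vma(unsigned long flags)
{
	if (flags & (VM_GROWSUP | VM_GROWSDOWN))
		return VM_CLASS_STACK;		/* VmStk */
	if ((flags & VM_EXEC) && !(flags & VM_WRITE))
		return VM_CLASS_CODE;		/* VmExe + VmLib */
	if ((flags & VM_WRITE) && !(flags & VM_SHARED))
		return VM_CLASS_DATA;		/* VmData */
	return VM_CLASS_OTHER;			/* e.g. read-only private, VM_IO */
}

/* Hypothetical RLIMIT_DATA check: may npages more pages with these
 * flags be mapped, given data_vm pages already counted as data? */
static bool may_expand_data(unsigned long data_vm, unsigned long flags,
			    unsigned long npages, unsigned long rlimit_pages)
{
	if (classify_vma(flags) != VM_CLASS_DATA)
		return true;	/* only data areas count against RLIMIT_DATA */
	return data_vm + npages <= rlimit_pages;
}
```

      Checking the stack bits first keeps a writable, growable stack out of
      VmData, matching the "!stack" condition in the table.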
      Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Kees Cook <keescook@google.com>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Pavel Emelyanov <xemul@virtuozzo.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Florian Fainelli
      include/linux/memblock.h: fix ordering of 'flags' argument in comments · d30b5545
      Florian Fainelli authored
      for_each_free_mem_range() and for_each_free_mem_range_reverse() both
      accept a 'flags' argument.  The comment above each macro placed the
      'flags' documentation at the very end, although 'flags' is in fact the
      3rd argument to the macro, so move it to preserve the natural ordering.
      
      Fixes: fc6daaf9 ("mm/memblock: add extra "flags" to memblock to allow selection of memory based on attribute")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Geliang Tang
      mm: move lru_to_page to mm_inline.h · d72ee911
      Geliang Tang authored
      Move lru_to_page() from internal.h to mm_inline.h.
      Signed-off-by: Geliang Tang <geliangtang@163.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Rodrigo Freire
      Documentation/filesystems: describe the shared memory usage/accounting · 0bc126d4
      Rodrigo Freire authored
      Shared memory accounting support has been present in the kernel since
      commit 4b02108a ("mm: oom analysis: add shmem vmstat") and in the
      userland free(1) tool since 2014.  This patch updates the documentation
      to reflect this.
      Signed-off-by: Rodrigo Freire <rfreire@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Vitaly Kuznetsov
      memory-hotplug: don't BUG() in register_memory_resource() · 6f754ba4
      Vitaly Kuznetsov authored
      An out-of-memory condition is not a bug, and while we can't add new
      memory in such a case, crashing the system seems wrong.  Propagating
      the return value from register_memory_resource() requires an interface
      change.
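      A user-space sketch of the idea, under stated assumptions: struct
      res_model, register_resource_model() and add_memory_model() are
      hypothetical, and the error travels through an out-parameter. The
      kernel change itself may use a different convention (such as returning
      an error pointer); the point is only that the caller receives -ENOMEM
      instead of the system crashing.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct resource. */
struct res_model {
	unsigned long start, size;
};

/* Returns NULL and sets *err on failure instead of calling BUG().
 * simulate_oom forces the allocation failure for demonstration. */
static struct res_model *register_resource_model(unsigned long start,
						 unsigned long size,
						 int simulate_oom, int *err)
{
	struct res_model *r = simulate_oom ? NULL : malloc(sizeof(*r));

	if (!r) {
		*err = -ENOMEM;
		return NULL;	/* propagate the failure to the caller */
	}
	r->start = start;
	r->size = size;
	*err = 0;
	return r;
}

/* Caller: turns the failure into an error code for its own caller. */
static int add_memory_model(unsigned long start, unsigned long size,
			    int simulate_oom)
{
	int err;
	struct res_model *r = register_resource_model(start, size,
						      simulate_oom, &err);

	if (!r)
		return err;
	/* ... the rest of the hotplug path would run here ... */
	free(r);
	return 0;
}
```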
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Igor Mammedov <imammedo@redhat.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Sheng Yong <shengyong1@huawei.com>
      Cc: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Paul Gortmaker
      hugetlb: make mm and fs code explicitly non-modular · 3e89e1c5
      Paul Gortmaker authored
      The Kconfig currently controlling compilation of this code is:
      
      config HUGETLBFS
              bool "HugeTLB file system support"
      
      ...meaning that it currently is not being built as a module by anyone.
      
      Let's remove the modular code that is essentially orphaned, so that
      when reading the driver there is no doubt it is builtin-only.
      
      Since module_init translates to device_initcall in the non-modular case,
      the init ordering gets moved to earlier levels when we use the more
      appropriate initcalls here.
      
      Originally I had the fs part and the mm part as separate commits, just
      by happenstance of the nature of how I detected these non-modular use
      cases.  But that can possibly introduce regressions if the patch merge
      ordering puts the fs part 1st -- as the 0-day testing reported a splat
      at mount time.
      
      Investigating with "initcall_debug" showed that the delta was
      init_hugetlbfs_fs being called _before_ hugetlb_init instead of after.  So
      both the fs change and the mm change are here together.
      
      In addition, it worked before due to luck of link order, since they were
      both in the same initcall category.  So we now have the fs part using
      fs_initcall, and the mm part using subsys_initcall, which puts it one
      bucket earlier.  It now passes the basic sanity test that failed in
      earlier 0-day testing.
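      The ordering argument can be modelled by sorting on (level, link
      order). The numeric levels follow the kernel's sequence subsys < fs <
      device, but struct initcall, the link positions and run_position() are
      hypothetical; the real kernel places initcalls in per-level linker
      sections rather than sorting them at boot.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Levels in kernel order: subsys (4) < fs (5) < device (6). */
enum initcall_level { LEVEL_SUBSYS = 4, LEVEL_FS = 5, LEVEL_DEVICE = 6 };

struct initcall {
	const char *name;
	enum initcall_level level;
	int link_pos;	/* illustrative position in link order */
};

/* Run order: by level first, then by link order within a level. */
static int cmp_initcall(const void *a, const void *b)
{
	const struct initcall *x = a, *y = b;

	if (x->level != y->level)
		return x->level < y->level ? -1 : 1;
	return x->link_pos - y->link_pos;
}

/* Sort calls into run order and return the index of the named one. */
static int run_position(struct initcall *calls, size_t n, const char *name)
{
	qsort(calls, n, sizeof(*calls), cmp_initcall);
	for (size_t i = 0; i < n; i++)
		if (strcmp(calls[i].name, name) == 0)
			return (int)i;
	return -1;
}
```

      With both calls at device level only link order decides; converting
      just the fs part moves init_hugetlbfs_fs ahead of hugetlb_init, while
      converting both keeps hugetlb_init first regardless of link order.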
      
      We delete the MODULE_LICENSE tag and capture that information at the top
      of the file alongside author comments, etc.
      
      We don't replace module.h with init.h since the file already has that.
      Also note that MODULE_ALIAS is a no-op for non-modular code.
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Reported-by: kernel test robot <ying.huang@linux.intel.com>
      Cc: Nadia Yvette Chambers <nyc@holomorphy.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Acked-by: Davidlohr Bueso <dave@stgolabs.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Geliang Tang
      mm/swapfile.c: use list_for_each_entry_safe in free_swap_count_continuations · 0d576d20
      Geliang Tang authored
      Use list_for_each_entry_safe() instead of list_for_each_safe() to
      simplify the code.
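      For illustration, here is a minimal user-space re-implementation of the
      list helpers (not <linux/list.h> itself; it relies on the GCC/Clang
      __typeof__ extension). list_for_each_entry_safe() hands the loop body
      the containing structure directly and caches the next entry, so the
      current one can be unlinked and freed without a separate list_entry()
      call; struct item and free_all() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Circular doubly linked list, re-implemented for this sketch. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

/* Iterate over entries, caching the next one so 'pos' may be freed. */
#define list_for_each_entry_safe(pos, n, head, member)			\
	for (pos = list_entry((head)->next, __typeof__(*pos), member),	\
	     n = list_entry(pos->member.next, __typeof__(*pos), member);\
	     &pos->member != (head);					\
	     pos = n, n = list_entry(n->member.next, __typeof__(*n), member))

/* Hypothetical element type. */
struct item {
	int value;
	struct list_head lru;
};

/* Unlink and free every entry; returns how many were freed. */
static int free_all(struct list_head *head)
{
	struct item *it, *tmp;
	int freed = 0;

	list_for_each_entry_safe(it, tmp, head, lru) {
		list_del(&it->lru);
		free(it);
		freed++;
	}
	return freed;
}
```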
      Signed-off-by: Geliang Tang <geliangtang@163.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Oleg Nesterov
      mm: /proc/pid/clear_refs: no need to clear VM_SOFTDIRTY in clear_soft_dirty_pmd() · 0e41e277
      Oleg Nesterov authored
      clear_soft_dirty_pmd() is called by clear_refs_write(CLEAR_REFS_SOFT_DIRTY),
      and VM_SOFTDIRTY has already been cleared before walk_page_range(), so
      there is no need to clear it again.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Kirill A. Shutemov
      mm: make sure isolate_lru_page() is never called for tail page · bb5b8589
      Kirill A. Shutemov authored
      The VM_BUG_ON_PAGE() will catch such cases if any still exist.
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Christoph Lameter
      vmstat: make vmstat_updater deferrable again and shut down on idle · 0eb77e98
      Christoph Lameter authored
      Currently the vmstat updater is not deferrable as a result of commit
      ba4877b9 ("vmstat: do not use deferrable delayed work for
      vmstat_update").  This in turn can cause multiple interruptions of the
      applications because the vmstat updater may run at
      
      Make vmstat_update deferrable again and provide a function that folds
      the differentials when the processor is going into idle mode, thus
      addressing the issue of the above commit in a clean way.
      
      Note that the shepherd thread will continue scanning the differentials
      from another processor and will re-enable the vmstat workers if it
      detects any changes.
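      A simplified model of the scheme, with hypothetical names throughout:
      a CPU's differential is folded into the global counter when that CPU
      goes idle and its worker is stopped, and a shepherd scanning from
      another CPU re-enables the worker only once new differentials show up.
      Real vmstat keeps many per-zone counters and uses deferrable delayed
      work, which this sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 4

/* Hypothetical model: one global counter plus per-CPU differentials. */
struct vmstat_model {
	long global;
	long diff[NR_CPUS_MODEL];
	bool worker_active[NR_CPUS_MODEL];
};

/* Fold one CPU's differential into the global counter. */
static void fold_cpu(struct vmstat_model *m, int cpu)
{
	m->global += m->diff[cpu];
	m->diff[cpu] = 0;
}

/* CPU entering idle: fold now and stop its worker, so no timer has to
 * wake the CPU just to fold counters. */
static void quiet_vmstat_model(struct vmstat_model *m, int cpu)
{
	fold_cpu(m, cpu);
	m->worker_active[cpu] = false;
}

/* Shepherd on another CPU: restart workers whose CPU has accumulated
 * new differentials. Returns the number of workers restarted. */
static int shepherd_scan(struct vmstat_model *m)
{
	int restarted = 0;

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		if (!m->worker_active[cpu] && m->diff[cpu] != 0) {
			m->worker_active[cpu] = true;
			restarted++;
		}
	}
	return restarted;
}
```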
      
      Fixes: ba4877b9 ("vmstat: do not use deferrable delayed work for vmstat_update")
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Hugh Dickins
      memcg: avoid vmpressure oops when memcg disabled · 686739f6
      Hugh Dickins authored
      A CONFIG_MEMCG=y kernel booted with "cgroup_disable=memory" crashes on a
      NULL memcg (but non-NULL root_mem_cgroup) when vmpressure kicks in.
      Here's the patch I use to avoid that, but you might prefer a test on
      mem_cgroup_disabled() somewhere.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Johannes Weiner
      mm: memcontrol: switch to the updated jump-label API · ef12947c
      Johannes Weiner authored
      According to <linux/jump_label.h> the direct use of struct static_key is
      deprecated.  Update the socket and slab accounting code accordingly.
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: David S. Miller <davem@davemloft.net>
      Reported-by: Jason Baron <jbaron@akamai.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Johannes Weiner
      mm: memcontrol: hook up vmpressure to socket pressure · 8e8ae645
      Johannes Weiner authored
      Let the networking stack know when a memcg is under reclaim pressure so
      that it can clamp its transmit windows accordingly.
      
      Whenever the reclaim efficiency of a cgroup's LRU lists drops low enough
      for a MEDIUM or HIGH vmpressure event to occur, assert a pressure state
      in the socket and tcp memory code that tells it to curb consumption
      growth from sockets associated with said control group.
      
      Traditionally, vmpressure reports for the entire subtree of a memcg
      under pressure, which drops useful information on the individual groups
      reclaimed.  However, it's too late to change the user interface, so add
      a second reporting mode that reports on the level of reclaim instead of
      at the level of pressure, and use that report for sockets.
      
      vmpressure events are naturally edge triggered, so for hysteresis assert
      socket pressure for a second to allow for subsequent vmpressure events
      to occur before letting the socket code return to normal.
      
      This will likely need fine-tuning for a wider variety of workloads, but
      for now stick to the vmpressure presets and keep hysteresis simple.
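      The one-second hysteresis can be sketched with a deadline stored per
      group. HZ_MODEL, struct memcg_model and the helper names are
      assumptions for this sketch; ticks_before() uses the same wrap-safe
      signed-difference comparison as the kernel's time_before(). Each MEDIUM
      or HIGH event pushes the deadline one second past the event, and
      sockets treat the group as under pressure until it expires.

```c
#include <assert.h>
#include <stdbool.h>

#define HZ_MODEL 100	/* assumed ticks per second for this sketch */

/* Wrap-safe "a is before b" on unsigned tick counters. */
static bool ticks_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

enum pressure_level { PRESSURE_LOW, PRESSURE_MEDIUM, PRESSURE_HIGH };

struct memcg_model {
	unsigned long socket_pressure;	/* tick at which pressure expires */
};

/* A MEDIUM or HIGH vmpressure event asserts socket pressure for one
 * second from 'now'; LOW events leave the deadline alone. */
static void vmpressure_event(struct memcg_model *mc, enum pressure_level lvl,
			     unsigned long now)
{
	if (lvl >= PRESSURE_MEDIUM)
		mc->socket_pressure = now + HZ_MODEL;
}

/* Socket code checks this before growing its buffers. */
static bool under_socket_pressure(const struct memcg_model *mc,
				  unsigned long now)
{
	return ticks_before(now, mc->socket_pressure);
}
```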
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: David S. Miller <davem@davemloft.net>
      Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Johannes Weiner
      mm: memcontrol: account socket memory in unified hierarchy memory controller · f7e1cb6e
      Johannes Weiner authored
      Socket memory can be a significant share of overall memory consumed by
      common workloads.  In order to provide reasonable resource isolation in
      the unified hierarchy, this type of memory needs to be included in the
      tracking/accounting of a cgroup under active memory resource control.
      
      Overhead is only incurred when a non-root control group is created AND
      the memory controller is instructed to track and account the memory
      footprint of that group.  cgroup.memory=nosocket can be specified on the
      boot command line to override any runtime configuration and forcibly
      exclude socket memory from active memory resource control.
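      How a comma-separated boot option value such as "nosocket" might be
      recognised can be sketched as follows. has_option() is a hypothetical
      helper, not the kernel's parser; it matches whole tokens only, so
      "nosocket" embedded in a longer token is not accepted.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if 'opt' appears as a whole comma-separated token in
 * 'list' (e.g. the value part of "cgroup.memory=nosocket"). */
static bool has_option(const char *list, const char *opt)
{
	size_t len = strlen(opt);
	const char *p = list;

	while (*p) {
		const char *end = strchr(p, ',');
		size_t tok_len = end ? (size_t)(end - p) : strlen(p);

		if (tok_len == len && strncmp(p, opt, len) == 0)
			return true;
		if (!end)
			break;
		p = end + 1;	/* skip the comma */
	}
	return false;
}
```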
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: David S. Miller <davem@davemloft.net>
      Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Johannes Weiner
      mm: memcontrol: move socket code for unified hierarchy accounting · 11092087
      Johannes Weiner authored
      The unified hierarchy memory controller will account socket memory.
      Move the infrastructure functions accordingly.
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Johannes Weiner
      mm: memcontrol: do not account memory+swap on unified hierarchy · 7941d214
      Johannes Weiner authored
      The unified hierarchy memory controller doesn't expose the memory+swap
      counter to userspace, but its accounting is hardcoded in all charge
      paths right now, including the per-cpu charge cache ("the stock").
      
      To avoid adding yet more pointless memory+swap accounting with the
      socket memory support in unified hierarchy, disable the counter
      altogether when in unified hierarchy mode.
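      A simplified sketch of the effect on a charge path: in unified
      hierarchy mode only the memory counter moves, while legacy mode keeps
      memory and memory+swap in step. struct memcg_counters,
      memsw_accounting() and charge() are hypothetical, and the real code
      also depends on whether swap accounting is enabled at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page counters for one group. */
struct memcg_counters {
	long memory;	/* pages charged to "memory" */
	long memsw;	/* pages charged to "memory+swap" */
};

/* In this sketch, memory+swap is maintained only outside unified mode. */
static bool memsw_accounting(bool unified_hierarchy)
{
	return !unified_hierarchy;
}

/* Charge nr_pages; the memsw counter is touched only when it exists. */
static void charge(struct memcg_counters *c, long nr_pages, bool unified)
{
	c->memory += nr_pages;
	if (memsw_accounting(unified))
		c->memsw += nr_pages;
}
```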
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>