Commits · 3c55d6bcfe8163ff2b5636b4aabe3caa3f5d95f4 · Kirill Smelkov / linux

16 Dec, 2016 2 commits
- Merge remote-tracking branch 'djwong/ocfs2-vfs-reflink-6' into for-linus · 3c55d6bc
  Al Viro authored Dec 16, 2016
  
- Merge branch 'work.write_end' into for-linus · 4da00fd1
  Al Viro authored Dec 16, 2016
  
10 Dec, 2016 17 commits

ocfs2: implement the VFS clone_range, copy_range, and dedupe_range features · 29ac8e85

Darrick J. Wong authored Nov 09, 2016

Connect the new VFS clone_range, copy_range, and dedupe_range features
to the existing reflink capability of ocfs2.  Compared to the existing
ocfs2 reflink ioctl, we have to do things a little differently to support
the VFS semantics (we can clone subranges of a file but we don't clone
xattrs), but the VFS ioctls are more broadly supported.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v2: Convert inline data files to extents files before reflinking,
and fix i_blocks so that stat(2) output is correct.
v3: Make zero-length dedupe consistent with btrfs behavior.
v4: Use VFS double-inode lock routines and remove MAX_DEDUPE_LEN.


ocfs2: charge quota for reflinked blocks · 86e59436

Darrick J. Wong authored Nov 22, 2016

When ocfs2 shares blocks from one file to another, it's necessary to
charge that many blocks to the quota because ocfs2 tallies block charges
according to the number of blocks mapped, not the number of physical
blocks used.

Without this patch, reflinking X blocks and then CoWing all of them
causes quota usage to *decrease* by X as seen in generic/305.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


ocfs2: fix bad pointer cast · aef73a61

Darrick J. Wong authored Dec 09, 2016

generic/188 triggered a dmesg stack trace because the dio completion
was casting a buffer head to an on-disk inode, which is whacky.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


ocfs2: always unlock when completing dio writes · dbf896fc

Darrick J. Wong authored Dec 01, 2016

Always unlock the inode when completing dio writes, even if an error
has occurred.  The caller already checks the inode and unlocks it
if needed, so we might as well reduce contention.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


ocfs2: don't eat io errors during _dio_end_io_write · 08554955

Darrick J. Wong authored Nov 09, 2016

ocfs2_dio_end_io_write eats whatever errors may happen,
which means that write errors do not propagate to userspace.
Fix that.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


ocfs2: budget for extent tree splits when adding refcount flag · 3e10b793

Darrick J. Wong authored Nov 09, 2016

When we're adding the refcount flag to an extent, we have to budget
enough space to handle a full extent btree split in addition to
whatever modifications have to be made to the refcount btree.  We
don't currently do this, with the result that generic/186 crashes
when we need an extent split but not a refcount split because meta_ac
never gets allocated.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


ocfs2: prohibit refcounted swapfiles · 06a70305

Darrick J. Wong authored Nov 09, 2016

The swapfile mechanism calls bmap once to find all the swap file
mappings, which means that we cannot properly support CoW remapping.
Therefore, error out if the swap code tries to call bmap on a
refcounted file.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


ocfs2: add newlines to some error messages · 86544fbd

Darrick J. Wong authored Nov 09, 2016

These two error messages are missing the trailing newline.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


ocfs2: convert inode refcount test to a helper · 84e40080

Darrick J. Wong authored Nov 09, 2016

Replace the open-coded inode refcount flag test with a helper function
to reduce the potential for bugs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
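
As a rough illustration of the open-coded test versus a helper, the sketch below models the idea in user-space C. The struct, the flag value, and the helper name are hypothetical stand-ins, not the ocfs2 definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit; the real ocfs2 flag name and value are not shown here. */
#define MODEL_HAS_REFCOUNT_FL 0x0010u

/* Hypothetical stand-in for the per-inode feature flags. */
struct model_inode {
    uint16_t dyn_features;
};

/*
 * One helper instead of repeating "inode->dyn_features & FLAG" at every
 * call site, so a mistyped mask or field can only happen in one place.
 */
static inline bool model_is_refcounted_inode(const struct model_inode *inode)
{
    return (inode->dyn_features & MODEL_HAS_REFCOUNT_FL) != 0;
}
```

Call sites then read `if (model_is_refcounted_inode(inode))` rather than spelling out the mask each time.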


simple_write_end(): don't zero in short copy into uptodate · 04fff641
Al Viro authored Aug 29, 2016
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

exofs: don't mess with simple_write_{begin,end} · 92e50d2d

Al Viro authored Aug 29, 2016

... and don't zero anything on short copy; just unlock
and return 0 if that has happened on non-uptodate page.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


9p: saner ->write_end() on failing copy into non-uptodate page · 77469c3f

Al Viro authored Aug 29, 2016

If we had a short copy into an uptodate page, there's no reason
whatsoever to zero anything; OTOH, if that page had _not_ been
uptodate, we must have been trying to overwrite it completely
and got a short copy.  In that case, overwriting the end with
zeroes, marking uptodate and sending to server is just plain
wrong.  Just unlock, keep it non-uptodate and return 0.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
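
The rule described above can be sketched as a small user-space model. This is not the 9p code: `struct model_page` and `model_write_end()` are hypothetical, and the model assumes, as the message does, that a page is only left non-uptodate when the write was meant to overwrite it completely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page-cache page; not the kernel's struct page. */
struct model_page {
    bool uptodate;  /* contents are known to match the file */
    bool locked;
    bool dirty;
};

/*
 * 'len' is how many bytes the caller wanted to copy into the page,
 * 'copied' is how many were actually copied. Returns the number of
 * bytes accepted by the write.
 */
static size_t model_write_end(struct model_page *page, size_t len, size_t copied)
{
    if (copied < len && !page->uptodate) {
        /*
         * Short copy into a non-uptodate page: do not zero the tail,
         * do not mark the page uptodate. Just unlock and accept nothing.
         */
        page->locked = false;
        return 0;
    }

    /*
     * Either the page was already uptodate (a short copy is harmless),
     * or the full overwrite succeeded and the page is now valid.
     */
    page->uptodate = true;
    if (copied > 0)
        page->dirty = true;
    page->locked = false;
    return copied;
}
```

Returning 0 in the first branch leaves it to the caller to retry the write, which is the behaviour the message asks for.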


fix gfs2_stuffed_write_end() on short copies · 43388b21

Al Viro authored Sep 05, 2016

a) the page is uptodate - ->write_begin() would either fail (in which
case we don't reach ->write_end()), or unstuff the inode, or find the
page already uptodate, or do a successful call of stuffed_readpage(),
which would've made it uptodate

b) zeroing the tail in pagecache is wrong.  kill -9 at the right time
while writing unmodified file contents to the same file should _not_
leave us in a situation when read() from the file will be reporting
it full of zeroes.  Especially since that effect will be transient -
at some later point the page will be evicted and then we'll be back
to the real file contents.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


fix ceph_write_end() · b9de313c

Al Viro authored Sep 05, 2016

don't zero on short copies; if the page was uptodate it's just plain
wrong, and if it wasn't we'll be better off just returning 0 and
buggering off.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


nfs_write_end(): fix handling of short copies · c0cf3ef5

Al Viro authored Sep 05, 2016

What matters when deciding if we should make a page uptodate is
not how much we _wanted_ to copy, but how much we actually have
copied.  As it is, on architectures that do not zero tail on
short copy we can leave uninitialized data in page marked uptodate.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
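
To make the "wanted versus actually copied" distinction concrete, here is a deliberately simplified model. It is not the NFS code: the page size constant and both function names are hypothetical, and the real decision also depends on the file size.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u  /* assumed page size for the model */

/* Buggy variant: decides from the length the caller asked to copy. */
static bool model_uptodate_by_wanted(size_t offset, size_t len, size_t copied)
{
    (void)copied;  /* ignored, which is the bug */
    return offset == 0 && len >= MODEL_PAGE_SIZE;
}

/* Fixed variant: decides from the number of bytes that really landed. */
static bool model_uptodate_by_copied(size_t offset, size_t len, size_t copied)
{
    (void)len;  /* the request size no longer matters */
    return offset == 0 && copied >= MODEL_PAGE_SIZE;
}
```

With a full-page request that only copies 1000 bytes, the first variant would mark the page uptodate with 3096 bytes never written, while the second would not.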


vfs: refactor clone/dedupe_file_range common functions · 876bec6f

Darrick J. Wong authored Dec 09, 2016

Hoist both the XFS reflink inode state and preparation code and the XFS
file blocks compare functions into the VFS so that ocfs2 can take
advantage of it for reflink and dedupe.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


fs: try to clone files first in vfs_copy_file_range · a76b5b04

Christoph Hellwig authored Dec 09, 2016

A clone is a perfectly fine implementation of a file copy, so most
file systems just implement the copy that way.  Instead of duplicating
this logic, move it to the VFS.  Currently btrfs and XFS implement copies
the same way as clones and there is no behavior change for them; cifs
only implements clones and grows support for copy_file_range with this
patch.  NFS implements both, so this will allow copy_file_range to work
on servers that only implement CLONE and be a lot more efficient on servers
that implement CLONE and COPY.
Signed-off-by: Christoph Hellwig <hch@lst.de>
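
The ordering described above (try a clone, then fall back to a copy) can be sketched as a user-space model. Everything here is hypothetical: `struct model_file_ops`, the callback signatures, and the example callbacks are simplified stand-ins for the file operations, and the model stops with an error where the kernel may still have a generic fallback.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified per-filesystem operations. */
struct model_file_ops {
    int  (*clone_file_range)(size_t len);  /* 0 on success, negative errno otherwise */
    long (*copy_file_range)(size_t len);   /* bytes copied, or negative errno */
};

/* Try the clone first; only use the filesystem's copy if cloning did not work. */
static long model_copy_file_range(const struct model_file_ops *ops, size_t len)
{
    if (ops->clone_file_range) {
        if (ops->clone_file_range(len) == 0)
            return (long)len;  /* a successful clone "copied" the whole range */
    }

    if (ops->copy_file_range) {
        long ret = ops->copy_file_range(len);
        if (ret != -EOPNOTSUPP)
            return ret;
    }

    /* The model gives up here; no generic fallback is modelled. */
    return -EOPNOTSUPP;
}

/* Example callbacks, purely for illustration. */
static int example_clone_ok(size_t len) { (void)len; return 0; }
static int example_clone_unsupported(size_t len) { (void)len; return -EOPNOTSUPP; }
static long example_copy_half(size_t len) { return (long)(len / 2); }
```

A filesystem that only provides the clone callback still gets a working copy path this way, which is the cifs case mentioned in the message.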


06 Dec, 2016 8 commits
- vfs: misc struct path constification · f0bb5aaf
  Al Viro authored Nov 20, 2016
  Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
- namespace.c: constify struct path passed to a bunch of primitives · ca71cf71
  Al Viro authored Nov 20, 2016
  Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
- quota: constify struct path in quota_on · 8c54ca9c
  Al Viro authored Nov 20, 2016
  Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
- constify alloc_file() · a4141d7c
  Al Viro authored Nov 20, 2016
  Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
- constify btrfs_mksubvol() · 92872094
  Al Viro authored Nov 20, 2016
  Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
- autofs: constify find_autofs_mount() callback · 5b5577e4
  Al Viro authored Nov 20, 2016
  Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
- constify get_dcookie() and friends · 71215a75
  Al Viro authored Nov 20, 2016
  Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
- audit_log_{name,link_denied}: constify struct path · 8bd10763
  Al Viro authored Nov 20, 2016
  Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
05 Dec, 2016 5 commits

fsnotify: constify the places working with ->f_path · 40212d53
Al Viro authored Nov 20, 2016
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

constify fsnotify_parent() · 12c7f9dc
Al Viro authored Nov 20, 2016
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fsnotify(): constify 'data' · e637835e
Al Viro authored Nov 20, 2016
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fsnotify: constify 'data' passed to ->handle_event() · 3cd5eca8
Al Viro authored Nov 20, 2016
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fs: Constify path_is_under()'s arguments · 640eb7e7

Mickaël Salaün authored Nov 14, 2016

The function path_is_under() doesn't modify the paths pointed to by its
arguments but only browses them.  Constifying these pointers makes a cleaner
interface to be used by (future) code which may only have access to
const struct path pointers (e.g. LSM hooks).
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
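
A minimal sketch of why the const qualifiers matter to callers, using a hypothetical `struct model_path` rather than the kernel's struct path. The function only walks parent links, so it can take const pointers, and callers that hold only const pointers can use it without casts. In this model a path counts as being under itself, which is a modelling choice.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a path: a name and a link to its parent. */
struct model_path {
    const char *name;
    const struct model_path *parent;  /* NULL at the top of the tree */
};

/* Read-only walk up the parent chain; neither argument is modified. */
static bool model_path_is_under(const struct model_path *path,
                                const struct model_path *root)
{
    for (const struct model_path *p = path; p != NULL; p = p->parent) {
        if (p == root)
            return true;
    }
    return false;
}
```

Had the parameters been declared without const, a caller holding a `const struct model_path *` would need a cast to call the function.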


04 Dec, 2016 2 commits

Linux 4.9-rc8 · 3e5de27e
Linus Torvalds authored Dec 04, 2016


Merge tag 'drm-fixes-for-v4.9-rc8' of git://people.freedesktop.org/~airlied/linux · 0cb65c83

Linus Torvalds authored Dec 03, 2016

Pull drm fixes from Dave Airlie:
 "A pretty small pull request: a couple of AMD powerxpress regression
  fixes and a power management fix, a couple of i915 fixes and one hdlcd
  fix, along with one core don't oops because of incorrect API usage fix"

* tag 'drm-fixes-for-v4.9-rc8' of git://people.freedesktop.org/~airlied/linux:
  drm/i915: drop the struct_mutex when wedged or trying to reset
  drm/i915: Don't touch NULL sg on i915_gem_object_get_pages_gtt() error
  drm: Don't call drm_for_each_crtc with a non-KMS driver
  drm/radeon: fix check for port PM availability
  drm/amdgpu: fix check for port PM availability
  drm/amd/powerplay: initialize the soft_regs offset in struct smu7_hwmgr
  drm: hdlcd: Fix cleanup order


03 Dec, 2016 4 commits

Merge tag 'drm-intel-fixes-2016-12-01' of... · ab7cd8d8

Dave Airlie authored Dec 04, 2016

Merge tag 'drm-intel-fixes-2016-12-01' of git://anongit.freedesktop.org/git/drm-intel into drm-fixes

2 intel fixes.

* tag 'drm-intel-fixes-2016-12-01' of git://anongit.freedesktop.org/git/drm-intel:
  drm/i915: drop the struct_mutex when wedged or trying to reset
  drm/i915: Don't touch NULL sg on i915_gem_object_get_pages_gtt() error


Merge branch 'akpm' (patches from Andrew) · 3c49de52

Linus Torvalds authored Dec 02, 2016

Merge more fixes from Andrew Morton:
 "2 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm, vmscan: add cond_resched() into shrink_node_memcg()
  mm: workingset: fix NULL ptr in count_shadow_nodes


mm, vmscan: add cond_resched() into shrink_node_memcg() · bd041733

Michal Hocko authored Dec 02, 2016

Boris Zhmurov has reported RCU stalls during the kswapd reclaim:

  INFO: rcu_sched detected stalls on CPUs/tasks:
   23-...: (22 ticks this GP) idle=92f/140000000000000/0 softirq=2638404/2638404 fqs=23
   (detected by 4, t=6389 jiffies, g=786259, c=786258, q=42115)
  Task dump for CPU 23:
  kswapd1         R  running task        0   148      2 0x00000008
  Call Trace:
    shrink_node+0xd2/0x2f0
    kswapd+0x2cb/0x6a0
    mem_cgroup_shrink_node+0x160/0x160
    kthread+0xbd/0xe0
    __switch_to+0x1fa/0x5c0
    ret_from_fork+0x1f/0x40
    kthread_create_on_node+0x180/0x180

A closer code inspection has shown that we might indeed miss all the
scheduling points in the reclaim path if no pages can be isolated from
the LRU list.  This is a pathological case but other reports from Donald
Buczek have shown that we might indeed hit such a path:

        clusterd-989   [009] .... 118023.654491: mm_vmscan_direct_reclaim_end: nr_reclaimed=193
         kswapd1-86    [001] dN.. 118023.987475: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239830 nr_taken=0 file=1
         kswapd1-86    [001] dN.. 118024.320968: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239844 nr_taken=0 file=1
         kswapd1-86    [001] dN.. 118024.654375: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239858 nr_taken=0 file=1
         kswapd1-86    [001] dN.. 118024.987036: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239872 nr_taken=0 file=1
         kswapd1-86    [001] dN.. 118025.319651: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239886 nr_taken=0 file=1
         kswapd1-86    [001] dN.. 118025.652248: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239900 nr_taken=0 file=1
         kswapd1-86    [001] dN.. 118025.984870: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239914 nr_taken=0 file=1
  [...]
         kswapd1-86    [001] dN.. 118084.274403: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4241133 nr_taken=0 file=1

This is a minute-long snapshot which didn't take a single page from the
LRU.  It is not entirely clear why only 1303 pages have been scanned
during that time (maybe there was heavy IRQ activity interfering).

In any case it looks like we can really hit long periods without
scheduling on non-preemptive kernels, so an explicit cond_resched() in
shrink_node_memcg which is independent of the reclaim operation is due.

Link: http://lkml.kernel.org/r/20161202095841.16648-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Boris Zhmurov <bb@kernelpanic.ru>
Tested-by: Boris Zhmurov <bb@kernelpanic.ru>
Reported-by: Donald Buczek <buczek@molgen.mpg.de>
Reported-by: "Christopher S. Aker" <caker@theshore.net>
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


mm: workingset: fix NULL ptr in count_shadow_nodes · 20ab67a5

Michal Hocko authored Dec 02, 2016

Commit 0a6b76dd ("mm: workingset: make shadow node shrinker memcg
aware") has made the workingset shadow nodes shrinker memcg aware.  The
implementation is not correct, though, because memcg_kmem_enabled() might
become true while we are doing a global reclaim, when sc->memcg might
be NULL, which is exactly what Marek has seen:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000400
  IP: [<ffffffff8122d520>] mem_cgroup_node_nr_lru_pages+0x20/0x40
  PGD 0
  Oops: 0000 [#1] SMP
  CPU: 0 PID: 60 Comm: kswapd0 Tainted: G           O   4.8.10-12.pvops.qubes.x86_64 #1
  task: ffff880011863b00 task.stack: ffff880011868000
  RIP: mem_cgroup_node_nr_lru_pages+0x20/0x40
  RSP: e02b:ffff88001186bc70  EFLAGS: 00010293
  RAX: 0000000000000000 RBX: ffff88001186bd20 RCX: 0000000000000002
  RDX: 000000000000000c RSI: 0000000000000000 RDI: 0000000000000000
  RBP: ffff88001186bc70 R08: 28f5c28f5c28f5c3 R09: 0000000000000000
  R10: 0000000000006c34 R11: 0000000000000333 R12: 00000000000001f6
  R13: ffffffff81c6f6a0 R14: 0000000000000000 R15: 0000000000000000
  FS:  0000000000000000(0000) GS:ffff880013c00000(0000) knlGS:ffff880013d00000
  CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000400 CR3: 00000000122f2000 CR4: 0000000000042660
  Call Trace:
    count_shadow_nodes+0x9a/0xa0
    shrink_slab.part.42+0x119/0x3e0
    shrink_node+0x22c/0x320
    kswapd+0x32c/0x700
    kthread+0xd8/0xf0
    ret_from_fork+0x1f/0x40
  Code: 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 3b 35 dd eb b1 00 55 48 89 e5 73 2c 89 d2 31 c9 31 c0 4c 63 ce 48 0f a3 ca 73 13 <4a> 8b b4 cf 00 04 00 00 41 89 c8 4a 03 84 c6 80 00 00 00 83 c1
  RIP  mem_cgroup_node_nr_lru_pages+0x20/0x40
   RSP <ffff88001186bc70>
  CR2: 0000000000000400
  ---[ end trace 100494b9edbdfc4d ]---

This patch fixes the issue by checking sc->memcg rather than
memcg_kmem_enabled() which is sufficient because shrink_slab makes sure
that only memcg aware shrinkers will get non-NULL memcgs and only if
memcg_kmem_enabled is true.

Fixes: 0a6b76dd ("mm: workingset: make shadow node shrinker memcg aware")
Link: http://lkml.kernel.org/r/20161201132156.21450-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Marek Marczykowski-Górecki <marmarek@mimuw.edu.pl>
Tested-by: Marek Marczykowski-Górecki <marmarek@mimuw.edu.pl>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Cc: <stable@vger.kernel.org>	[4.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
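
The shape of the fix, keying the decision on the pointer that is about to be dereferenced rather than on a global switch, can be sketched in user-space C. The structs and the function below are hypothetical stand-ins, not the kernel's shrink_control or memcg types.

```c
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup's page counter. */
struct model_memcg {
    unsigned long nr_lru_pages;
};

/* Hypothetical stand-in for the shrinker's control structure. */
struct model_shrink_control {
    struct model_memcg *memcg;  /* NULL during global reclaim */
};

/*
 * Pick the page count to base the shrinker's estimate on. Testing
 * sc->memcg directly means the memcg branch can never run with a NULL
 * pointer, whatever the state of any global "memcg enabled" flag.
 */
static unsigned long model_count_pages(const struct model_shrink_control *sc,
                                       unsigned long global_pages)
{
    if (sc->memcg)
        return sc->memcg->nr_lru_pages;
    return global_pages;
}
```

A check on a global flag can flip between the moment reclaim started and the moment the pointer is used; a check on the pointer itself cannot go stale in that way.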


02 Dec, 2016 2 commits

kbuild: fix building bzImage with CONFIG_TRIM_UNUSED_KSYMS enabled · 86556392

Nicolas Pitre authored Dec 02, 2016

When building a specific target such as bzImage, modules aren't normally
built.  However if CONFIG_TRIM_UNUSED_KSYMS is enabled, no built modules
means none of the exported symbols are used and therefore they will all
be trimmed away from the final kernel.  A subsequent "make modules" will
fail because modpost cannot find the needed symbols for those modules in
the kernel binary.

Let's make sure modules are also built whenever CONFIG_TRIM_UNUSED_KSYMS
is enabled and that the kernel binary is properly rebuilt accordingly.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 8dc0f265

Linus Torvalds authored Dec 02, 2016

Pull ARM SoC fixes from Arnd Bergmann:
 "This should be the last set of bugfixes for arm-soc in v4.9. None of
  these are critical regressions, but it would be nice to still get them
  merged.

   - On the Juno platform, the idle latency was described wrong, leading
     to suboptimal cpuidle tuning.

   - Also on the same platform, PCI I/O space was set up incorrectly and
     could not work.

   - On the sti platform, a syntactically incorrect DT entry caused
     warnings.

   - The newly added 'gr8' platform has somewhat confusing file names,
     which we rename for consistency"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  arm64: dts: juno: fix cluster sleep state entry latency on all SoC versions
  arm64: dts: juno: Correct PCI IO window
  ARM: dts: STiH407-family: fix i2c nodes
  ARM: gr8: Rename the DTSI and relevant DTS
