Commits · 0c0a8de13f9612a663b050afa955e6668858d1eb · Kirill Smelkov / linux

25 May, 2016 8 commits

libceph: nuke unused fields and functions · 0c0a8de1

Ilya Dryomov authored Apr 28, 2016

Either unused or useless:

    osdmap->mkfs_epoch
    osd->o_marked_for_keepalive
    monc->num_generic_requests
    osdc->map_waiters
    osdc->last_requested_map
    osdc->timeout_tid

    osd_req_op_cls_response_data()

    osdmap_apply_incremental() @msgr arg
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

0c0a8de1

rbd: use header_oid instead of header_name · c41d13a3

Ilya Dryomov authored Apr 29, 2016

Switch to ceph_object_id and use ceph_oid_aprintf() instead of a bare
const char *.  This reduces noise in rbd_dev_header_name().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

c41d13a3

libceph: variable-sized ceph_object_id · d30291b9

Ilya Dryomov authored Apr 29, 2016

Currently ceph_object_id can hold object names of up to 100
(CEPH_MAX_OID_NAME_LEN) characters.  This is enough for all use cases,
expect one - long rbd image names:

- a format 1 header is named "<imgname>.rbd"
- an object that points to a format 2 header is named "rbd_id.<imgname>"

We operate on these potentially long-named objects during rbd map, and,
for format 1 images, during header refresh.  (A format 2 header name is
a small system-generated string.)

Lift this 100 character limit by making ceph_object_id be able to point
to an externally-allocated string.  Apart from being able to work with
almost arbitrarily-long named objects, this allows us to reduce the
size of ceph_object_id from >100 bytes to 64 bytes.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

d30291b9

libceph: change how osd_op_reply message size is calculated · 711da55d

Ilya Dryomov authored Apr 27, 2016

For a message pool message, preallocate a page, just like we do for
osd_op.  For a normal message, take ceph_object_id into account and
don't bother subtracting CEPH_OSD_SLAB_OPS ceph_osd_ops.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

711da55d

libceph: move message allocation out of ceph_osdc_alloc_request() · 13d1ad16

Ilya Dryomov authored Apr 27, 2016

The size of ->r_request and ->r_reply messages depends on the size of
the object name (ceph_object_id), while the size of ceph_osd_request is
fixed.  Move message allocation into a separate function that would
have to be called after ceph_object_id and ceph_object_locator (which
is also going to become variable in size with RADOS namespaces) have
been filled in:

    req = ceph_osdc_alloc_request(...);
    <fill in req->r_base_oid>
    <fill in req->r_base_oloc>
    ceph_osdc_alloc_messages(req);
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

13d1ad16

libceph: grab snapc in ceph_osdc_alloc_request() · 84127282

Ilya Dryomov authored Apr 26, 2016

ceph_osdc_build_request() is going away.  Grab snapc and initialize
->r_snapid in ceph_osdc_alloc_request().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

84127282

libceph: make ceph_osdc_put_request() accept NULL · 3ed97d63
Ilya Dryomov authored Apr 26, 2016
```
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
```
3ed97d63

rbd: get/put img_request in rbd_img_request_submit() · 663ae2cc

Ilya Dryomov authored May 16, 2016

By the time we get to checking for_each_obj_request_safe(img_request)
terminating condition, all obj_requests may be complete and img_request
ref, that rbd_img_request_submit() takes away from its caller, may be
put.  Moving the next_obj_request cursor is then a use-after-free on
img_request.

It's totally benign, as the value that's read is never used, but
I think it's still worth fixing.

Cc: Alex Elder <elder@linaro.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

663ae2cc

15 May, 2016 2 commits

Linux 4.6 · 2dcd0af5
Linus Torvalds authored May 15, 2016

2dcd0af5

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5f95063c

Linus Torvalds authored May 15, 2016

Pull x86 fix from Thomas Gleixner:
 "Just the missing compat entry for the new pread/writev2"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Use compat version for preadv2 and pwritev2

5f95063c

14 May, 2016 11 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 272911b8

Linus Torvalds authored May 14, 2016

Pull networking fixes from David Miller:

 1) Fix mvneta/bm dependencies, from Arnd Bergmann.

 2) RX completion hw bug workaround in bnxt_en, from Michael Chan.

 3) Kernel pointer leak in nf_conntrack, from Linus.

 4) Hoplimit route attribute limits not enforced properly, from Paolo
    Abeni.

 5) qlcnic driver NULL deref fix from Dan Carpenter.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  arm64: bpf: jit JMP_JSET_{X,K}
  net/route: enforce hoplimit max value
  nf_conntrack: avoid kernel pointer value leak in slab name
  drivers: net: xgene: fix register offset
  drivers: net: xgene: fix statistics counters race condition
  drivers: net: xgene: fix ununiform latency across queues
  drivers: net: xgene: fix sharing of irqs
  drivers: net: xgene: fix IPv4 forward crash
  xen-netback: fix extra_info handling in xenvif_tx_err()
  net: mvneta: bm: fix dependencies again
  bnxt_en: Add workaround to detect bad opaque in rx completion (part 2)
  bnxt_en: Add workaround to detect bad opaque in rx completion (part 1)
  qlcnic: potential NULL dereference in qlcnic_83xx_get_minidump_template()

272911b8

arm64: bpf: jit JMP_JSET_{X,K} · 98397fc5

Zi Shen Lim authored May 12, 2016

Original implementation commit e54bcde3 ("arm64: eBPF JIT compiler")
had the relevant code paths, but due to an oversight always fail jiting.

As a result, we had been falling back to BPF interpreter whenever a BPF
program has JMP_JSET_{X,K} instructions.

With this fix, we confirm that the corresponding tests in lib/test_bpf
continue to pass, and also jited.

...
[    2.784553] test_bpf: #30 JSET jited:1 188 192 197 PASS
[    2.791373] test_bpf: #31 tcpdump port 22 jited:1 325 677 625 PASS
[    2.808800] test_bpf: #32 tcpdump complex jited:1 323 731 991 PASS
...
[    3.190759] test_bpf: #237 JMP_JSET_K: if (0x3 & 0x2) return 1 jited:1 110 PASS
[    3.192524] test_bpf: #238 JMP_JSET_K: if (0x3 & 0xffffffff) return 1 jited:1 98 PASS
[    3.211014] test_bpf: #249 JMP_JSET_X: if (0x3 & 0x2) return 1 jited:1 120 PASS
[    3.212973] test_bpf: #250 JMP_JSET_X: if (0x3 & 0xffffffff) return 1 jited:1 89 PASS
...

Fixes: e54bcde3 ("arm64: eBPF JIT compiler")
Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

98397fc5

net/route: enforce hoplimit max value · 626abd59

Paolo Abeni authored May 13, 2016

Currently, when creating or updating a route, no check is performed
in both ipv4 and ipv6 code to the hoplimit value.

The caller can i.e. set hoplimit to 256, and when such route will
 be used, packets will be sent with hoplimit/ttl equal to 0.

This commit adds checks for the RTAX_HOPLIMIT value, in both ipv4
ipv6 route code, substituting any value greater than 255 with 255.

This is consistent with what is currently done for ADVMSS and MTU
in the ipv4 code.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

626abd59

nf_conntrack: avoid kernel pointer value leak in slab name · 31b0b385

Linus Torvalds authored May 14, 2016

The slab name ends up being visible in the directory structure under
/sys, and even if you don't have access rights to the file you can see
the filenames.

Just use a 64-bit counter instead of the pointer to the 'net' structure
to generate a unique name.

This code will go away in 4.7 when the conntrack code moves to a single
kmemcache, but this is the backportable simple solution to avoiding
leaking kernel pointers to user space.

Fixes: 5b3501fa ("netfilter: nf_conntrack: per netns nf_conntrack_cachep")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

31b0b385

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 6ba5b85f

Linus Torvalds authored May 14, 2016

Pull vfs fixes from Al Viro:
 "Overlayfs fixes from Miklos, assorted fixes from me.

  Stable fodder of varying severity, all sat in -next for a while"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ovl: ignore permissions on underlying lookup
  vfs: add lookup_hash() helper
  vfs: rename: check backing inode being equal
  vfs: add vfs_select_inode() helper
  get_rock_ridge_filename(): handle malformed NM entries
  ecryptfs: fix handling of directory opening
  atomic_open(): fix the handling of create_error
  fix the copy vs. map logics in blk_rq_map_user_iov()
  do_splice_to(): cap the size before passing to ->splice_read()

6ba5b85f

Merge branch 'xgene-fixes' · b9150658

David S. Miller authored May 13, 2016

Iyappan Subramanian says:

====================
drivers: net: xgene: Bug fixes

This patch set addresses the following bug fixes that were found during testing.

  1. IPv4 forward test crash
    - drivers: net: xgene: fix IPv4 forward crash

  2. Sharing of irqs
    - drivers: net: xgene: fix sharing of irqs

  3. Ununiform latency across queues
    - drivers: net: xgene: fix ununiform latency across queues

  4. Fix statistics counters race condition
    - drivers: net: xgene: fix statistics counters race condition

  5. Correcting register offset and field lengths
    - drivers: net: xgene: fix register offset

v2: Address review comments from v1
- Defer TSO fix, and reposting all other patches from v1

v1:
- Initial version
====================
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b9150658

drivers: net: xgene: fix register offset · e2f2d9a7

Iyappan Subramanian authored May 13, 2016

This patch fixes SG_RX_DV_GATE_REG_0_ADDR register offset
and ring state field lengths.
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Tested-by: Toan Le <toanle@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e2f2d9a7

drivers: net: xgene: fix statistics counters race condition · 3bb502f8

Iyappan Subramanian authored May 13, 2016

This patch fixes the race condition on updating the statistics
counters by moving the counters to the ring structure.
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Tested-by: Toan Le <toanle@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3bb502f8

drivers: net: xgene: fix ununiform latency across queues · 1b090a48

Iyappan Subramanian authored May 13, 2016

This patch addresses ununiform latency across queues by adding
more queues to match with, upto number of CPU cores.

Also, number of interrupts are increased and the channel numbers
are reordered.
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Tested-by: Toan Le <toanle@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1b090a48

drivers: net: xgene: fix sharing of irqs · 46a22d29

Iyappan Subramanian authored May 13, 2016

Since hardware doesn't allow sharing of interrupts,
this patch fixes the same by removing IRQF_SHARED flag.
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Tested-by: Toan Le <toanle@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

46a22d29

drivers: net: xgene: fix IPv4 forward crash · b30cfd24

Iyappan Subramanian authored May 13, 2016

This patch fixes the crash observed during IPv4 forward test by
setting the drop field in the dbptr.
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Tested-by: Toan Le <toanle@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b30cfd24

13 May, 2016 17 commits

Merge branch 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · 1410b74e

Linus Torvalds authored May 13, 2016

Pull cgroup fixes from Tejun Heo:
 "During v4.6-rc1 cgroup namespace support was merged.  There is an
  issue where it's impossible to tell whether a given cgroup mount point
  is bind mounted or namespaced.  Serge has been working on the issue
  but it took longer than expected to resolve, so the late pull request.

  Given that it's a completely new feature and the patches don't touch
  anything else, the risk seems acceptable.  However, if this is too
  late, an alternative is plugging new cgroup ns creation for v4.6 and
  retrying for v4.7"

* 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: fix compile warning
  kernfs: kernfs_sop_show_path: don't return 0 after seq_dentry call
  cgroup, kernfs: make mountinfo show properly scoped path for cgroup namespaces
  kernfs_path_from_node_locked: don't overwrite nlen

1410b74e

Merge branch 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq · da922239

Linus Torvalds authored May 13, 2016

Pull workqueue fix from Tejun Heo:
 "CPU hotplug callbacks can invoke DOWN_FAILED w/o preceding
  DOWN_PREPARE which can trigger a WARN_ON() in workqueue.

  The bug has been there for a very long time.  It only triggers if CPU
  down fails at a specific point and I don't think it has adverse
  effects other than the warning messages.  The fix is very low impact"

* 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: fix rebind bound workers warning

da922239

Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 65643e3a

Linus Torvalds authored May 13, 2016

Pull scheduler fix from Ingo Molnar:
 "This is a revert to fix an interactivity problem.

  The proper fixes for the problems that the reverted commit exposed are
  now in sched/core (consisting of 3 patches), but were too risky for
  v4.6 and will arrive in the v4.7 merge window"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "sched/fair: Fix fairness issue on migration"

65643e3a

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f7f4d43b

Linus Torvalds authored May 13, 2016

Pull perf fixes from Ingo Molnar:
 "An uncharacteristically large number of bugs popped up in the last
  week:

   - various tooling fixes, two crashes and build problems
   - two Intel PT fixes
   - an KNL uncore driver fix
   - an Intel PMU driver fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf stat: Fallback to user only counters when perf_event_paranoid > 1
  perf evsel: Handle EACCESS + perf_event_paranoid=2 in fallback()
  perf evsel: Improve EPERM error handling in open_strerror()
  tools lib traceevent: Do not reassign parg after collapse_tree()
  perf probe: Check if dwarf_getlocations() is available
  perf dwarf: Guard !x86_64 definitions under #ifdef else clause
  perf tools: Use readdir() instead of deprecated readdir_r()
  perf thread_map: Use readdir() instead of deprecated readdir_r()
  perf script: Use readdir() instead of deprecated readdir_r()
  perf tools: Use readdir() instead of deprecated readdir_r()
  perf/core: Disable the event on a truncated AUX record
  perf/x86/intel/pt: Generate PMI in the STOP region as well
  perf/x86: Fix undefined shift on 32-bit kernels
  perf/x86/msr: Fix SMI overflow
  perf/x86/intel/uncore: Fix CHA registers configuration procedure for Knights Landing platform
  perf diff: Fix duplicated output column

f7f4d43b

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 18759462

Linus Torvalds authored May 13, 2016

Pull ARM SoC fixes from Arnd Bergmann:
 "Three more bug fixes for ARM SoCs this week:

   - The Atmel sama5d2 was registering the wrong NFC device type

   - On Atmel sam9x5, the power management controller had an incorrect
     register area size

   - On ARM64 Allwinner machine was not secting the generic irqchip
     code, causing build errors in some configurations"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: dts: at91: sam9x5: Fix the memory range assigned to the PMC
  arm64/sunxi: 4.6-rc1: Add dependency on generic irq chip
  ARM: dts: at91: sama5d2: use "atmel,sama5d3-nfc" compatible for nfc

18759462

Merge tag 'regulator-fix-v4.6-rc7' of... · c3548b73

Linus Torvalds authored May 13, 2016

Merge tag 'regulator-fix-v4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
 "A small collection of driver specific fixes for the regulator
  subsysetem:

   - Fix handling of probe deferral for GPIO regulators

   - Fix a typo in the module alias for DA9053

   - Fix the definition of BUCK9 in the S2MPS11 driver.  This change
     looks larger than it is because an irregularity in the hardware
     means that the macro used to define bucks 6-10 needs duplicating
     and tweaking to have a separate macro for 9

   - Fix a series of errors in the definitions of the LDOs the AXP20x
     regulators, some of which had always been present and some of which
     were introduced in the merge window"

* tag 'regulator-fix-v4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: da9063: Correct module alias prefix to fix module autoloading
  regulator: axp20x: Fix axp22x ldo_io registration error on cold boot
  regulator: axp20x: Fix axp22x ldo_io voltage ranges
  regulator: axp20x: Fix LDO4 linear voltage range
  regulator: s2mps11: Fix invalid selector mask and voltages for buck9
  regulator: gpio: check return value of of_get_named_gpio

c3548b73

Merge tag 'regmap-fix-v4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap · c42b8fe9

Linus Torvalds authored May 13, 2016

Pull regmap fixes from Mark Brown:
 "This is rather too late so it'd be completely understandable if you
  don't want to pull it at this point, I had thought I'd sent this
  earlier but it seems I didn't.  Everything has been in -next for some
  time now.

  The main set of fixes here are mopping up some more issues with MMIO,
  fixing handling of endianness configuration in DT (which just wasn't
  working at all) and cases where the register and value endianness are
  different.

  There is also a fix for bulk register reads on SPMI"

* tag 'regmap-fix-v4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: spmi: Fix regmap_spmi_ext_read in multi-byte case
  regmap: mmio: Explicitly say little endian is the defualt in the bus config
  regmap: mmio: Parse endianness definitions from DT
  regmap: Fix implicit inclusion of device.h
  regmap: mmio: Fix value endianness selection
  regmap: fix documentation to match code

c42b8fe9

Merge tag 'media/v4.6-6' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 90fa7c7f

Linus Torvalds authored May 13, 2016

Pull media fix from Mauro Carvalho Chehab:
 "A revert fixing a breakage that caused an OOPS on all VB2-based DVB
  drivers.

  We already have a proper fix, but it sounds safer to keep it being
  tested for a while and not hurry, to avoid the risk of another
  regression, specially since this is meant to be c/c to stable.  So,
  for now, let's just revert the broken patch"

* tag 'media/v4.6-6' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  Revert "[media] videobuf2-v4l2: Verify planes array in buffer dequeueing"

90fa7c7f

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · 9dcf8a58

Linus Torvalds authored May 13, 2016

Pull drm fixes from Dave Airlie:
 "A bunch of radeon displayport mode setting fixes, and some misc i915
  fixes.

  There is one revert, the MST audio code in i915 was causing some
  oopses, so we've decided just to drop it until next kernel when we can
  fix it properly"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/amdgpu: fix DP mode validation
  drm/radeon: fix DP mode validation
  drm/i915: Bail out of pipe config compute loop on LPT
  drm/radeon: fix PLL sharing on DCE6.1 (v2)
  drm/radeon: fix DP link training issue with second 4K monitor
  Revert "drm/i915: start adding dp mst audio"
  drm/i915/bdw: Add missing delay during L3 SQC credit programming
  drm/i915/lvds: separate border enable readout from panel fitter
  drm/i915: Update CDCLK_FREQ register on BDW after changing cdclk frequency

9dcf8a58

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · dc0f2f87

Linus Torvalds authored May 13, 2016

Pull crypto fix from Herbert Xu:
 "This fixes a bug in the RSA self-test that may cause crashes on some
  architectures such as SPARC"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: testmgr - Use kmalloc memory for RSA input

dc0f2f87

Merge remote-tracking branches 'regulator/fix/axp20x', 'regulator/fix/da9063',... · 9689dab3

Mark Brown authored May 13, 2016

Merge remote-tracking branches 'regulator/fix/axp20x', 'regulator/fix/da9063', 'regulator/fix/gpio' and 'regulator/fix/s2mps11' into regulator-linus

9689dab3

Merge remote-tracking branches 'regmap/fix/be', 'regmap/fix/doc' and... · 2a2cd521
Mark Brown authored May 13, 2016
```
Merge remote-tracking branches 'regmap/fix/be', 'regmap/fix/doc' and 'regmap/fix/spmi' into regmap-linus
```
2a2cd521
Merge remote-tracking branch 'regmap/fix/mmio' into regmap-linus · 066a0e0b
Mark Brown authored May 13, 2016

066a0e0b

Merge branch 'drm-fixes-4.6' of git://people.freedesktop.org/~agd5f/linux into drm-fixes · e02aacb6

Dave Airlie authored May 13, 2016

DP mode validation regression fix.
* 'drm-fixes-4.6' of git://people.freedesktop.org/~agd5f/linux:
  drm/amdgpu: fix DP mode validation
  drm/radeon: fix DP mode validation

e02aacb6

xen-netback: fix extra_info handling in xenvif_tx_err() · 72eec92a

Paul Durrant authored May 12, 2016

Patch 562abd39 "xen-netback: support multiple extra info fragments
passed from frontend" contained a mistake which can result in an in-
correct number of responses being generated when handling errors
encountered when processing packets containing extra info fragments.
This patch fixes the problem.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reported-by: Jan Beulich <JBeulich@suse.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

72eec92a

Merge tag 'perf-urgent-for-mingo-20160512' of... · 636fa4a7

Ingo Molnar authored May 13, 2016

Merge tag 'perf-urgent-for-mingo-20160512' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

- Fallback to usermode-only counters when perf_event_paranoid > 1, which
  is the case now (Arnaldo Carvalho de Melo)

- Do not reassign parg after collapse_tree() in libtraceevent, which
  may cause tool crashes (Steven Rostedt)

- Fix the build on Fedora Rawhide, where readdir_r() is deprecated and
  also wrt -Werror=unused-const-variable= + x86_32_regoffset_table on
  !x86_64 (Arnaldo Carvalho de Melo)

- Fix the build on Ubuntu 12.04.5, where dwarf_getlocations() isn't
  available, i.e. libdw-dev < 0.157 (Arnaldo Carvalho de Melo)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

636fa4a7

Merge branch 'akpm' (patches from Andrew) · a2ccb68b

Linus Torvalds authored May 12, 2016

Merge fixes from Andrew Morton:
 "4 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: thp: calculate the mapcount correctly for THP pages during WP faults
  ksm: fix conflict between mmput and scan_get_next_rmap_item
  ocfs2: fix posix_acl_create deadlock
  ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang

a2ccb68b

12 May, 2016 2 commits

mm: thp: calculate the mapcount correctly for THP pages during WP faults · 6d0a07ed

Andrea Arcangeli authored May 12, 2016

This will provide fully accuracy to the mapcount calculation in the
write protect faults, so page pinning will not get broken by false
positive copy-on-writes.

total_mapcount() isn't the right calculation needed in
reuse_swap_page(), so this introduces a page_trans_huge_mapcount()
that is effectively the full accurate return value for page_mapcount()
if dealing with Transparent Hugepages, however we only use the
page_trans_huge_mapcount() during COW faults where it strictly needed,
due to its higher runtime cost.

This also provide at practical zero cost the total_mapcount
information which is needed to know if we can still relocate the page
anon_vma to the local vma. If page_trans_huge_mapcount() returns 1 we
can reuse the page no matter if it's a pte or a pmd_trans_huge
triggering the fault, but we can only relocate the page anon_vma to
the local vma->anon_vma if we're sure it's only this "vma" mapping the
whole THP physical range.

Kirill A. Shutemov discovered the problem with moving the page
anon_vma to the local vma->anon_vma in a previous version of this
patch and another problem in the way page_move_anon_rmap() was called.

Andrew Morton discovered that CONFIG_SWAP=n wouldn't build in a
previous version, because reuse_swap_page must be a macro to call
page_trans_huge_mapcount from swap.h, so this uses a macro again
instead of an inline function. With this change at least it's a less
dangerous usage than it was before, because "page" is used only once
now, while with the previous code reuse_swap_page(page++) would have
called page_mapcount on page+1 and it would have increased page twice
instead of just once.

Dean Luick noticed an uninitialized variable that could result in a
rmap inefficiency for the non-THP case in a previous version.

Mike Marciniszyn said:

: Our RDMA tests are seeing an issue with memory locking that bisects to
: commit 61f5d698 ("mm: re-enable THP")
:
: The test program registers two rather large MRs (512M) and RDMA
: writes data to a passive peer using the first and RDMA reads it back
: into the second MR and compares that data.  The sizes are chosen randomly
: between 0 and 1024 bytes.
:
: The test will get through a few (<= 4 iterations) and then gets a
: compare error.
:
: Tracing indicates the kernel logical addresses associated with the individual
: pages at registration ARE correct , the data in the "RDMA read response only"
: packets ARE correct.
:
: The "corruption" occurs when the packet crosse two pages that are not physically
: contiguous.   The second page reads back as zero in the program.
:
: It looks like the user VA at the point of the compare error no longer points to
: the same physical address as was registered.
:
: This patch totally resolves the issue!

Link: http://lkml.kernel.org/r/1462547040-1737-2-git-send-email-aarcange@redhat.comSigned-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Tested-by: Josh Collier <josh.d.collier@intel.com>
Cc: Marc Haber <mh+linux-kernel@zugschlus.de>
Cc: <stable@vger.kernel.org>	[4.5]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

6d0a07ed

ksm: fix conflict between mmput and scan_get_next_rmap_item · 7496fea9

Zhou Chengming authored May 12, 2016

A concurrency issue about KSM in the function scan_get_next_rmap_item.

task A (ksmd):				|task B (the mm's task):
					|
mm = slot->mm;				|
down_read(&mm->mmap_sem);		|
					|
...					|
					|
spin_lock(&ksm_mmlist_lock);		|
					|
ksm_scan.mm_slot go to the next slot;	|
					|
spin_unlock(&ksm_mmlist_lock);		|
					|mmput() ->
					|	ksm_exit():
					|
					|spin_lock(&ksm_mmlist_lock);
					|if (mm_slot && ksm_scan.mm_slot != mm_slot) {
					|	if (!mm_slot->rmap_list) {
					|		easy_to_free = 1;
					|		...
					|
					|if (easy_to_free) {
					|	mmdrop(mm);
					|	...
					|
					|So this mm_struct may be freed in the mmput().
					|
up_read(&mm->mmap_sem);			|

As we can see above, the ksmd thread may access a mm_struct that already
been freed to the kmem_cache.  Suppose a fork will get this mm_struct from
the kmem_cache, the ksmd thread then call up_read(&mm->mmap_sem), will
cause mmap_sem.count to become -1.

As suggested by Andrea Arcangeli, unmerge_and_remove_all_rmap_items has
the same SMP race condition, so fix it too.  My prev fix in function
scan_get_next_rmap_item will introduce a different SMP race condition, so
just invert the up_read/spin_unlock order as Andrea Arcangeli said.

Link: http://lkml.kernel.org/r/1462708815-31301-1-git-send-email-zhouchengming1@huawei.comSigned-off-by: Zhou Chengming <zhouchengming1@huawei.com>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Geliang Tang <geliangtang@163.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Ding Tianhong <dingtianhong@huawei.com>
Cc: Li Bin <huawei.libin@huawei.com>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7496fea9