1. 30 Aug, 2017 40 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.12.10 · 6371f030
      Greg Kroah-Hartman authored
      6371f030
    • Benjamin Herrenschmidt's avatar
      powerpc/mm: Ensure cpumask update is ordered · 849e9675
      Benjamin Herrenschmidt authored
      commit 1a92a80a upstream.
      
      There is no guarantee that the various isync's involved with
      the context switch will order the update of the CPU mask with
      the first TLB entry for the new context being loaded by the HW.
      
      Be safe here and add a memory barrier to order any subsequent
      load/store which may bring entries into the TLB.
      
      The corresponding barrier on the other side already exists as
      pte updates use pte_xchg() which uses __cmpxchg_u64 which has
      a sync after the atomic operation.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Reviewed-by: default avatarNicholas Piggin <npiggin@gmail.com>
      [mpe: Add comments in the code]
      [mpe: Backport to 4.12, minor context change]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      849e9675
    • Lv Zheng's avatar
      ACPI: EC: Fix regression related to wrong ECDT initialization order · 53220a20
      Lv Zheng authored
      commit 98529b92 upstream.
      
      Commit 2a570840 (ACPI / EC: Fix a gap that ECDT EC cannot handle
      EC events) introduced acpi_ec_ecdt_start(), but that function is
      invoked before acpi_ec_query_init(), which is too early.  This causes
      the kernel to crash if an EC event occurs after boot, when ec_query_wq
      is not valid:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000102
       ...
       Workqueue: events acpi_ec_event_handler
       task: ffff9f539790dac0 task.stack: ffffb437c0e10000
       RIP: 0010:__queue_work+0x32/0x430
      
      Normally, the DSDT EC should always be valid, so acpi_ec_ecdt_start()
      is actually a no-op in the majority of cases.  However, commit
      c712bb58 (ACPI / EC: Add support to skip boot stage DSDT probe)
      caused the probing of the DSDT EC as the "boot EC" to be skipped when
      the ECDT EC is valid and uncovered the bug.
      
      Fix this issue by invoking acpi_ec_ecdt_start() after acpi_ec_query_init()
      in acpi_ec_init().
      
      Link: https://jira01.devtools.intel.com/browse/LCK-4348
      Fixes: 2a570840 (ACPI / EC: Fix a gap that ECDT EC cannot handle EC events)
      Fixes: c712bb58 (ACPI / EC: Add support to skip boot stage DSDT probe)
      Reported-by: default avatarWang Wendy <wendy.wang@intel.com>
      Tested-by: default avatarFeng Chenzhou <chenzhoux.feng@intel.com>
      Signed-off-by: default avatarLv Zheng <lv.zheng@intel.com>
      [ rjw: Changelog ]
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      53220a20
    • Hanjun Guo's avatar
      ACPI: APD: Fix HID for Hisilicon Hip07/08 · 6e80b88a
      Hanjun Guo authored
      commit f7f3dd5b upstream.
      
      ACPI HID for Hisilicon Hip07/08 should be HISI02A1/2,
      not HISI0A21/2, HISI02A1/2 was tested ok but was modified
      by the stupid typo when upstream the patches (by me),
      correct them to the right IDs (matching the IDs in
      drivers/i2c/busses/i2c-designware-platdrv.c).
      
      Fixes: 6e14cf36 (ACPI / APD: Add clock frequency for Hisilicon Hip07/08 I2C controller)
      Reported-by: default avatarTao Tian <tiantao6@huawei.com>
      Signed-off-by: default avatarHanjun Guo <hanjun.guo@linaro.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6e80b88a
    • Dave Jiang's avatar
      ntb: transport shouldn't disable link due to bogus values in SPADs · 49fa8c02
      Dave Jiang authored
      commit f3fd2afe upstream.
      
      It seems that under certain scenarios the SPAD can have bogus values caused
      by an agent (i.e. BIOS or other software) that is not the kernel driver, and
      that causes memory window setup failure. This should not cause the link to
      be disabled because if we do that, the driver will never recover again. We
      have verified in testing that this issue happens and prevents proper link
      recovery.
      Signed-off-by: default avatarDave Jiang <dave.jiang@intel.com>
      Acked-by: default avatarAllen Hubbe <Allen.Hubbe@dell.com>
      Signed-off-by: default avatarJon Mason <jdmason@kudzu.us>
      Fixes: 84f76685 ("ntb: stop link work when we do not have memory")
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      49fa8c02
    • Logan Gunthorpe's avatar
      ntb: ntb_test: ensure the link is up before trying to configure the mws · ab75f027
      Logan Gunthorpe authored
      commit 0eb46345 upstream.
      
      After the link tests, there is a race on one side of the test for
      the link coming up. It's possible, in some cases, for the test script
      to write to the 'peer_trans' files before the link has come up.
      
      To fix this, we simply use the link event file to ensure both sides
      see the link as up before continuning.
      Signed-off-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Acked-by: default avatarAllen Hubbe <Allen.Hubbe@dell.com>
      Signed-off-by: default avatarJon Mason <jdmason@kudzu.us>
      Fixes: a9c59ef7 ("ntb_test: Add a selftest script for the NTB subsystem")
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ab75f027
    • Linus Torvalds's avatar
      Clarify (and fix) MAX_LFS_FILESIZE macros · 03e58884
      Linus Torvalds authored
      commit 0cc3b0ec upstream.
      
      We have a MAX_LFS_FILESIZE macro that is meant to be filled in by
      filesystems (and other IO targets) that know they are 64-bit clean and
      don't have any 32-bit limits in their IO path.
      
      It turns out that our 32-bit value for that limit was bogus.  On 32-bit,
      the VM layer is limited by the page cache to only 32-bit index values,
      but our logic for that was confusing and actually wrong.  We used to
      define that value to
      
      	(((loff_t)PAGE_SIZE << (BITS_PER_LONG-1))-1)
      
      which is actually odd in several ways: it limits the index to 31 bits,
      and then it limits files so that they can't have data in that last byte
      of a page that has the highest 31-bit index (ie page index 0x7fffffff).
      
      Neither of those limitations make sense.  The index is actually the full
      32 bit unsigned value, and we can use that whole full page.  So the
      maximum size of the file would logically be "PAGE_SIZE << BITS_PER_LONG".
      
      However, we do wan tto avoid the maximum index, because we have code
      that iterates over the page indexes, and we don't want that code to
      overflow.  So the maximum size of a file on a 32-bit host should
      actually be one page less than the full 32-bit index.
      
      So the actual limit is ULONG_MAX << PAGE_SHIFT.  That means that we will
      not actually be using the page of that last index (ULONG_MAX), but we
      can grow a file up to that limit.
      
      The wrong value of MAX_LFS_FILESIZE actually caused problems for Doug
      Nazar, who was still using a 32-bit host, but with a 9.7TB 2 x RAID5
      volume.  It turns out that our old MAX_LFS_FILESIZE was 8TiB (well, one
      byte less), but the actual true VM limit is one page less than 16TiB.
      
      This was invisible until commit c2a9737f ("vfs,mm: fix a dead loop
      in truncate_inode_pages_range()"), which started applying that
      MAX_LFS_FILESIZE limit to block devices too.
      
      NOTE! On 64-bit, the page index isn't a limiter at all, and the limit is
      actually just the offset type itself (loff_t), which is signed.  But for
      clarity, on 64-bit, just use the maximum signed value, and don't make
      people have to count the number of 'f' characters in the hex constant.
      
      So just use LLONG_MAX for the 64-bit case.  That was what the value had
      been before too, just written out as a hex constant.
      
      Fixes: c2a9737f ("vfs,mm: fix a dead loop in truncate_inode_pages_range()")
      Reported-and-tested-by: default avatarDoug Nazar <nazard@nazar.ca>
      Cc: Andreas Dilger <adilger@dilger.ca>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      03e58884
    • Joerg Roedel's avatar
      iommu: Fix wrong freeing of iommu_device->dev · 0b9a3f30
      Joerg Roedel authored
      commit 2926a2aa upstream.
      
      The struct iommu_device has a 'struct device' embedded into
      it, not as a pointer, but the whole struct. In the
      conversion of the iommu drivers to use struct iommu_device
      it was forgotten that the relase function for that struct
      device simply calls kfree() on the pointer.
      
      This frees memory that was never allocated and causes memory
      corruption.
      
      To fix this issue, use a pointer to struct device instead of
      embedding the whole struct. This needs some updates in the
      iommu sysfs code as well as the Intel VT-d and AMD IOMMU
      driver.
      Reported-by: default avatarSebastian Ott <sebott@linux.vnet.ibm.com>
      Fixes: 39ab9555 ('iommu: Add sysfs bindings for struct iommu_device')
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0b9a3f30
    • Charles Milette's avatar
      staging: rtl8188eu: add RNX-N150NUB support · 75005bf8
      Charles Milette authored
      commit f299aec6 upstream.
      
      Add support for USB Device Rosewill RNX-N150NUB.
      VendorID: 0x0bda, ProductID: 0xffef
      Signed-off-by: default avatarCharles Milette <charles.milette@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      75005bf8
    • Lorenzo Bianconi's avatar
      iio: magnetometer: st_magn: remove ihl property for LSM303AGR · 91628e2a
      Lorenzo Bianconi authored
      commit 8b35a5f8 upstream.
      
      Remove IRQ active low support for LSM303AGR since the sensor does not
      support that capability for data-ready line
      
      Fixes: a9fd053b (iio: st_sensors: support active-low interrupts)
      Signed-off-by: default avatarLorenzo Bianconi <lorenzo.bianconi@st.com>
      Reviewed-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarJonathan Cameron <Jonathan.Cameron@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      91628e2a
    • Lorenzo Bianconi's avatar
      iio: magnetometer: st_magn: fix status register address for LSM303AGR · e59c095c
      Lorenzo Bianconi authored
      commit 541ee9b2 upstream.
      
      Fixes: 97865fe4 (iio: st_sensors: verify interrupt event to status)
      Signed-off-by: default avatarLorenzo Bianconi <lorenzo.bianconi@st.com>
      Reviewed-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarJonathan Cameron <Jonathan.Cameron@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e59c095c
    • Srinivas Pandruvada's avatar
      iio: hid-sensor-trigger: Fix the race with user space powering up sensors · fc7957b6
      Srinivas Pandruvada authored
      commit f1664eaa upstream.
      
      It has been reported for a while that with iio-sensor-proxy service the
      rotation only works after one suspend/resume cycle. This required a wait
      in the systemd unit file to avoid race. I found a Yoga 900 where I could
      reproduce this.
      
      The problem scenerio is:
      - During sensor driver init, enable run time PM and also set a
        auto-suspend for 3 seconds.
      	This result in one runtime resume. But there is a check to avoid
      a powerup in this sequence, but rpm is active
      - User space iio-sensor-proxy tries to power up the sensor. Since rpm is
        active it will simply return. But sensors were not actually
      powered up in the prior sequence, so actaully the sensors will not work
      - After 3 seconds the auto suspend kicks
      
      If we add a wait in systemd service file to fire iio-sensor-proxy after
      3 seconds, then now everything will work as the runtime resume will
      actually powerup the sensor as this is a user request.
      
      To avoid this:
      - Remove the check to match user requested state, this will cause a
        brief powerup, but if the iio-sensor-proxy starts immediately it will
      still work as the sensors are ON.
      - Also move the autosuspend delay to place when user requested turn off
        of sensors, like after user finished raw read or buffer disable
      Signed-off-by: default avatarSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Tested-by: default avatarBastien Nocera <hadess@hadess.net>
      Signed-off-by: default avatarJonathan Cameron <Jonathan.Cameron@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fc7957b6
    • Dragos Bogdan's avatar
      iio: imu: adis16480: Fix acceleration scale factor for adis16480 · a1d7b7e7
      Dragos Bogdan authored
      commit fdd0d32e upstream.
      
      According to the datasheet, the range of the acceleration is [-10 g, + 10 g],
      so the scale factor should be 10 instead of 5.
      Signed-off-by: default avatarDragos Bogdan <dragos.bogdan@analog.com>
      Acked-by: default avatarLars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: default avatarJonathan Cameron <Jonathan.Cameron@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a1d7b7e7
    • Martijn Coenen's avatar
      ANDROID: binder: fix proc->tsk check. · bf9b9d3b
      Martijn Coenen authored
      commit b2a6d1b9 upstream.
      
      Commit c4ea41ba ("binder: use group leader instead of open thread")'
      was incomplete and didn't update a check in binder_mmap(), causing all
      mmap() calls into the binder driver to fail.
      Signed-off-by: default avatarMartijn Coenen <maco@android.com>
      Tested-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bf9b9d3b
    • Riley Andrews's avatar
      binder: Use wake up hint for synchronous transactions. · f6fc60d9
      Riley Andrews authored
      commit 00b40d61 upstream.
      
      Use wake_up_interruptible_sync() to hint to the scheduler binder
      transactions are synchronous wakeups. Disable preemption while waking
      to avoid ping-ponging on the binder lock.
      Signed-off-by: default avatarTodd Kjos <tkjos@google.com>
      Signed-off-by: default avatarOmprakash Dhyade <odhyade@codeaurora.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f6fc60d9
    • Todd Kjos's avatar
      binder: use group leader instead of open thread · 7771e3f4
      Todd Kjos authored
      commit c4ea41ba upstream.
      
      The binder allocator assumes that the thread that
      called binder_open will never die for the lifetime of
      that proc. That thread is normally the group_leader,
      however it may not be. Use the group_leader instead
      of current.
      Signed-off-by: default avatarTodd Kjos <tkjos@google.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7771e3f4
    • Todd Kjos's avatar
      Revert "android: binder: Sanity check at binder ioctl" · 62ccb816
      Todd Kjos authored
      commit a2b18708 upstream.
      
      This reverts commit a906d693.
      
      The patch introduced a race in the binder driver. An attempt to fix the
      race was submitted in "[PATCH v2] android: binder: fix dangling pointer
      comparison", however the conclusion in the discussion for that patch
      was that the original patch should be reverted.
      
      The reversion is being done as part of the fine-grained locking
      patchset since the patch would need to be refactored when
      proc->vmm_vm_mm is removed from struct binder_proc and added
      in the binder allocator.
      Signed-off-by: default avatarTodd Kjos <tkjos@google.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      62ccb816
    • Jeffy Chen's avatar
      Bluetooth: bnep: fix possible might sleep error in bnep_session · b42c44ad
      Jeffy Chen authored
      commit 25717382 upstream.
      
      It looks like bnep_session has same pattern as the issue reported in
      old rfcomm:
      
      	while (1) {
      		set_current_state(TASK_INTERRUPTIBLE);
      		if (condition)
      			break;
      		// may call might_sleep here
      		schedule();
      	}
      	__set_current_state(TASK_RUNNING);
      
      Which fixed at:
      	dfb2fae7 Bluetooth: Fix nested sleeps
      
      So let's fix it at the same way, also follow the suggestion of:
      https://lwn.net/Articles/628628/Signed-off-by: default avatarJeffy Chen <jeffy.chen@rock-chips.com>
      Reviewed-by: default avatarBrian Norris <briannorris@chromium.org>
      Reviewed-by: default avatarAL Yu-Chen Cho <acho@suse.com>
      Signed-off-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b42c44ad
    • Jeffy Chen's avatar
      Bluetooth: cmtp: fix possible might sleep error in cmtp_session · b7418962
      Jeffy Chen authored
      commit f06d9773 upstream.
      
      It looks like cmtp_session has same pattern as the issue reported in
      old rfcomm:
      
      	while (1) {
      		set_current_state(TASK_INTERRUPTIBLE);
      		if (condition)
      			break;
      		// may call might_sleep here
      		schedule();
      	}
      	__set_current_state(TASK_RUNNING);
      
      Which fixed at:
      	dfb2fae7 Bluetooth: Fix nested sleeps
      
      So let's fix it at the same way, also follow the suggestion of:
      https://lwn.net/Articles/628628/Signed-off-by: default avatarJeffy Chen <jeffy.chen@rock-chips.com>
      Reviewed-by: default avatarBrian Norris <briannorris@chromium.org>
      Reviewed-by: default avatarAL Yu-Chen Cho <acho@suse.com>
      Signed-off-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b7418962
    • Jeffy Chen's avatar
      Bluetooth: hidp: fix possible might sleep error in hidp_session_thread · e792d2d4
      Jeffy Chen authored
      commit 5da8e47d upstream.
      
      It looks like hidp_session_thread has same pattern as the issue reported in
      old rfcomm:
      
      	while (1) {
      		set_current_state(TASK_INTERRUPTIBLE);
      		if (condition)
      			break;
      		// may call might_sleep here
      		schedule();
      	}
      	__set_current_state(TASK_RUNNING);
      
      Which fixed at:
      	dfb2fae7 Bluetooth: Fix nested sleeps
      
      So let's fix it at the same way, also follow the suggestion of:
      https://lwn.net/Articles/628628/Signed-off-by: default avatarJeffy Chen <jeffy.chen@rock-chips.com>
      Tested-by: default avatarAL Yu-Chen Cho <acho@suse.com>
      Tested-by: default avatarRohit Vaswani <rvaswani@nvidia.com>
      Signed-off-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e792d2d4
    • Mateusz Jurczyk's avatar
      netfilter: nfnetlink: Improve input length sanitization in nfnetlink_rcv · 1eb33a1b
      Mateusz Jurczyk authored
      commit f55ce7b0 upstream.
      
      Verify that the length of the socket buffer is sufficient to cover the
      nlmsghdr structure before accessing the nlh->nlmsg_len field for further
      input sanitization. If the client only supplies 1-3 bytes of data in
      sk_buff, then nlh->nlmsg_len remains partially uninitialized and
      contains leftover memory from the corresponding kernel allocation.
      Operating on such data may result in indeterminate evaluation of the
      nlmsg_len < NLMSG_HDRLEN expression.
      
      The bug was discovered by a runtime instrumentation designed to detect
      use of uninitialized memory in the kernel. The patch prevents this and
      other similar tools (e.g. KMSAN) from flagging this behavior in the future.
      Signed-off-by: default avatarMateusz Jurczyk <mjurczyk@google.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Cc: Florian Westphal <fw@strlen.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1eb33a1b
    • Florian Westphal's avatar
      netfilter: nat: fix src map lookup · 8b504107
      Florian Westphal authored
      commit 97772bcd upstream.
      
      When doing initial conversion to rhashtable I replaced the bucket
      walk with a single rhashtable_lookup_fast().
      
      When moving to rhlist I failed to properly walk the list of identical
      tuples, but that is what is needed for this to work correctly.
      The table contains the original tuples, so the reply tuples are all
      distinct.
      
      We currently decide that mapping is (not) in range only based on the
      first entry, but in case its not we need to try the reply tuple of the
      next entry until we either find an in-range mapping or we checked
      all the entries.
      
      This bug makes nat core attempt collision resolution while it might be
      able to use the mapping as-is.
      
      Fixes: 870190a9 ("netfilter: nat: convert nat bysrc hash to rhashtable")
      Reported-by: default avatarJaco Kroon <jaco@uls.co.za>
      Tested-by: default avatarJaco Kroon <jaco@uls.co.za>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8b504107
    • Florian Westphal's avatar
      netfilter: expect: fix crash when putting uninited expectation · f5263887
      Florian Westphal authored
      commit 36ac344e upstream.
      
      We crash in __nf_ct_expect_check, it calls nf_ct_remove_expect on the
      uninitialised expectation instead of existing one, so del_timer chokes
      on random memory address.
      
      Fixes: ec0e3f01 ("netfilter: nf_ct_expect: Add nf_ct_remove_expect()")
      Reported-by: default avatarSergey Kvachonok <ravenexp@gmail.com>
      Tested-by: default avatarSergey Kvachonok <ravenexp@gmail.com>
      Cc: Gao Feng <fgao@ikuai8.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f5263887
    • Vadim Lomovtsev's avatar
      net: sunrpc: svcsock: fix NULL-pointer exception · 4909a7b7
      Vadim Lomovtsev authored
      commit eebe53e8 upstream.
      
      While running nfs/connectathon tests kernel NULL-pointer exception
      has been observed due to races in svcsock.c.
      
      Race is appear when kernel accepts connection by kernel_accept
      (which creates new socket) and start queuing ingress packets
      to new socket. This happens in ksoftirq context which could run
      concurrently on a different core while new socket setup is not done yet.
      
      The fix is to re-order socket user data init sequence and add
      write/read barrier calls to be sure that we got proper values
      for callback pointers before actually calling them.
      
      Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.
      
      Crash log:
      ---<-snip->---
      [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
      [ 6708.647093] pgd = ffff0000094e0000
      [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
      [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
      [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
      [ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
      [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
      [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
      [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
      [ 6708.773822] PC is at 0x0
      [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
      [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
      [ 6708.789248] sp : ffff810ffbad3900
      [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
      [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
      [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
      [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
      [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
      [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
      [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
      [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
      [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
      [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
      [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
      [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
      [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
      [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
      [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
      [ 6708.872084]
      [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
      [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
      [ 6708.886075] Call trace:
      [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
      [ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
      [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
      [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
      [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
      [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
      [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
      [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
      [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
      [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
      [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
      [ 6708.973117] [<          (null)>]           (null)
      [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
      [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
      [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
      [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
      [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
      [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
      [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
      [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
      [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
      [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
      [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
      [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
      [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
      [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
      [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
      [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
      [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
      [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
      [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
      [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
      [ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
      [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
      [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
      [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
      [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
      [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
      [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
      [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
      [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
      [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
      [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
      [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
      [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
      [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
      [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
      [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
      [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
      [ 6709.207967] Code: bad PC value
      [ 6709.211061] SMP: stopping secondary CPUs
      [ 6709.218830] Starting crashdump kernel...
      [ 6709.222749] Bye!
      ---<-snip>---
      Signed-off-by: default avatarVadim Lomovtsev <vlomovts@redhat.com>
      Reviewed-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4909a7b7
    • Eric Biggers's avatar
      x86/mm: Fix use-after-free of ldt_struct · a8da876c
      Eric Biggers authored
      commit ccd5b323 upstream.
      
      The following commit:
      
        39a0526f ("x86/mm: Factor out LDT init from context init")
      
      renamed init_new_context() to init_new_context_ldt() and added a new
      init_new_context() which calls init_new_context_ldt().  However, the
      error code of init_new_context_ldt() was ignored.  Consequently, if a
      memory allocation in alloc_ldt_struct() failed during a fork(), the
      ->context.ldt of the new task remained the same as that of the old task
      (due to the memcpy() in dup_mm()).  ldt_struct's are not intended to be
      shared, so a use-after-free occurred after one task exited.
      
      Fix the bug by making init_new_context() pass through the error code of
      init_new_context_ldt().
      
      This bug was found by syzkaller, which encountered the following splat:
      
          BUG: KASAN: use-after-free in free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116
          Read of size 4 at addr ffff88006d2cb7c8 by task kworker/u9:0/3710
      
          CPU: 1 PID: 3710 Comm: kworker/u9:0 Not tainted 4.13.0-rc4-next-20170811 #2
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
          Call Trace:
           __dump_stack lib/dump_stack.c:16 [inline]
           dump_stack+0x194/0x257 lib/dump_stack.c:52
           print_address_description+0x73/0x250 mm/kasan/report.c:252
           kasan_report_error mm/kasan/report.c:351 [inline]
           kasan_report+0x24e/0x340 mm/kasan/report.c:409
           __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:429
           free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116
           free_ldt_struct arch/x86/kernel/ldt.c:173 [inline]
           destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171
           destroy_context arch/x86/include/asm/mmu_context.h:157 [inline]
           __mmdrop+0xe9/0x530 kernel/fork.c:889
           mmdrop include/linux/sched/mm.h:42 [inline]
           exec_mmap fs/exec.c:1061 [inline]
           flush_old_exec+0x173c/0x1ff0 fs/exec.c:1291
           load_elf_binary+0x81f/0x4ba0 fs/binfmt_elf.c:855
           search_binary_handler+0x142/0x6b0 fs/exec.c:1652
           exec_binprm fs/exec.c:1694 [inline]
           do_execveat_common.isra.33+0x1746/0x22e0 fs/exec.c:1816
           do_execve+0x31/0x40 fs/exec.c:1860
           call_usermodehelper_exec_async+0x457/0x8f0 kernel/umh.c:100
           ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
      
          Allocated by task 3700:
           save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
           save_stack+0x43/0xd0 mm/kasan/kasan.c:447
           set_track mm/kasan/kasan.c:459 [inline]
           kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
           kmem_cache_alloc_trace+0x136/0x750 mm/slab.c:3627
           kmalloc include/linux/slab.h:493 [inline]
           alloc_ldt_struct+0x52/0x140 arch/x86/kernel/ldt.c:67
           write_ldt+0x7b7/0xab0 arch/x86/kernel/ldt.c:277
           sys_modify_ldt+0x1ef/0x240 arch/x86/kernel/ldt.c:307
           entry_SYSCALL_64_fastpath+0x1f/0xbe
      
          Freed by task 3700:
           save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
           save_stack+0x43/0xd0 mm/kasan/kasan.c:447
           set_track mm/kasan/kasan.c:459 [inline]
           kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
           __cache_free mm/slab.c:3503 [inline]
           kfree+0xca/0x250 mm/slab.c:3820
           free_ldt_struct.part.2+0xdd/0x150 arch/x86/kernel/ldt.c:121
           free_ldt_struct arch/x86/kernel/ldt.c:173 [inline]
           destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171
           destroy_context arch/x86/include/asm/mmu_context.h:157 [inline]
           __mmdrop+0xe9/0x530 kernel/fork.c:889
           mmdrop include/linux/sched/mm.h:42 [inline]
           __mmput kernel/fork.c:916 [inline]
           mmput+0x541/0x6e0 kernel/fork.c:927
           copy_process.part.36+0x22e1/0x4af0 kernel/fork.c:1931
           copy_process kernel/fork.c:1546 [inline]
           _do_fork+0x1ef/0xfb0 kernel/fork.c:2025
           SYSC_clone kernel/fork.c:2135 [inline]
           SyS_clone+0x37/0x50 kernel/fork.c:2129
           do_syscall_64+0x26c/0x8c0 arch/x86/entry/common.c:287
           return_from_SYSCALL_64+0x0/0x7a
      
      Here is a C reproducer:
      
          #include <asm/ldt.h>
          #include <pthread.h>
          #include <signal.h>
          #include <stdlib.h>
          #include <sys/syscall.h>
          #include <sys/wait.h>
          #include <unistd.h>
      
          static void *fork_thread(void *_arg)
          {
              fork();
          }
      
          int main(void)
          {
              struct user_desc desc = { .entry_number = 8191 };
      
              syscall(__NR_modify_ldt, 1, &desc, sizeof(desc));
      
              for (;;) {
                  if (fork() == 0) {
                      pthread_t t;
      
                      srand(getpid());
                      pthread_create(&t, NULL, fork_thread, NULL);
                      usleep(rand() % 10000);
                      syscall(__NR_exit_group, 0);
                  }
                  wait(NULL);
              }
          }
      
      Note: the reproducer takes advantage of the fact that alloc_ldt_struct()
      may use vmalloc() to allocate a large ->entries array, and after
      commit:
      
        5d17a73a ("vmalloc: back off when the current task is killed")
      
      it is possible for userspace to fail a task's vmalloc() by
      sending a fatal signal, e.g. via exit_group().  It would be more
      difficult to reproduce this bug on kernels without that commit.
      
      This bug only affected kernels with CONFIG_MODIFY_LDT_SYSCALL=y.
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Acked-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Fixes: 39a0526f ("x86/mm: Factor out LDT init from context init")
      Link: http://lkml.kernel.org/r/20170824175029.76040-1-ebiggers3@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a8da876c
    • Nicholas Piggin's avatar
      timers: Fix excessive granularity of new timers after a nohz idle · 2e11eede
      Nicholas Piggin authored
      commit 2fe59f50 upstream.
      
      When a timer base is idle, it is forwarded when a new timer is added
      to ensure that granularity does not become excessive. When not idle,
      the timer tick is expected to increment the base.
      
      However there are several problems:
      
      - If an existing timer is modified, the base is forwarded only after
        the index is calculated.
      
      - The base is not forwarded by add_timer_on.
      
      - There is a window after a timer is restarted from a nohz idle, after
        it is marked not-idle and before the timer tick on this CPU, where a
        timer may be added but the ancient base does not get forwarded.
      
      These result in excessive granularity (a 1 jiffy timeout can blow out
      to 100s of jiffies), which cause the rcu lockup detector to trigger,
      among other things.
      
      Fix this by keeping track of whether the timer base has been idle
      since it was last run or forwarded, and if so then forward it before
      adding a new timer.
      
      There is still a case where mod_timer optimises the case of a pending
      timer mod with the same expiry time, where the timer can see excessive
      granularity relative to the new, shorter interval. A comment is added,
      but it's not changed because it is an important fastpath for
      networking.
      
      This has been tested and found to fix the RCU softlockup messages.
      
      Testing was also done with tracing to measure requested versus
      achieved wakeup latencies for all non-deferrable timers in an idle
      system (with no lockup watchdogs running). Wakeup latency relative to
      absolute latency is calculated (note this suffers from round-up skew
      at low absolute times) and analysed:
      
                   max     avg      std
      upstream   506.0    1.20     4.68
      patched      2.0    1.08     0.15
      
      The bug was noticed due to the lockup detector Kconfig changes
      dropping it out of people's .configs and resulting in larger base
      clk skew When the lockup detectors are enabled, no CPU can go idle for
      longer than 4 seconds, which limits the granularity errors.
      Sub-optimal timer behaviour is observable on a smaller scale in that
      case:
      
      	     max     avg      std
      upstream     9.0    1.05     0.19
      patched      2.0    1.04     0.11
      
      Fixes: Fixes: a683f390 ("timers: Forward the wheel clock whenever possible")
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Tested-by: default avatarJonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: default avatarDavid Miller <davem@davemloft.net>
      Cc: dzickus@redhat.com
      Cc: sfr@canb.auug.org.au
      Cc: mpe@ellerman.id.au
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: linuxarm@huawei.com
      Cc: abdhalee@linux.vnet.ibm.com
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: akpm@linux-foundation.org
      Cc: paulmck@linux.vnet.ibm.com
      Cc: torvalds@linux-foundation.org
      Link: http://lkml.kernel.org/r/20170822084348.21436-1-npiggin@gmail.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2e11eede
    • Mark Rutland's avatar
      perf/core: Fix group {cpu,task} validation · 2c0dc7f0
      Mark Rutland authored
      commit 64aee2a9 upstream.
      
      Regardless of which events form a group, it does not make sense for the
      events to target different tasks and/or CPUs, as this leaves the group
      inconsistent and impossible to schedule. The core perf code assumes that
      these are consistent across (successfully intialised) groups.
      
      Core perf code only verifies this when moving SW events into a HW
      context. Thus, we can violate this requirement for pure SW groups and
      pure HW groups, unless the relevant PMU driver happens to perform this
      verification itself. These mismatched groups subsequently wreak havoc
      elsewhere.
      
      For example, we handle watchpoints as SW events, and reserve watchpoint
      HW on a per-CPU basis at pmu::event_init() time to ensure that any event
      that is initialised is guaranteed to have a slot at pmu::add() time.
      However, the core code only checks the group leader's cpu filter (via
      event_filter_match()), and can thus install follower events onto CPUs
      violating thier (mismatched) CPU filters, potentially installing them
      into a CPU without sufficient reserved slots.
      
      This can be triggered with the below test case, resulting in warnings
      from arch backends.
      
        #define _GNU_SOURCE
        #include <linux/hw_breakpoint.h>
        #include <linux/perf_event.h>
        #include <sched.h>
        #include <stdio.h>
        #include <sys/prctl.h>
        #include <sys/syscall.h>
        #include <unistd.h>
      
        static int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
      			   int group_fd, unsigned long flags)
        {
      	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
        }
      
        char watched_char;
      
        struct perf_event_attr wp_attr = {
      	.type = PERF_TYPE_BREAKPOINT,
      	.bp_type = HW_BREAKPOINT_RW,
      	.bp_addr = (unsigned long)&watched_char,
      	.bp_len = 1,
      	.size = sizeof(wp_attr),
        };
      
        int main(int argc, char *argv[])
        {
      	int leader, ret;
      	cpu_set_t cpus;
      
      	/*
      	 * Force use of CPU0 to ensure our CPU0-bound events get scheduled.
      	 */
      	CPU_ZERO(&cpus);
      	CPU_SET(0, &cpus);
      	ret = sched_setaffinity(0, sizeof(cpus), &cpus);
      	if (ret) {
      		printf("Unable to set cpu affinity\n");
      		return 1;
      	}
      
      	/* open leader event, bound to this task, CPU0 only */
      	leader = perf_event_open(&wp_attr, 0, 0, -1, 0);
      	if (leader < 0) {
      		printf("Couldn't open leader: %d\n", leader);
      		return 1;
      	}
      
      	/*
      	 * Open a follower event that is bound to the same task, but a
      	 * different CPU. This means that the group should never be possible to
      	 * schedule.
      	 */
      	ret = perf_event_open(&wp_attr, 0, 1, leader, 0);
      	if (ret < 0) {
      		printf("Couldn't open mismatched follower: %d\n", ret);
      		return 1;
      	} else {
      		printf("Opened leader/follower with mismastched CPUs\n");
      	}
      
      	/*
      	 * Open as many independent events as we can, all bound to the same
      	 * task, CPU0 only.
      	 */
      	do {
      		ret = perf_event_open(&wp_attr, 0, 0, -1, 0);
      	} while (ret >= 0);
      
      	/*
      	 * Force enable/disble all events to trigger the erronoeous
      	 * installation of the follower event.
      	 */
      	printf("Opened all events. Toggling..\n");
      	for (;;) {
      		prctl(PR_TASK_PERF_EVENTS_DISABLE, 0, 0, 0, 0);
      		prctl(PR_TASK_PERF_EVENTS_ENABLE, 0, 0, 0, 0);
      	}
      
      	return 0;
        }
      
      Fix this by validating this requirement regardless of whether we're
      moving events.
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Zhou Chengming <zhouchengming1@huawei.com>
      Link: http://lkml.kernel.org/r/1498142498-15758-1-git-send-email-mark.rutland@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2c0dc7f0
    • Steven Rostedt (VMware)'s avatar
      ftrace: Check for null ret_stack on profile function graph entry function · aa2da6c4
      Steven Rostedt (VMware) authored
      commit a8f0f9e4 upstream.
      
      There's a small race when function graph shutsdown and the calling of the
      registered function graph entry callback. The callback must not reference
      the task's ret_stack without first checking that it is not NULL. Note, when
      a ret_stack is allocated for a task, it stays allocated until the task exits.
      The problem here, is that function_graph is shutdown, and a new task was
      created, which doesn't have its ret_stack allocated. But since some of the
      functions are still being traced, the callbacks can still be called.
      
      The normal function_graph code handles this, but starting with commit
      8861dd30 ("ftrace: Access ret_stack->subtime only in the function
      profiler") the profiler code references the ret_stack on function entry, but
      doesn't check if it is NULL first.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=196611
      
      Fixes: 8861dd30 ("ftrace: Access ret_stack->subtime only in the function profiler")
      Reported-by: lilydjwg@gmail.com
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aa2da6c4
    • Christoph Hellwig's avatar
      virtio_pci: fix cpu affinity support · 1b8ca885
      Christoph Hellwig authored
      commit ba74b6f7 upstream.
      
      Commit 0b0f9dc5 ("Revert "virtio_pci: use shared interrupts for
      virtqueues"") removed the adjustment of the pre_vectors for the virtio
      MSI-X vector allocation which was added in commit fb5e31d9 ("virtio:
      allow drivers to request IRQ affinity when creating VQs"). This will
      lead to an incorrect assignment of MSI-X vectors, and potential
      deadlocks when offlining cpus.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Fixes: 0b0f9dc5 ("Revert "virtio_pci: use shared interrupts for virtqueues")
      Reported-by: default avatarYASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1b8ca885
    • Steven Rostedt (VMware)'s avatar
      ring-buffer: Have ring_buffer_alloc_read_page() return error on offline CPU · 78f2e29f
      Steven Rostedt (VMware) authored
      commit a7e52ad7 upstream.
      
      Chunyu Hu reported:
        "per_cpu trace directories and files are created for all possible cpus,
         but only the cpus which have ever been on-lined have their own per cpu
         ring buffer (allocated by cpuhp threads). While trace_buffers_open, the
         open handler for trace file 'trace_pipe_raw' is always trying to access
         field of ring_buffer_per_cpu, and would panic with the NULL pointer.
      
         Align the behavior of trace_pipe_raw with trace_pipe, that returns -NODEV
         when openning it if that cpu does not have trace ring buffer.
      
         Reproduce:
         cat /sys/kernel/debug/tracing/per_cpu/cpu31/trace_pipe_raw
         (cpu31 is never on-lined, this is a 16 cores x86_64 box)
      
         Tested with:
         1) boot with maxcpus=14, read trace_pipe_raw of cpu15.
            Got -NODEV.
         2) oneline cpu15, read trace_pipe_raw of cpu15.
            Get the raw trace data.
      
         Call trace:
         [ 5760.950995] RIP: 0010:ring_buffer_alloc_read_page+0x32/0xe0
         [ 5760.961678]  tracing_buffers_read+0x1f6/0x230
         [ 5760.962695]  __vfs_read+0x37/0x160
         [ 5760.963498]  ? __vfs_read+0x5/0x160
         [ 5760.964339]  ? security_file_permission+0x9d/0xc0
         [ 5760.965451]  ? __vfs_read+0x5/0x160
         [ 5760.966280]  vfs_read+0x8c/0x130
         [ 5760.967070]  SyS_read+0x55/0xc0
         [ 5760.967779]  do_syscall_64+0x67/0x150
         [ 5760.968687]  entry_SYSCALL64_slow_path+0x25/0x25"
      
      This was introduced by the addition of the feature to reuse reader pages
      instead of re-allocating them. The problem is that the allocation of a
      reader page (which is per cpu) does not check if the cpu is online and set
      up for the ring buffer.
      
      Link: http://lkml.kernel.org/r/1500880866-1177-1-git-send-email-chuhu@redhat.com
      
      Fixes: 73a757e6 ("ring-buffer: Return reader page back into existing ring buffer")
      Reported-by: default avatarChunyu Hu <chuhu@redhat.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      78f2e29f
    • Chuck Lever's avatar
      nfsd: Limit end of page list when decoding NFSv4 WRITE · 8d4f126c
      Chuck Lever authored
      commit fc788f64 upstream.
      
      When processing an NFSv4 WRITE operation, argp->end should never
      point past the end of the data in the final page of the page list.
      Otherwise, nfsd4_decode_compound can walk into uninitialized memory.
      
      More critical, nfsd4_decode_write is failing to increment argp->pagelen
      when it increments argp->pagelist.  This can cause later xdr decoders
      to assume more data is available than really is, which can cause server
      crashes on malformed requests.
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8d4f126c
    • Ronnie Sahlberg's avatar
      cifs: return ENAMETOOLONG for overlong names in cifs_open()/cifs_lookup() · ea5745a5
      Ronnie Sahlberg authored
      commit d3edede2 upstream.
      
      Add checking for the path component length and verify it is <= the maximum
      that the server advertizes via FileFsAttributeInformation.
      
      With this patch cifs.ko will now return ENAMETOOLONG instead of ENOENT
      when users to access an overlong path.
      
      To test this, try to cd into a (non-existing) directory on a CIFS share
      that has a too long name:
      cd /mnt/aaaaaaaaaaaaaaa...
      
      and it now should show a good error message from the shell:
      bash: cd: /mnt/aaaaaaaaaaaaaaaa...aaaaaa: File name too long
      
      rh bz 1153996
      Signed-off-by: default avatarRonnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ea5745a5
    • Sachin Prabhu's avatar
      cifs: Fix df output for users with quota limits · 1bc1c439
      Sachin Prabhu authored
      commit 42bec214 upstream.
      
      The df for a SMB2 share triggers a GetInfo call for
      FS_FULL_SIZE_INFORMATION. The values returned are used to populate
      struct statfs.
      
      The problem is that none of the information returned by the call
      contains the total blocks available on the filesystem. Instead we use
      the blocks available to the user ie. quota limitation when filling out
      statfs.f_blocks. The information returned does contain Actual free units
      on the filesystem and is used to populate statfs.f_bfree. For users with
      quota enabled, it can lead to situations where the total free space
      reported is more than the total blocks on the system ending up with df
      reports like the following
      
       # df -h /mnt/a
      Filesystem         Size  Used Avail Use% Mounted on
      //192.168.22.10/a  2.5G -2.3G  2.5G    - /mnt/a
      
      To fix this problem, we instead populate both statfs.f_bfree with the
      same value as statfs.f_bavail ie. CallerAvailableAllocationUnits. This
      is similar to what is done already in the code for cifs and df now
      reports the quota information for the user used to mount the share.
      
       # df --si /mnt/a
      Filesystem         Size  Used Avail Use% Mounted on
      //192.168.22.10/a  2.7G  101M  2.6G   4% /mnt/a
      Signed-off-by: default avatarSachin Prabhu <sprabhu@redhat.com>
      Signed-off-by: default avatarPierguido Lambri <plambri@redhat.com>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1bc1c439
    • Nicholas Piggin's avatar
      kbuild: linker script do not match C names unless LD_DEAD_CODE_DATA_ELIMINATION is configured · 3b278d7e
      Nicholas Piggin authored
      commit cb87481e upstream.
      
      The .data and .bss sections were modified in the generic linker script to
      pull in sections named .data.<C identifier>, which are generated by gcc with
      -ffunction-sections and -fdata-sections options.
      
      The problem with this pattern is it can also match section names that Linux
      defines explicitly, e.g., .data.unlikely. This can cause Linux sections to
      get moved into the wrong place.
      
      The way to avoid this is to use ".." separators for explicit section names
      (the dot character is valid in a section name but not a C identifier).
      However currently there are sections which don't follow this rule, so for
      now just disable the wild card by default.
      
      Example: http://marc.info/?l=linux-arm-kernel&m=150106824024221&w=2
      
      Fixes: b67067f1 ("kbuild: allow archs to select link dead code/data elimination")
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3b278d7e
    • Bharat Potnuri's avatar
      RDMA/uverbs: Initialize cq_context appropriately · 51f49383
      Bharat Potnuri authored
      commit 65159c05 upstream.
      
      Initializing cq_context with ev_queue in create_cq(), leads to NULL pointer
      dereference in ib_uverbs_comp_handler(), if application doesnot use completion
      channel. This patch fixes the cq_context initialization.
      
      Fixes: 1e7710f3 ("IB/core: Change completion channel to use the reworked")
      Signed-off-by: default avatarPotnuri Bharat Teja <bharat@chelsio.com>
      Reviewed-by: default avatarMatan Barak <matanb@mellanox.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      (cherry picked from commit 699a2d5b)
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      51f49383
    • Steven Rostedt (VMware)'s avatar
      tracing: Fix freeing of filter in create_filter() when set_str is false · 53a38dfb
      Steven Rostedt (VMware) authored
      commit 8b0db1a5 upstream.
      
      Performing the following task with kmemleak enabled:
      
       # cd /sys/kernel/tracing/events/irq/irq_handler_entry/
       # echo 'enable_event:kmem:kmalloc:3 if irq >' > trigger
       # echo 'enable_event:kmem:kmalloc:3 if irq > 31' > trigger
       # echo scan > /sys/kernel/debug/kmemleak
       # cat /sys/kernel/debug/kmemleak
      unreferenced object 0xffff8800b9290308 (size 32):
        comm "bash", pid 1114, jiffies 4294848451 (age 141.139s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff81cef5aa>] kmemleak_alloc+0x4a/0xa0
          [<ffffffff81357938>] kmem_cache_alloc_trace+0x158/0x290
          [<ffffffff81261c09>] create_filter_start.constprop.28+0x99/0x940
          [<ffffffff812639c9>] create_filter+0xa9/0x160
          [<ffffffff81263bdc>] create_event_filter+0xc/0x10
          [<ffffffff812655e5>] set_trigger_filter+0xe5/0x210
          [<ffffffff812660c4>] event_enable_trigger_func+0x324/0x490
          [<ffffffff812652e2>] event_trigger_write+0x1a2/0x260
          [<ffffffff8138cf87>] __vfs_write+0xd7/0x380
          [<ffffffff8138f421>] vfs_write+0x101/0x260
          [<ffffffff8139187b>] SyS_write+0xab/0x130
          [<ffffffff81cfd501>] entry_SYSCALL_64_fastpath+0x1f/0xbe
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      The function create_filter() is passed a 'filterp' pointer that gets
      allocated, and if "set_str" is true, it is up to the caller to free it, even
      on error. The problem is that the pointer is not freed by create_filter()
      when set_str is false. This is a bug, and it is not up to the caller to free
      the filter on error if it doesn't care about the string.
      
      Link: http://lkml.kernel.org/r/1502705898-27571-2-git-send-email-chuhu@redhat.com
      
      Fixes: 38b78eb8 ("tracing: Factorize filter creation")
      Reported-by: default avatarChunyu Hu <chuhu@redhat.com>
      Tested-by: default avatarChunyu Hu <chuhu@redhat.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      53a38dfb
    • Chunyu Hu's avatar
      tracing: Fix kmemleak in tracing_map_array_free() · 983ba814
      Chunyu Hu authored
      commit 475bb3c6 upstream.
      
      kmemleak reported the below leak when I was doing clear of the hist
      trigger. With this patch, the kmeamleak is gone.
      
      unreferenced object 0xffff94322b63d760 (size 32):
        comm "bash", pid 1522, jiffies 4403687962 (age 2442.311s)
        hex dump (first 32 bytes):
          00 01 00 00 04 00 00 00 08 00 00 00 ff 00 00 00  ................
          10 00 00 00 00 00 00 00 80 a8 7a f2 31 94 ff ff  ..........z.1...
        backtrace:
          [<ffffffff9e96c27a>] kmemleak_alloc+0x4a/0xa0
          [<ffffffff9e424cba>] kmem_cache_alloc_trace+0xca/0x1d0
          [<ffffffff9e377736>] tracing_map_array_alloc+0x26/0x140
          [<ffffffff9e261be0>] kretprobe_trampoline+0x0/0x50
          [<ffffffff9e38b935>] create_hist_data+0x535/0x750
          [<ffffffff9e38bd47>] event_hist_trigger_func+0x1f7/0x420
          [<ffffffff9e38893d>] event_trigger_write+0xfd/0x1a0
          [<ffffffff9e44dfc7>] __vfs_write+0x37/0x170
          [<ffffffff9e44f552>] vfs_write+0xb2/0x1b0
          [<ffffffff9e450b85>] SyS_write+0x55/0xc0
          [<ffffffff9e203857>] do_syscall_64+0x67/0x150
          [<ffffffff9e977ce7>] return_from_SYSCALL_64+0x0/0x6a
          [<ffffffffffffffff>] 0xffffffffffffffff
      unreferenced object 0xffff9431f27aa880 (size 128):
        comm "bash", pid 1522, jiffies 4403687962 (age 2442.311s)
        hex dump (first 32 bytes):
          00 00 8c 2a 32 94 ff ff 00 f0 8b 2a 32 94 ff ff  ...*2......*2...
          00 e0 8b 2a 32 94 ff ff 00 d0 8b 2a 32 94 ff ff  ...*2......*2...
        backtrace:
          [<ffffffff9e96c27a>] kmemleak_alloc+0x4a/0xa0
          [<ffffffff9e425348>] __kmalloc+0xe8/0x220
          [<ffffffff9e3777c1>] tracing_map_array_alloc+0xb1/0x140
          [<ffffffff9e261be0>] kretprobe_trampoline+0x0/0x50
          [<ffffffff9e38b935>] create_hist_data+0x535/0x750
          [<ffffffff9e38bd47>] event_hist_trigger_func+0x1f7/0x420
          [<ffffffff9e38893d>] event_trigger_write+0xfd/0x1a0
          [<ffffffff9e44dfc7>] __vfs_write+0x37/0x170
          [<ffffffff9e44f552>] vfs_write+0xb2/0x1b0
          [<ffffffff9e450b85>] SyS_write+0x55/0xc0
          [<ffffffff9e203857>] do_syscall_64+0x67/0x150
          [<ffffffff9e977ce7>] return_from_SYSCALL_64+0x0/0x6a
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      Link: http://lkml.kernel.org/r/1502705898-27571-1-git-send-email-chuhu@redhat.com
      
      Fixes: 08d43a5f ("tracing: Add lock-free tracing_map")
      Signed-off-by: default avatarChunyu Hu <chuhu@redhat.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      983ba814
    • Dan Carpenter's avatar
      tracing: Missing error code in tracer_alloc_buffers() · a23e7828
      Dan Carpenter authored
      commit 147d88e0 upstream.
      
      If ring_buffer_alloc() or one of the next couple function calls fail
      then we should return -ENOMEM but the current code returns success.
      
      Link: http://lkml.kernel.org/r/20170801110201.ajdkct7vwzixahvx@mwanda
      
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Fixes: b32614c0 ('tracing/rb: Convert to hotplug state machine')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a23e7828
    • Steven Rostedt (VMware)'s avatar
      tracing: Call clear_boot_tracer() at lateinit_sync · 3888c3ae
      Steven Rostedt (VMware) authored
      commit 4bb0f0e7 upstream.
      
      The clear_boot_tracer function is used to reset the default_bootup_tracer
      string to prevent it from being accessed after boot, as it originally points
      to init data. But since clear_boot_tracer() is called via the
      init_lateinit() call, it races with the initcall for registering the hwlat
      tracer. If someone adds "ftrace=hwlat" to the kernel command line, depending
      on how the linker sets up the text, the saved command line may be cleared,
      and the hwlat tracer never is initialized.
      
      Simply have the clear_boot_tracer() be called by initcall_lateinit_sync() as
      that's for tasks to be called after lateinit.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=196551
      
      Fixes: e7c15cd8 ("tracing: Added hardware latency tracer")
      Reported-by: default avatarZamir SUN <sztsian@gmail.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3888c3ae
    • Sakari Ailus's avatar
      ACPI: device property: Fix node lookup in acpi_graph_get_child_prop_value() · 1344db83
      Sakari Ailus authored
      commit b5212f57 upstream.
      
      acpi_graph_get_child_prop_value() is intended to find a child node with a
      certain property value pair. The check
      
      	if (!fwnode_property_read_u32(fwnode, prop_name, &nr))
      		continue;
      
      is faulty: fwnode_property_read_u32() returns zero on success, not on
      failure, leading to comparing values only if the searched property was not
      found.
      
      Moreover, the check is made against the parent device node instead of
      the child one as it should be.
      
      Fixes: 79389a83 (ACPI / property: Add support for remote endpoints)
      Reported-by: default avatarHyungwoo Yang <hyungwoo.yang@intel.com>
      Signed-off-by: default avatarSakari Ailus <sakari.ailus@linux.intel.com>
      [ rjw: Changelog ]
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1344db83