1. 13 Oct, 2014 40 commits
    • Anton Altaparmakov's avatar
      Fix nasty 32-bit overflow bug in buffer i/o code. · 2db0c704
      Anton Altaparmakov authored
      commit f2d5a944 upstream.
      
      On 32-bit architectures, the legacy buffer_head functions are not always
      handling the sector number with the proper 64-bit types, and will thus
      fail on 4TB+ disks.
      
      Any code that uses __getblk() (and thus bread(), breadahead(),
      sb_bread(), sb_breadahead(), sb_getblk()), and calls it using a 64-bit
      block on a 32-bit arch (where "long" is 32-bit) causes an inifinite loop
      in __getblk_slow() with an infinite stream of errors logged to dmesg
      like this:
      
        __find_get_block_slow() failed. block=6740375944, b_blocknr=2445408648
        b_state=0x00000020, b_size=512
        device sda1 blocksize: 512
      
      Note how in hex block is 0x191C1F988 and b_blocknr is 0x91C1F988 i.e. the
      top 32-bits are missing (in this case the 0x1 at the top).
      
      This is because grow_dev_page() is broken and has a 32-bit overflow due
      to shifting the page index value (a pgoff_t - which is just 32 bits on
      32-bit architectures) left-shifted as the block number.  But the top
      bits to get lost as the pgoff_t is not type cast to sector_t / 64-bit
      before the shift.
      
      This patch fixes this issue by type casting "index" to sector_t before
      doing the left shift.
      
      Note this is not a theoretical bug but has been seen in the field on a
      4TiB hard drive with logical sector size 512 bytes.
      
      This patch has been verified to fix the infinite loop problem on 3.17-rc5
      kernel using a 4TB disk image mounted using "-o loop".  Without this patch
      doing a "find /nt" where /nt is an NTFS volume causes the inifinite loop
      100% reproducibly whilst with the patch it works fine as expected.
      Signed-off-by: default avatarAnton Altaparmakov <aia21@cantab.net>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      2db0c704
    • Alex Deucher's avatar
      drm/nouveau/runpm: fix module unload · fa13e3ec
      Alex Deucher authored
      commit 53beaa01 upstream.
      
      Use the new vga_switcheroo_fini_domain_pm_ops function
      to unregister the pm ops.
      
      Based on a patch from:
      Pali Rohár <pali.rohar@gmail.com>
      
      bug:
      https://bugzilla.kernel.org/show_bug.cgi?id=84431Reviewed-by: default avatarBen Skeggs <bskeggs@redhat.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      fa13e3ec
    • Alex Deucher's avatar
      vgaswitcheroo: add vga_switcheroo_fini_domain_pm_ops · 5509a38b
      Alex Deucher authored
      commit 766a53d0 upstream.
      
      Drivers should call this on unload to unregister pmops.
      
      Bug:
      https://bugzilla.kernel.org/show_bug.cgi?id=84431Reviewed-by: default avatarBen Skeggs <bskeggs@redhat.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarPali Rohár <pali.rohar@gmail.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      5509a38b
    • Cong Wang's avatar
      perf: Fix a race condition in perf_remove_from_context() · fcf9b1a7
      Cong Wang authored
      commit 3577af70 upstream.
      
      We saw a kernel soft lockup in perf_remove_from_context(),
      it looks like the `perf` process, when exiting, could not go
      out of the retry loop. Meanwhile, the target process was forking
      a child. So either the target process should execute the smp
      function call to deactive the event (if it was running) or it should
      do a context switch which deactives the event.
      
      It seems we optimize out a context switch in perf_event_context_sched_out(),
      and what's more important, we still test an obsolete task pointer when
      retrying, so no one actually would deactive that event in this situation.
      Fix it directly by reloading the task pointer in perf_remove_from_context().
      
      This should cure the above soft lockup.
      Signed-off-by: default avatarCong Wang <cwang@twopensource.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1409696840-843-1-git-send-email-xiyou.wangcong@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      fcf9b1a7
    • Mike Marciniszyn's avatar
      IB/qib: Correct reference counting in debugfs qp_stats · 8d9376b8
      Mike Marciniszyn authored
      commit 85cbb7c7 upstream.
      
      This particular reference count is not needed with the rcu protection,
      and the current code leaks a reference count, causing a hang in
      qib_qp_destroy().
      Reviewed-by: default avatarDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: default avatarMike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      8d9376b8
    • Richard Larocque's avatar
      alarmtimer: Lock k_itimer during timer callback · 6b492bab
      Richard Larocque authored
      commit 474e941b upstream.
      
      Locks the k_itimer's it_lock member when handling the alarm timer's
      expiry callback.
      
      The regular posix timers defined in posix-timers.c have this lock held
      during timout processing because their callbacks are routed through
      posix_timer_fn().  The alarm timers follow a different path, so they
      ought to grab the lock somewhere else.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Sharvil Nanavati <sharvil@google.com>
      Signed-off-by: default avatarRichard Larocque <rlarocque@google.com>
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      6b492bab
    • Richard Larocque's avatar
      alarmtimer: Do not signal SIGEV_NONE timers · e1f7a680
      Richard Larocque authored
      commit 265b81d2 upstream.
      
      Avoids sending a signal to alarm timers created with sigev_notify set to
      SIGEV_NONE by checking for that special case in the timeout callback.
      
      The regular posix timers avoid sending signals to SIGEV_NONE timers by
      not scheduling any callbacks for them in the first place.  Although it
      would be possible to do something similar for alarm timers, it's simpler
      to handle this as a special case in the timeout.
      
      Prior to this patch, the alarm timer would ignore the sigev_notify value
      and try to deliver signals to the process anyway.  Even worse, the
      sanity check for the value of sigev_signo is skipped when SIGEV_NONE was
      specified, so the signal number could be bogus.  If sigev_signo was an
      unitialized value (as it often would be if SIGEV_NONE is used), then
      it's hard to predict which signal will be sent.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Sharvil Nanavati <sharvil@google.com>
      Signed-off-by: default avatarRichard Larocque <rlarocque@google.com>
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      e1f7a680
    • Richard Larocque's avatar
      alarmtimer: Return relative times in timer_gettime · ddedde00
      Richard Larocque authored
      commit e86fea76 upstream.
      
      Returns the time remaining for an alarm timer, rather than the time at
      which it is scheduled to expire.  If the timer has already expired or it
      is not currently scheduled, the it_value's members are set to zero.
      
      This new behavior matches that of the other posix-timers and the POSIX
      specifications.
      
      This is a change in user-visible behavior, and may break existing
      applications.  Hopefully, few users rely on the old incorrect behavior.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Sharvil Nanavati <sharvil@google.com>
      Signed-off-by: default avatarRichard Larocque <rlarocque@google.com>
      [jstultz: minor style tweak]
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      ddedde00
    • John David Anglin's avatar
      parisc: Only use -mfast-indirect-calls option for 32-bit kernel builds · 4a472616
      John David Anglin authored
      commit d26a7730 upstream.
      
      In spite of what the GCC manual says, the -mfast-indirect-calls has
      never been supported in the 64-bit parisc compiler. Indirect calls have
      always been done using function descriptors irrespective of the
      -mfast-indirect-calls option.
      
      Recently, it was noticed that a function descriptor was always requested
      when the -mfast-indirect-calls option was specified. This caused
      problems when the option was used in  application code and doesn't make
      any sense because the whole point of the option is to avoid using a
      function descriptor for indirect calls.
      
      Fixing this broke 64-bit kernel builds.
      
      I will fix GCC but for now we need the attached change. This results in
      the same kernel code as before.
      Signed-off-by: default avatarJohn David Anglin <dave.anglin@bell.net>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      4a472616
    • Al Viro's avatar
      don't bugger nd->seq on set_root_rcu() from follow_dotdot_rcu() · 11dbe17a
      Al Viro authored
      commit 7bd88377 upstream.
      
      return the value instead, and have path_init() do the assignment.  Broken by
      "vfs: Fix absolute RCU path walk failures due to uninitialized seq number",
      which was Cc-stable with 2.6.38+ as destination.  This one should go where
      it went.
      
      To avoid dummy value returned in case when root is already set (it would do
      no harm, actually, since the only caller that doesn't ignore the return value
      is guaranteed to have nd->root *not* set, but it's more obvious that way),
      lift the check into callers.  And do the same to set_root(), to keep them
      in sync.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      11dbe17a
    • Wanpeng Li's avatar
      sched: Fix unreleased llc_shared_mask bit during CPU hotplug · 6b4238b5
      Wanpeng Li authored
      commit 03bd4e1f upstream.
      
      The following bug can be triggered by hot adding and removing a large number of
      xen domain0's vcpus repeatedly:
      
      	BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 IP: [..] find_busiest_group
      	PGD 5a9d5067 PUD 13067 PMD 0
      	Oops: 0000 [#3] SMP
      	[...]
      	Call Trace:
      	load_balance
      	? _raw_spin_unlock_irqrestore
      	idle_balance
      	__schedule
      	schedule
      	schedule_timeout
      	? lock_timer_base
      	schedule_timeout_uninterruptible
      	msleep
      	lock_device_hotplug_sysfs
      	online_store
      	dev_attr_store
      	sysfs_write_file
      	vfs_write
      	SyS_write
      	system_call_fastpath
      
      Last level cache shared mask is built during CPU up and the
      build_sched_domain() routine takes advantage of it to setup
      the sched domain CPU topology.
      
      However, llc_shared_mask is not released during CPU disable,
      which leads to an invalid sched domainCPU topology.
      
      This patch fix it by releasing the llc_shared_mask correctly
      during CPU disable.
      
      Yasuaki also reported that this can happen on real hardware:
      
        https://lkml.org/lkml/2014/7/22/1018
      
      His case is here:
      
      	==
      	Here is an example on my system.
      	My system has 4 sockets and each socket has 15 cores and HT is
      	enabled. In this case, each core of sockes is numbered as
      	follows:
      
      		 | CPU#
      	Socket#0 | 0-14 , 60-74
      	Socket#1 | 15-29, 75-89
      	Socket#2 | 30-44, 90-104
      	Socket#3 | 45-59, 105-119
      
      	Then llc_shared_mask of CPU#30 has 0x3fff80000001fffc0000000.
      
      	It means that last level cache of Socket#2 is shared with
      	CPU#30-44 and 90-104.
      
      	When hot-removing socket#2 and #3, each core of sockets is
      	numbered as follows:
      
      		 | CPU#
      	Socket#0 | 0-14 , 60-74
      	Socket#1 | 15-29, 75-89
      
      	But llc_shared_mask is not cleared. So llc_shared_mask of CPU#30
      	remains having 0x3fff80000001fffc0000000.
      
      	After that, when hot-adding socket#2 and #3, each core of
      	sockets is numbered as follows:
      
      		 | CPU#
      	Socket#0 | 0-14 , 60-74
      	Socket#1 | 15-29, 75-89
      	Socket#2 | 30-59
      	Socket#3 | 90-119
      
      	Then llc_shared_mask of CPU#30 becomes
      	0x3fff8000fffffffc0000000. It means that last level cache of
      	Socket#2 is shared with CPU#30-59 and 90-104. So the mask has
      	the wrong value.
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@linux.intel.com>
      Tested-by: default avatarLinn Crosetto <linn@hp.com>
      Reviewed-by: default avatarBorislav Petkov <bp@suse.de>
      Reviewed-by: default avatarToshi Kani <toshi.kani@hp.com>
      Reviewed-by: default avatarYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1411547885-48165-1-git-send-email-wanpeng.li@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      6b4238b5
    • David Rientjes's avatar
      mm, slab: initialize object alignment on cache creation · 26afc4e5
      David Rientjes authored
      commit d4a5fca5 upstream.
      
      Since commit 45906855 ("mm/sl[aou]b: Common alignment code"), the
      "ralign" automatic variable in __kmem_cache_create() may be used as
      uninitialized.
      
      The proper alignment defaults to BYTES_PER_WORD and can be overridden by
      SLAB_RED_ZONE or the alignment specified by the caller.
      
      This fixes https://bugzilla.kernel.org/show_bug.cgi?id=85031Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Reported-by: default avatarAndrei Elovikov <a.elovikov@gmail.com>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      26afc4e5
    • Joseph Qi's avatar
      ocfs2/dlm: do not get resource spinlock if lockres is new · df59eb8e
      Joseph Qi authored
      commit 5760a97c upstream.
      
      There is a deadlock case which reported by Guozhonghua:
        https://oss.oracle.com/pipermail/ocfs2-devel/2014-September/010079.html
      
      This case is caused by &res->spinlock and &dlm->master_lock
      misordering in different threads.
      
      It was introduced by commit 8d400b81 ("ocfs2/dlm: Clean up refmap
      helpers").  Since lockres is new, it doesn't not require the
      &res->spinlock.  So remove it.
      
      Fixes: 8d400b81 ("ocfs2/dlm: Clean up refmap helpers")
      Signed-off-by: default avatarJoseph Qi <joseph.qi@huawei.com>
      Reviewed-by: default avatarjoyce.xue <xuejiufei@huawei.com>
      Reported-by: default avatarGuozhonghua <guozhonghua@h3c.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      df59eb8e
    • Andreas Rohner's avatar
      nilfs2: fix data loss with mmap() · 207a97e3
      Andreas Rohner authored
      commit 56d7acc7 upstream.
      
      This bug leads to reproducible silent data loss, despite the use of
      msync(), sync() and a clean unmount of the file system.  It is easily
      reproducible with the following script:
      
        ----------------[BEGIN SCRIPT]--------------------
        mkfs.nilfs2 -f /dev/sdb
        mount /dev/sdb /mnt
      
        dd if=/dev/zero bs=1M count=30 of=/mnt/testfile
      
        umount /mnt
        mount /dev/sdb /mnt
        CHECKSUM_BEFORE="$(md5sum /mnt/testfile)"
      
        /root/mmaptest/mmaptest /mnt/testfile 30 10 5
      
        sync
        CHECKSUM_AFTER="$(md5sum /mnt/testfile)"
        umount /mnt
        mount /dev/sdb /mnt
        CHECKSUM_AFTER_REMOUNT="$(md5sum /mnt/testfile)"
        umount /mnt
      
        echo "BEFORE MMAP:\t$CHECKSUM_BEFORE"
        echo "AFTER MMAP:\t$CHECKSUM_AFTER"
        echo "AFTER REMOUNT:\t$CHECKSUM_AFTER_REMOUNT"
        ----------------[END SCRIPT]--------------------
      
      The mmaptest tool looks something like this (very simplified, with
      error checking removed):
      
        ----------------[BEGIN mmaptest]--------------------
        data = mmap(NULL, file_size - file_offset, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, file_offset);
      
        for (i = 0; i < write_count; ++i) {
              memcpy(data + i * 4096, buf, sizeof(buf));
              msync(data, file_size - file_offset, MS_SYNC))
        }
        ----------------[END mmaptest]--------------------
      
      The output of the script looks something like this:
      
        BEFORE MMAP:    281ed1d5ae50e8419f9b978aab16de83  /mnt/testfile
        AFTER MMAP:     6604a1c31f10780331a6850371b3a313  /mnt/testfile
        AFTER REMOUNT:  281ed1d5ae50e8419f9b978aab16de83  /mnt/testfile
      
      So it is clear, that the changes done using mmap() do not survive a
      remount.  This can be reproduced a 100% of the time.  The problem was
      introduced in commit 136e8770 ("nilfs2: fix issue of
      nilfs_set_page_dirty() for page at EOF boundary").
      
      If the page was read with mpage_readpage() or mpage_readpages() for
      example, then it has no buffers attached to it.  In that case
      page_has_buffers(page) in nilfs_set_page_dirty() will be false.
      Therefore nilfs_set_file_dirty() is never called and the pages are never
      collected and never written to disk.
      
      This patch fixes the problem by also calling nilfs_set_file_dirty() if the
      page has no buffers attached to it.
      
      [akpm@linux-foundation.org: s/PAGE_SHIFT/PAGE_CACHE_SHIFT/]
      Signed-off-by: default avatarAndreas Rohner <andreas.rohner@gmx.net>
      Tested-by: default avatarAndreas Rohner <andreas.rohner@gmx.net>
      Signed-off-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      207a97e3
    • Andrey Vagin's avatar
      fs/notify: don't show f_handle if exportfs_encode_inode_fh failed · 90aec2f7
      Andrey Vagin authored
      commit 7e882481 upstream.
      
      Currently we handle only ENOSPC.  In case of other errors the file_handle
      variable isn't filled properly and we will show a part of stack.
      Signed-off-by: default avatarAndrey Vagin <avagin@openvz.org>
      Acked-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      90aec2f7
    • Andrey Vagin's avatar
      fsnotify/fdinfo: use named constants instead of hardcoded values · ea854411
      Andrey Vagin authored
      commit 1fc98d11 upstream.
      
      MAX_HANDLE_SZ is equal to 128, but currently the size of pad is only 64
      bytes, so exportfs_encode_inode_fh can return an error.
      Signed-off-by: default avatarAndrey Vagin <avagin@openvz.org>
      Acked-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      ea854411
    • Rasmus Villemoes's avatar
      kcmp: fix standard comparison bug · 48133a70
      Rasmus Villemoes authored
      commit acbbe6fb upstream.
      
      The C operator <= defines a perfectly fine total ordering on the set of
      values representable in a long.  However, unlike its namesake in the
      integers, it is not translation invariant, meaning that we do not have
      "b <= c" iff "a+b <= a+c" for all a,b,c.
      
      This means that it is always wrong to try to boil down the relationship
      between two longs to a question about the sign of their difference,
      because the resulting relation [a LEQ b iff a-b <= 0] is neither
      anti-symmetric or transitive.  The former is due to -LONG_MIN==LONG_MIN
      (take any two a,b with a-b = LONG_MIN; then a LEQ b and b LEQ a, but a !=
      b).  The latter can either be seen observing that x LEQ x+1 for all x,
      implying x LEQ x+1 LEQ x+2 ...  LEQ x-1 LEQ x; or more directly with the
      simple example a=LONG_MIN, b=0, c=1, for which a-b < 0, b-c < 0, but a-c >
      0.
      
      Note that it makes absolutely no difference that a transmogrying bijection
      has been applied before the comparison is done.  In fact, had the
      obfuscation not been done, one could probably not observe the bug
      (assuming all values being compared always lie in one half of the address
      space, the mathematical value of a-b is always representable in a long).
      As it stands, one can easily obtain three file descriptors exhibiting the
      non-transitivity of kcmp().
      
      Side note 1: I can't see that ensuring the MSB of the multiplier is
      set serves any purpose other than obfuscating the obfuscating code.
      
      Side note 2:
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <fcntl.h>
      #include <unistd.h>
      #include <assert.h>
      #include <sys/syscall.h>
      
      enum kcmp_type {
              KCMP_FILE,
              KCMP_VM,
              KCMP_FILES,
              KCMP_FS,
              KCMP_SIGHAND,
              KCMP_IO,
              KCMP_SYSVSEM,
              KCMP_TYPES,
      };
      pid_t pid;
      
      int kcmp(pid_t pid1, pid_t pid2, int type,
      	 unsigned long idx1, unsigned long idx2)
      {
      	return syscall(SYS_kcmp, pid1, pid2, type, idx1, idx2);
      }
      int cmp_fd(int fd1, int fd2)
      {
      	int c = kcmp(pid, pid, KCMP_FILE, fd1, fd2);
      	if (c < 0) {
      		perror("kcmp");
      		exit(1);
      	}
      	assert(0 <= c && c < 3);
      	return c;
      }
      int cmp_fdp(const void *a, const void *b)
      {
      	static const int normalize[] = {0, -1, 1};
      	return normalize[cmp_fd(*(int*)a, *(int*)b)];
      }
      #define MAX 100 /* This is plenty; I've seen it trigger for MAX==3 */
      int main(int argc, char *argv[])
      {
      	int r, s, count = 0;
      	int REL[3] = {0,0,0};
      	int fd[MAX];
      	pid = getpid();
      	while (count < MAX) {
      		r = open("/dev/null", O_RDONLY);
      		if (r < 0)
      			break;
      		fd[count++] = r;
      	}
      	printf("opened %d file descriptors\n", count);
      	for (r = 0; r < count; ++r) {
      		for (s = r+1; s < count; ++s) {
      			REL[cmp_fd(fd[r], fd[s])]++;
      		}
      	}
      	printf("== %d\t< %d\t> %d\n", REL[0], REL[1], REL[2]);
      	qsort(fd, count, sizeof(fd[0]), cmp_fdp);
      	memset(REL, 0, sizeof(REL));
      
      	for (r = 0; r < count; ++r) {
      		for (s = r+1; s < count; ++s) {
      			REL[cmp_fd(fd[r], fd[s])]++;
      		}
      	}
      	printf("== %d\t< %d\t> %d\n", REL[0], REL[1], REL[2]);
      	return (REL[0] + REL[2] != 0);
      }
      Signed-off-by: default avatarRasmus Villemoes <linux@rasmusvillemoes.dk>
      Reviewed-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      48133a70
    • Nicolas Iooss's avatar
      eventpoll: fix uninitialized variable in epoll_ctl · a07e826f
      Nicolas Iooss authored
      commit c680e41b upstream.
      
      When calling epoll_ctl with operation EPOLL_CTL_DEL, structure epds is
      not initialized but ep_take_care_of_epollwakeup reads its event field.
      When this unintialized field has EPOLLWAKEUP bit set, a capability check
      is done for CAP_BLOCK_SUSPEND in ep_take_care_of_epollwakeup.  This
      produces unexpected messages in the audit log, such as (on a system
      running SELinux):
      
          type=AVC msg=audit(1408212798.866:410): avc:  denied
          { block_suspend } for  pid=7754 comm="dbus-daemon" capability=36
          scontext=unconfined_u:unconfined_r:unconfined_t
          tcontext=unconfined_u:unconfined_r:unconfined_t
          tclass=capability2 permissive=1
      
          type=SYSCALL msg=audit(1408212798.866:410): arch=c000003e syscall=233
          success=yes exit=0 a0=3 a1=2 a2=9 a3=7fffd4d66ec0 items=0 ppid=1
          pid=7754 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0
          fsgid=0 tty=(none) ses=3 comm="dbus-daemon"
          exe="/usr/bin/dbus-daemon"
          subj=unconfined_u:unconfined_r:unconfined_t key=(null)
      
      ("arch=c000003e syscall=233 a1=2" means "epoll_ctl(op=EPOLL_CTL_DEL)")
      
      Remove use of epds in epoll_ctl when op == EPOLL_CTL_DEL.
      
      Fixes: 4d7e30d9 ("epoll: Add a flag, EPOLLWAKEUP, to prevent suspend while epoll events are ready")
      Signed-off-by: default avatarNicolas Iooss <nicolas.iooss_linux@m4x.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Arve Hjønnevåg <arve@android.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      a07e826f
    • Johannes Berg's avatar
      Revert "mac80211: disable uAPSD if all ACs are under ACM" · 8cf91d68
      Johannes Berg authored
      commit bb512ad0 upstream.
      
      This reverts commit 24aa11ab.
      
      That commit was wrong since it uses data that hasn't even been set
      up yet, but might be a hold-over from a previous connection.
      
      Additionally, it seems like a driver-specific workaround that
      shouldn't have been in mac80211 to start with.
      
      Fixes: 24aa11ab ("mac80211: disable uAPSD if all ACs are under ACM")
      Reviewed-by: default avatarLuciano Coelho <luciano.coelho@intel.com>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      8cf91d68
    • Felipe Balbi's avatar
      usb: dwc3: core: fix ordering for PHY suspend · 3a6e005e
      Felipe Balbi authored
      commit dc99f16f upstream.
      
      We can't suspend the PHYs before dwc3_core_exit_mode()
      has been called, that's because the host and/or device
      sides might still need to communicate with the far end
      link partner.
      
      Fixes: 8ba007a9 (usb: dwc3: core: enable the USB2 and USB3 phy in probe)
      Suggested-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Signed-off-by: default avatarFelipe Balbi <balbi@ti.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      3a6e005e
    • Felipe Balbi's avatar
      usb: dwc3: core: fix order of PM runtime calls · ae0262d6
      Felipe Balbi authored
      commit fed33afc upstream.
      
      Currently, we disable pm_runtime before all register
      accesses are done, this is dangerous and might lead
      to abort exceptions due to the driver trying to access
      a register which is clocked by a clock which was long
      gated.
      
      Fix that by moving pm_runtime_put_sync() and pm_runtime_disable()
      as the last thing we do before returning from our ->remove()
      method.
      
      Fixes: 72246da4 (usb: Introduce DesignWare USB3 DRD Driver)
      Signed-off-by: default avatarFelipe Balbi <balbi@ti.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      ae0262d6
    • Jens Axboe's avatar
      genhd: fix leftover might_sleep() in blk_free_devt() · 00c18250
      Jens Axboe authored
      commit 46f341ff upstream.
      
      Commit 2da78092 changed the locking from a mutex to a spinlock,
      so we now longer sleep in this context. But there was a leftover
      might_sleep() in there, which now triggers since we do the final
      free from an RCU callback. Get rid of it.
      Reported-by: default avatarPontus Fuchs <pontus.fuchs@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      00c18250
    • J. Bruce Fields's avatar
      lockd: fix rpcbind crash on lockd startup failure · a6130e39
      J. Bruce Fields authored
      commit 7c17705e upstream.
      
      Nikita Yuschenko reported that booting a kernel with init=/bin/sh and
      then nfs mounting without portmap or rpcbind running using a busybox
      mount resulted in:
      
        # mount -t nfs 10.30.130.21:/opt /mnt
        svc: failed to register lockdv1 RPC service (errno 111).
        lockd_up: makesock failed, error=-111
        Unable to handle kernel paging request for data at address 0x00000030
        Faulting instruction address: 0xc055e65c
        Oops: Kernel access of bad area, sig: 11 [#1]
        MPC85xx CDS
        Modules linked in:
        CPU: 0 PID: 1338 Comm: mount Not tainted 3.10.44.cge #117
        task: cf29cea0 ti: cf35c000 task.ti: cf35c000
        NIP: c055e65c LR: c0566490 CTR: c055e648
        REGS: cf35dad0 TRAP: 0300   Not tainted  (3.10.44.cge)
        MSR: 00029000 <CE,EE,ME>  CR: 22442488  XER: 20000000
        DEAR: 00000030, ESR: 00000000
      
        GPR00: c05606f4 cf35db80 cf29cea0 cf0ded80 cf0dedb8 00000001 1dec3086
        00000000
        GPR08: 00000000 c07b1640 00000007 1dec3086 22442482 100b9758 00000000
        10090ae8
        GPR16: 00000000 000186a5 00000000 00000000 100c3018 bfa46edc 100b0000
        bfa46ef0
        GPR24: cf386ae0 c07834f0 00000000 c0565f88 00000001 cf0dedb8 00000000
        cf0ded80
        NIP [c055e65c] call_start+0x14/0x34
        LR [c0566490] __rpc_execute+0x70/0x250
        Call Trace:
        [cf35db80] [00000080] 0x80 (unreliable)
        [cf35dbb0] [c05606f4] rpc_run_task+0x9c/0xc4
        [cf35dbc0] [c0560840] rpc_call_sync+0x50/0xb8
        [cf35dbf0] [c056ee90] rpcb_register_call+0x54/0x84
        [cf35dc10] [c056f24c] rpcb_register+0xf8/0x10c
        [cf35dc70] [c0569e18] svc_unregister.isra.23+0x100/0x108
        [cf35dc90] [c0569e38] svc_rpcb_cleanup+0x18/0x30
        [cf35dca0] [c0198c5c] lockd_up+0x1dc/0x2e0
        [cf35dcd0] [c0195348] nlmclnt_init+0x2c/0xc8
        [cf35dcf0] [c015bb5c] nfs_start_lockd+0x98/0xec
        [cf35dd20] [c015ce6c] nfs_create_server+0x1e8/0x3f4
        [cf35dd90] [c0171590] nfs3_create_server+0x10/0x44
        [cf35dda0] [c016528c] nfs_try_mount+0x158/0x1e4
        [cf35de20] [c01670d0] nfs_fs_mount+0x434/0x8c8
        [cf35de70] [c00cd3bc] mount_fs+0x20/0xbc
        [cf35de90] [c00e4f88] vfs_kern_mount+0x50/0x104
        [cf35dec0] [c00e6e0c] do_mount+0x1d0/0x8e0
        [cf35df10] [c00e75ac] SyS_mount+0x90/0xd0
        [cf35df40] [c000ccf4] ret_from_syscall+0x0/0x3c
      
      The addition of svc_shutdown_net() resulted in two calls to
      svc_rpcb_cleanup(); the second is no longer necessary and crashes when
      it calls rpcb_register_call with clnt=NULL.
      Reported-by: default avatarNikita Yushchenko <nyushchenko@dev.rtsoft.ru>
      Fixes: 679b033d "lockd: ensure we tear down any live sockets when socket creation fails during lockd_up"
      Acked-by: default avatarJeff Layton <jlayton@primarydata.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      a6130e39
    • Larry Finger's avatar
      rtlwifi: rtl8192cu: Add new ID · 7cb328c5
      Larry Finger authored
      commit c6651716 upstream.
      
      The Sitecom WLA-2102 adapter uses this driver.
      Reported-by: default avatarNico Baggus <nico-linux@noci.xs4all.nl>
      Signed-off-by: default avatarLarry Finger <Larry.Finger@lwfinger.net>
      Cc: Nico Baggus <nico-linux@noci.xs4all.nl>
      Signed-off-by: default avatarJohn W. Linville <linville@tuxdriver.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      7cb328c5
    • Eliad Peller's avatar
      regulatory: add NUL to alpha2 · 91a95b41
      Eliad Peller authored
      commit a5fe8e76 upstream.
      
      alpha2 is defined as 2-chars array, but is used in multiple
      places as string (e.g. with nla_put_string calls), which
      might leak kernel data.
      
      Solve it by simply adding an extra char for the NULL
      terminator, making such operations safe.
      Signed-off-by: default avatarEliad Peller <eliadx.peller@intel.com>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      91a95b41
    • Tejun Heo's avatar
      percpu: perform tlb flush after pcpu_map_pages() failure · e989b52a
      Tejun Heo authored
      commit 849f5169 upstream.
      
      If pcpu_map_pages() fails midway, it unmaps the already mapped pages.
      Currently, it doesn't flush tlb after the partial unmapping.  This may
      be okay in most cases as the established mapping hasn't been used at
      that point but it can go wrong and when it goes wrong it'd be
      extremely difficult to track down.
      
      Flush tlb after the partial unmapping.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      e989b52a
    • Tejun Heo's avatar
      percpu: fix pcpu_alloc_pages() failure path · 759ca68d
      Tejun Heo authored
      commit f0d27965 upstream.
      
      When pcpu_alloc_pages() fails midway, pcpu_free_pages() is invoked to
      free what has already been allocated.  The invocation is across the
      whole requested range and pcpu_free_pages() will try to free all
      non-NULL pages; unfortunately, this is incorrect as
      pcpu_get_pages_and_bitmap(), unlike what its comment suggests, doesn't
      clear the pages array and thus the array may have entries from the
      previous invocations making the partial failure path free incorrect
      pages.
      
      Fix it by open-coding the partial freeing of the already allocated
      pages.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      759ca68d
    • Honggang Li's avatar
      percpu: free percpu allocation info for uniprocessor system · d1544eb4
      Honggang Li authored
      commit 3189eddb upstream.
      
      Currently, only SMP system free the percpu allocation info.
      Uniprocessor system should free it too. For example, one x86 UML
      virtual machine with 256MB memory, UML kernel wastes one page memory.
      Signed-off-by: default avatarHonggang Li <enjoymindful@gmail.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      d1544eb4
    • James Ralston's avatar
      ata_piix: Add Device IDs for Intel 9 Series PCH · 40be39a7
      James Ralston authored
      commit 6cad1376 upstream.
      
      This patch adds the IDE mode SATA Device IDs for the Intel 9 Series PCH.
      Signed-off-by: default avatarJames Ralston <james.d.ralston@intel.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      40be39a7
    • Robert Coulson's avatar
      hwmon: (ds1621) Update zbits after conversion rate change · 477e8e40
      Robert Coulson authored
      commit 39c627a0 upstream.
      
      After the conversion rate is changed, the zbits are not updated,
      but should be, since they are used later in the set_temp function.
      
      Fixes: a50d9a4d ("hwmon: (ds1621) Fix temperature rounding operations")
      Reported-by: default avatarMurat Ilsever <murat.ilsever@gmail.com>
      Signed-off-by: default avatarRobert Coulson <rob.coulson@gmail.com>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      477e8e40
    • Hans de Goede's avatar
      Input: i8042 - add nomux quirk for Avatar AVIU-145A6 · 5fb5178e
      Hans de Goede authored
      commit d2682118 upstream.
      
      The sys_vendor / product_name are somewhat generic unfortunately, so this
      may lead to some false positives. But nomux usually does no harm, where as
      not having it clearly is causing problems on the Avatar AVIU-145A6.
      
      https://bugzilla.kernel.org/show_bug.cgi?id=77391Reported-by: default avatarHugo P <saurosii@gmail.com>
      Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      5fb5178e
    • Hans de Goede's avatar
    • Dmitry Torokhov's avatar
      Input: atkbd - do not try 'deactivate' keyboard on any LG laptops · 6ed73cef
      Dmitry Torokhov authored
      commit c0120679 upstream.
      
      We are getting more and more reports about LG laptops not having
      functioning keyboard if we try to deactivate keyboard during probe.
      Given that having keyboard deactivated is merely "nice to have"
      instead of a hard requirement for probing, let's disable it on all
      LG boxes instead of trying to hunt down particular models.
      
      This change is prompted by patches trying to add "LG Electronics"/"ROCKY"
      and "LG Electronics"/"LW60-F27B" to the DMI list.
      
      https://bugzilla.kernel.org/show_bug.cgi?id=77051Reported-by: default avatarJaime Velasco Juan <jsagarribay@gmail.com>
      Reported-by: default avatarGeorgios Tsalikis <georgios@tsalikis.net>
      Tested-by: default avatarJaime Velasco Juan <jsagarribay@gmail.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      6ed73cef
    • Hans de Goede's avatar
      Input: elantech - fix detection of touchpad on ASUS s301l · 3148dd71
      Hans de Goede authored
      commit 271329b3 upstream.
      
      Adjust Elantech signature validation to account fo rnewer models of
      touchpads.
      Reported-and-tested-by: default avatarMàrius Monton <marius.monton@gmail.com>
      Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      3148dd71
    • Dmitry Torokhov's avatar
      Input: synaptics - add support for ForcePads · f68cd1d8
      Dmitry Torokhov authored
      commit 5715fc76 upstream.
      
      ForcePads are found on HP EliteBook 1040 laptops. They lack any kind of
      physical buttons, instead they generate primary button click when user
      presses somewhat hard on the surface of the touchpad. Unfortunately they
      also report primary button click whenever there are 2 or more contacts
      on the pad, messing up all multi-finger gestures (2-finger scrolling,
      multi-finger tapping, etc). To cope with this behavior we introduce a
      delay (currently 50 msecs) in reporting primary press in case more
      contacts appear.
      Reviewed-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      f68cd1d8
    • John Sung's avatar
      Input: serport - add compat handling for SPIOCSTYPE ioctl · 0d82a371
      John Sung authored
      commit a80d8b02 upstream.
      
      When running a 32-bit inputattach utility in a 64-bit system, there will be
      error code "inputattach: can't set device type". This is caused by the
      serport device driver not supporting compat_ioctl, so that SPIOCSTYPE ioctl
      fails.
      Signed-off-by: default avatarJohn Sung <penmount.touch@gmail.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      0d82a371
    • Mikulas Patocka's avatar
      dm crypt: fix access beyond the end of allocated space · 3fc612d8
      Mikulas Patocka authored
      commit d49ec52f upstream.
      
      The DM crypt target accesses memory beyond allocated space resulting in
      a crash on 32 bit x86 systems.
      
      This bug is very old (it dates back to 2.6.25 commit 3a7f6c99 "dm
      crypt: use async crypto").  However, this bug was masked by the fact
      that kmalloc rounds the size up to the next power of two.  This bug
      wasn't exposed until 3.17-rc1 commit 298a9fa0 ("dm crypt: use per-bio
      data").  By switching to using per-bio data there was no longer any
      padding beyond the end of a dm-crypt allocated memory block.
      
      To minimize allocation overhead dm-crypt puts several structures into one
      block allocated with kmalloc.  The block holds struct ablkcipher_request,
      cipher-specific scratch pad (crypto_ablkcipher_reqsize(any_tfm(cc))),
      struct dm_crypt_request and an initialization vector.
      
      The variable dmreq_start is set to offset of struct dm_crypt_request
      within this memory block.  dm-crypt allocates the block with this size:
      cc->dmreq_start + sizeof(struct dm_crypt_request) + cc->iv_size.
      
      When accessing the initialization vector, dm-crypt uses the function
      iv_of_dmreq, which performs this calculation: ALIGN((unsigned long)(dmreq
      + 1), crypto_ablkcipher_alignmask(any_tfm(cc)) + 1).
      
      dm-crypt allocated "cc->iv_size" bytes beyond the end of dm_crypt_request
      structure.  However, when dm-crypt accesses the initialization vector, it
      takes a pointer to the end of dm_crypt_request, aligns it, and then uses
      it as the initialization vector.  If the end of dm_crypt_request is not
      aligned on a crypto_ablkcipher_alignmask(any_tfm(cc)) boundary the
      alignment causes the initialization vector to point beyond the allocated
      space.
      
      Fix this bug by calculating the variable iv_size_padding and adding it
      to the allocated size.
      
      Also correct the alignment of dm_crypt_request.  struct dm_crypt_request
      is specific to dm-crypt (it isn't used by the crypto subsystem at all),
      so it is aligned on __alignof__(struct dm_crypt_request).
      
      Also align per_bio_data_size on ARCH_KMALLOC_MINALIGN, so that it is
      aligned as if the block was allocated with kmalloc.
      Reported-by: default avatarKrzysztof Kolasa <kkolasa@winsoft.pl>
      Tested-by: default avatarMilan Broz <gmazyland@gmail.com>
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      3fc612d8
    • Anssi Hannula's avatar
      dm cache: fix race causing dirty blocks to be marked as clean · 830f9d14
      Anssi Hannula authored
      commit 40aa978e upstream.
      
      When a writeback or a promotion of a block is completed, the cell of
      that block is removed from the prison, the block is marked as clean, and
      the clear_dirty() callback of the cache policy is called.
      
      Unfortunately, performing those actions in this order allows an incoming
      new write bio for that block to come in before clearing the dirty status
      is completed and therefore possibly causing one of these two scenarios:
      
      Scenario A:
      
      Thread 1                      Thread 2
      cell_defer()                  .
      - cell removed from prison    .
      - detained bios queued        .
      .                             incoming write bio
      .                             remapped to cache
      .                             set_dirty() called,
      .                               but block already dirty
      .                               => it does nothing
      clear_dirty()                 .
      - block marked clean          .
      - policy clear_dirty() called .
      
      Result: Block is marked clean even though it is actually dirty. No
      writeback will occur.
      
      Scenario B:
      
      Thread 1                      Thread 2
      cell_defer()                  .
      - cell removed from prison    .
      - detained bios queued        .
      clear_dirty()                 .
      - block marked clean          .
      .                             incoming write bio
      .                             remapped to cache
      .                             set_dirty() called
      .                             - block marked dirty
      .                             - policy set_dirty() called
      - policy clear_dirty() called .
      
      Result: Block is properly marked as dirty, but policy thinks it is clean
      and therefore never asks us to writeback it.
      This case is visible in "dmsetup status" dirty block count (which
      normally decreases to 0 on a quiet device).
      
      Fix these issues by calling clear_dirty() before calling cell_defer().
      Incoming bios for that block will then be detained in the cell and
      released only after clear_dirty() has completed, so the race will not
      occur.
      
      Found by inspecting the code after noticing spurious dirty counts
      (scenario B).
      Signed-off-by: default avatarAnssi Hannula <anssi.hannula@iki.fi>
      Acked-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      830f9d14
    • Keith Busch's avatar
      block: Fix dev_t minor allocation lifetime · 5dbe54b2
      Keith Busch authored
      commit 2da78092 upstream.
      
      Releases the dev_t minor when all references are closed to prevent
      another device from acquiring the same major/minor.
      
      Since the partition's release may be invoked from call_rcu's soft-irq
      context, the ext_dev_idr's mutex had to be replaced with a spinlock so
      as not so sleep.
      Signed-off-by: default avatarKeith Busch <keith.busch@intel.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      5dbe54b2
    • Tejun Heo's avatar
      workqueue: apply __WQ_ORDERED to create_singlethread_workqueue() · 7e64965b
      Tejun Heo authored
      commit e09c2c29 upstream.
      
      create_singlethread_workqueue() is a compat interface for single
      threaded workqueue which maps to ordered workqueue w/ rescuer in the
      current implementation.  create_singlethread_workqueue() currently
      implemented by invoking alloc_workqueue() w/ appropriate parameters.
      
      8719dcea ("workqueue: reject adjusting max_active or applying
      attrs to ordered workqueues") introduced __WQ_ORDERED to protect
      ordered workqueues against dynamic attribute changes which can break
      ordering guarantees but forgot to apply it to
      create_singlethread_workqueue().  This in itself is okay as nobody
      currently uses dynamic attribute change on workqueues created with
      create_singlethread_workqueue().
      
      However, 4c16bd32 ("workqueue: implement NUMA affinity for unbound
      workqueues") broke singlethreaded guarantee for ordered workqueues
      through allocating a separate pool_workqueue on each NUMA node by
      default.  A later change 8a2b7538 ("workqueue: fix ordered
      workqueues in NUMA setups") fixed it by allocating only one global
      pool_workqueue if __WQ_ORDERED is set.
      
      Combined, the __WQ_ORDERED omission in create_singlethread_workqueue()
      became critical breaking its single threadedness and ordering
      guarantee.
      
      Let's make create_singlethread_workqueue() wrap
      alloc_ordered_workqueue() instead so that it inherits __WQ_ORDERED and
      can implicitly track future ordered_workqueue changes.
      
      v2: I missed that __WQ_ORDERED now protects against pwq splitting
          across NUMA nodes and incorrectly described the patch as a
          nice-to-have fix to protect against future dynamic attribute
          usages.  Oleg pointed out that this is actually a critical
          breakage due to 8a2b7538 ("workqueue: fix ordered workqueues
          in NUMA setups").
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarMike Anderson <mike.anderson@us.ibm.com>
      Cc: Oleg Nesterov <onestero@redhat.com>
      Cc: Gustavo Luiz Duarte <gduarte@redhat.com>
      Cc: Tomas Henzl <thenzl@redhat.com>
      Fixes: 4c16bd32 ("workqueue: implement NUMA affinity for unbound workqueues")
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      7e64965b