1. 15 Feb, 2016 8 commits
  2. 12 Feb, 2016 32 commits
    • J. Bruce Fields's avatar
      dcache: use IS_ROOT to decide where dentry is hashed · 173a4661
      J. Bruce Fields authored
      commit 7632e465 upstream.
      
      Every hashed dentry is either hashed in the dentry_hashtable, or a
      superblock's s_anon list.
      
      __d_drop() assumes it can determine which is the case by checking
      DCACHE_DISCONNECTED; this is not true.
      
      It is true that when DCACHE_DISCONNECTED is cleared, the dentry is not
      only hashed on dentry_hashtable, but is fully connected to its parents
      back to the root.
      
      But the converse is *not* true: fs/exportfs/expfs.c:reconnect_path()
      attempts to connect a directory (found by filehandle lookup) back to
      root by ascending to parents and performing lookups one at a time.  It
      does not clear DCACHE_DISCONNECTED until it's done, and that is not at
      all an atomic process.
      
      In particular, it is possible for DCACHE_DISCONNECTED to be set on a
      dentry which is hashed on the dentry_hashtable.
      
      Instead, use IS_ROOT() to check which hash chain a dentry is on.  This
      *does* work:
      
      Dentries are hashed only by:
      
      	- d_obtain_alias, which adds an IS_ROOT() dentry to sb_anon.
      
      	- __d_rehash, called by _d_rehash: hashes to the dentry's
      	  parent, and all callers of _d_rehash appear to have d_parent
      	  set to a "real" parent.
      	- __d_rehash, called by __d_move: rehashes the moved dentry to
      	  hash chain determined by target, and assigns target's d_parent
      	  to its d_parent, before dropping the dentry's d_lock.
      
      Therefore I believe it's safe for a holder of a dentry's d_lock to
      assume that it is hashed on sb_anon if and only if IS_ROOT(dentry) is
      true.
      
      I believe the incorrect assumption about DCACHE_DISCONNECTED was
      originally introduced by ceb5bdc2 "fs: dcache per-bucket dcache hash
      locking".
      
      Also add a comment while we're here.
      
      Cc: Nick Piggin <npiggin@kernel.dk>
      Acked-by: default avatarChristoph Hellwig <hch@infradead.org>
      Reviewed-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      173a4661
    • Jiri Slaby's avatar
      Linux 3.12.54 · 9f9aab6e
      Jiri Slaby authored
      9f9aab6e
    • Nikolay Borisov's avatar
      dm thin: fix race condition when destroying thin pool workqueue · 021cc4c8
      Nikolay Borisov authored
      commit 18d03e8c upstream.
      
      When a thin pool is being destroyed delayed work items are
      cancelled using cancel_delayed_work(), which doesn't guarantee that on
      return the delayed item isn't running.  This can cause the work item to
      requeue itself on an already destroyed workqueue.  Fix this by using
      cancel_delayed_work_sync() which guarantees that on return the work item
      is not running anymore.
      
      Fixes: 905e51b3 ("dm thin: commit outstanding data every second")
      Fixes: 85ad643b ("dm thin: add timeout to stop out-of-data-space mode holding IO forever")
      Signed-off-by: default avatarNikolay Borisov <kernel@kyup.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Cc: Nikolay Borisov <kernel@kyup.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      021cc4c8
    • Ioan-Adrian Ratiu's avatar
      HID: usbhid: fix recursive deadlock · d43afc4a
      Ioan-Adrian Ratiu authored
      commit e470127e upstream.
      
      The critical section protected by usbhid->lock in hid_ctrl() is too
      big and because of this it causes a recursive deadlock. "Too big" means
      the case statement and the call to hid_input_report() do not need to be
      protected by the spinlock (no URB operations are done inside them).
      
      The deadlock happens because in certain rare cases drivers try to grab
      the lock while handling the ctrl irq which grabs the lock before them
      as described above. For example newer wacom tablets like 056a:033c try
      to reschedule proximity reads from wacom_intuos_schedule_prox_event()
      calling hid_hw_request() -> usbhid_request() -> usbhid_submit_report()
      which tries to grab the usbhid lock already held by hid_ctrl().
      
      There are two ways to get out of this deadlock:
          1. Make the drivers work "around" the ctrl critical region, in the
          wacom case for ex. by delaying the scheduling of the proximity read
          request itself to a workqueue.
          2. Shrink the critical region so the usbhid lock protects only the
          instructions which modify usbhid state, calling hid_input_report()
          with the spinlock unlocked, allowing the device driver to grab the
          lock first, finish and then grab the lock afterwards in hid_ctrl().
      
      This patch implements the 2nd solution.
      Signed-off-by: default avatarIoan-Adrian Ratiu <adi@adirat.com>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarJason Gerecke <jason.gerecke@wacom.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      d43afc4a
    • Seth Jennings's avatar
      drivers/base/memory.c: prohibit offlining of memory blocks with missing sections · 48a17e3c
      Seth Jennings authored
      commit 26bbe7ef upstream.
      
      Commit bdee237c ("x86: mm: Use 2GB memory block size on large-memory
      x86-64 systems") and 982792c7 ("x86, mm: probe memory block size for
      generic x86 64bit") introduced large block sizes for x86.  This made it
      possible to have multiple sections per memory block where previously,
      there was a only every one section per block.
      
      Since blocks consist of contiguous ranges of section, there can be holes
      in the blocks where sections are not present.  If one attempts to
      offline such a block, a crash occurs since the code is not designed to
      deal with this.
      
      This patch is a quick fix to gaurd against the crash by not allowing
      blocks with non-present sections to be offlined.
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=107781Signed-off-by: default avatarSeth Jennings <sjennings@variantweb.net>
      Reported-by: default avatarAndrew Banman <abanman@sgi.com>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Greg KH <greg@kroah.com>
      Cc: Russ Anderson <rja@sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      48a17e3c
    • Mike Snitzer's avatar
      dm btree: fix leak of bufio-backed block in btree_split_sibling error path · 1b53306f
      Mike Snitzer authored
      commit 30ce6e1c upstream.
      
      The block allocated at the start of btree_split_sibling() is never
      released if later insert_at() fails.
      
      Fix this by releasing the previously allocated bufio block using
      unlock_block().
      Reported-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      1b53306f
    • Herbert Xu's avatar
      crypto: algif_hash - Only export and import on sockets with data · 23130403
      Herbert Xu authored
      commit 4afa5f96 upstream.
      
      The hash_accept call fails to work on sockets that have not received
      any data.  For some algorithm implementations it may cause crashes.
      
      This patch fixes this by ensuring that we only export and import on
      sockets that have received data.
      Reported-by: default avatarHarsh Jain <harshjain.prof@gmail.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Tested-by: default avatarStephan Mueller <smueller@chronox.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      23130403
    • libin's avatar
      recordmcount: Fix endianness handling bug for nop_mcount · 4366e332
      libin authored
      commit c84da8b9 upstream.
      
      In nop_mcount, shdr->sh_offset and welp->r_offset should handle
      endianness properly, otherwise it will trigger Segmentation fault
      if the recordmcount main and file.o have different endianness.
      
      Link: http://lkml.kernel.org/r/563806C7.7070606@huawei.comSigned-off-by: default avatarLi Bin <huawei.libin@huawei.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      4366e332
    • Greg Kroah-Hartman's avatar
      xhci: fix placement of call to usb_disabled() · b48d0542
      Greg Kroah-Hartman authored
      In the backport of 1eaf35e4, the call to
      usb_disabled() was too late, after we had already done some allocation.
      Move that call to the top of the function instead, making the logic
      match what is intended and is in the original patch.
      Reported-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      b48d0542
    • Tejun Heo's avatar
      Revert "workqueue: make sure delayed work run in local cpu" · 237c785f
      Tejun Heo authored
      commit 041bd12e upstream.
      
      This reverts commit 874bbfe6.
      
      Workqueue used to implicity guarantee that work items queued without
      explicit CPU specified are put on the local CPU.  Recent changes in
      timer broke the guarantee and led to vmstat breakage which was fixed
      by 176bed1d ("vmstat: explicitly schedule per-cpu work on the CPU
      we need it to run on").
      
      vmstat is the most likely to expose the issue and it's quite possible
      that there are other similar problems which are a lot more difficult
      to trigger.  As a preventive measure, 874bbfe6 ("workqueue: make
      sure delayed work run in local cpu") was applied to restore the local
      CPU guarnatee.  Unfortunately, the change exposed a bug in timer code
      which got fixed by 22b886dd ("timers: Use proper base migration in
      add_timer_on()").  Due to code restructuring, the commit couldn't be
      backported beyond certain point and stable kernels which only had
      874bbfe6 started crashing.
      
      The local CPU guarantee was accidental more than anything else and we
      want to get rid of it anyway.  As, with the vmstat case fixed,
      874bbfe6 is causing more problems than it's fixing, it has been
      decided to take the chance and officially break the guarantee by
      reverting the commit.  A debug feature will be added to force foreign
      CPU assignment to expose cases relying on the guarantee and fixes for
      the individual cases will be backported to stable as necessary.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Fixes: 874bbfe6 ("workqueue: make sure delayed work run in local cpu")
      Link: http://lkml.kernel.org/g/20160120211926.GJ10810@quack.suse.cz
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
      Cc: Daniel Bilik <daniel.bilik@neosystem.cz>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Daniel Bilik <daniel.bilik@neosystem.cz>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      237c785f
    • Linus Torvalds's avatar
      vmstat: explicitly schedule per-cpu work on the CPU we need it to run on · 933f407f
      Linus Torvalds authored
      commit 176bed1d upstream.
      
      The vmstat code uses "schedule_delayed_work_on()" to do the initial
      startup of the delayed work on the right CPU, but then once it was
      started it would use the non-cpu-specific "schedule_delayed_work()" to
      re-schedule it on that CPU.
      
      That just happened to schedule it on the same CPU historically (well, in
      almost all situations), but the code _requires_ this work to be per-cpu,
      and should say so explicitly rather than depend on the non-cpu-specific
      scheduling to schedule on the current CPU.
      
      The timer code is being changed to not be as single-minded in always
      running things on the calling CPU.
      
      See also commit 874bbfe6 ("workqueue: make sure delayed work run in
      local cpu") that for now maintains the local CPU guarantees just in case
      there are other broken users that depended on the accidental behavior.
      
      js: 3.12 backport
      
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Mike Galbraith <mgalbraith@suse.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      933f407f
    • Andrew Morton's avatar
      openrisc: fix CONFIG_UID16 setting · 6b443a1b
      Andrew Morton authored
      commit 04ea1e91 upstream.
      
      openrisc-allnoconfig:
      
        kernel/uid16.c: In function 'SYSC_setgroups16':
        kernel/uid16.c:184:2: error: implicit declaration of function 'groups_alloc'
        kernel/uid16.c:184:13: warning: assignment makes pointer from integer without a cast
      
      openrisc shouldn't be setting CONFIG_UID16 when CONFIG_MULTIUSER=n.
      
      Fixes: 2813893f ("kernel: conditionally support non-root users, groups and capabilities")
      Reported-by: default avatarFengguang Wu <fengguang.wu@gmail.com>
      Cc: Iulia Manda <iulia.manda21@gmail.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      6b443a1b
    • Jiri Slaby's avatar
      x86: vvar, fix excessive gcc-6 DECLARE_VVAR warnings · d6ace935
      Jiri Slaby authored
      On 3.12, with gcc-6, I see a lot of:
      arch/x86/include/asm/vvar.h:33:28: warning: ‘vvaraddr_jiffies’ defined but not used [-Wunused-const-variable]
        static type const * const vvaraddr_ ## name =   \
                                  ^
      arch/x86/include/asm/vvar.h:46:1: note: in expansion of macro ‘DECLARE_VVAR’
       DECLARE_VVAR(0, volatile unsigned long, jiffies)
       ^~~~~~~~~~~~
      
      In upstream, this is fixed by ef721987 (x86, vdso: Introduce VVAR
      marco for vdso32) and f40c3300 (x86, vdso: Move the vvar and hpet
      mappings next to the 64-bit vDSO). But this is not applicable to
      stable.
      
      So mark the vvar declaration as __maybe_unused and be done with it.
      This will generate it to the code only if it is used. I.e. the same as
      with gcc < 6.
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Cc: Andy Lutomirski <luto@amacapital.net>
      d6ace935
    • Joe Perches's avatar
      compiler-gcc: integrate the various compiler-gcc[345].h files · 1fa9b58c
      Joe Perches authored
      commit cb984d10 upstream.
      
      As gcc major version numbers are going to advance rather rapidly in the
      future, there's no real value in separate files for each compiler
      version.
      
      Deduplicate some of the macros #defined in each file too.
      
      Neaten comments using normal kernel commenting style.
      Signed-off-by: default avatarJoe Perches <joe@perches.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Segher Boessenkool <segher@kernel.crashing.org>
      Cc: Sasha Levin <levinsasha928@gmail.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Alan Modra <amodra@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      1fa9b58c
    • Steven Noonan's avatar
      compiler/gcc4+: Remove inaccurate comment about 'asm goto' miscompiles · e3e923ee
      Steven Noonan authored
      commit 5631b8fb upstream.
      
      The bug referenced by the comment in this commit was not
      completely fixed in GCC 4.8.2, as I mentioned in a thread back
      in February:
      
         https://lkml.org/lkml/2014/2/12/797
      
      The conclusion at that time was to make the quirk unconditional
      until the bug could be found and fixed in GCC. Unfortunately,
      when I submitted the patch (commit a9f18034) I left a comment
      in that claimed the bug was fixed in GCC 4.8.2+.
      
      This comment is inaccurate, and should be removed.
      Signed-off-by: default avatarSteven Noonan <steven@uplinklabs.net>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1414274982-14040-1-git-send-email-steven@uplinklabs.net
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      e3e923ee
    • Yang Shi's avatar
      arm64: restore bogomips information in /proc/cpuinfo · 0b3f87b4
      Yang Shi authored
      commit 92e788b7 upstream.
      
      As previously reported, some userspace applications depend on bogomips
      showed by /proc/cpuinfo. Although there is much less legacy impact on
      aarch64 than arm, it does break libvirt.
      
      This patch reverts commit 326b16db ("arm64: delay: don't bother
      reporting bogomips in /proc/cpuinfo"), but with some tweak due to
      context change and without the pr_info().
      
      Fixes: 326b16db ("arm64: delay: don't bother reporting bogomips in /proc/cpuinfo")
      Signed-off-by: default avatarYang Shi <yang.shi@linaro.org>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      0b3f87b4
    • Guenter Roeck's avatar
      mn10300: Select CONFIG_HAVE_UID16 to fix build failure · f5e77f5a
      Guenter Roeck authored
      commit c86576ea upstream.
      
      mn10300 builds fail with
      
      fs/stat.c: In function 'cp_old_stat':
      fs/stat.c:163:2: error: 'old_uid_t' undeclared
      
      ipc/util.c: In function 'ipc64_perm_to_ipc_perm':
      ipc/util.c:540:2: error: 'old_uid_t' undeclared
      
      Select CONFIG_HAVE_UID16 and remove local definition of CONFIG_UID16
      to fix the problem.
      
      Fixes: fbc416ff ("arm64: fix building without CONFIG_UID16")
      Cc: Arnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarAcked-by: David Howells <dhowells@redhat.com>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      f5e77f5a
    • Richard Purdie's avatar
      HID: core: Avoid uninitialized buffer access · 19a3cb32
      Richard Purdie authored
      commit 79b568b9 upstream.
      
      hid_connect adds various strings to the buffer but they're all
      conditional. You can find circumstances where nothing would be written
      to it but the kernel will still print the supposedly empty buffer with
      printk. This leads to corruption on the console/in the logs.
      
      Ensure buf is initialized to an empty string.
      Signed-off-by: default avatarRichard Purdie <richard.purdie@linuxfoundation.org>
      [dvhart: Initialize string to "" rather than assign buf[0] = NULL;]
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: linux-input@vger.kernel.org
      Signed-off-by: default avatarDarren Hart <dvhart@linux.intel.com>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      19a3cb32
    • Mikulas Patocka's avatar
      parisc iommu: fix panic due to trying to allocate too large region · 01bdbe95
      Mikulas Patocka authored
      commit e46e31a3 upstream.
      
      When using the Promise TX2+ SATA controller on PA-RISC, the system often
      crashes with kernel panic, for example just writing data with the dd
      utility will make it crash.
      
      Kernel panic - not syncing: drivers/parisc/sba_iommu.c: I/O MMU @ 000000000000a000 is out of mapping resources
      
      CPU: 0 PID: 18442 Comm: mkspadfs Not tainted 4.4.0-rc2 #2
      Backtrace:
       [<000000004021497c>] show_stack+0x14/0x20
       [<0000000040410bf0>] dump_stack+0x88/0x100
       [<000000004023978c>] panic+0x124/0x360
       [<0000000040452c18>] sba_alloc_range+0x698/0x6a0
       [<0000000040453150>] sba_map_sg+0x260/0x5b8
       [<000000000c18dbb4>] ata_qc_issue+0x264/0x4a8 [libata]
       [<000000000c19535c>] ata_scsi_translate+0xe4/0x220 [libata]
       [<000000000c19a93c>] ata_scsi_queuecmd+0xbc/0x320 [libata]
       [<0000000040499bbc>] scsi_dispatch_cmd+0xfc/0x130
       [<000000004049da34>] scsi_request_fn+0x6e4/0x970
       [<00000000403e95a8>] __blk_run_queue+0x40/0x60
       [<00000000403e9d8c>] blk_run_queue+0x3c/0x68
       [<000000004049a534>] scsi_run_queue+0x2a4/0x360
       [<000000004049be68>] scsi_end_request+0x1a8/0x238
       [<000000004049de84>] scsi_io_completion+0xfc/0x688
       [<0000000040493c74>] scsi_finish_command+0x17c/0x1d0
      
      The cause of the crash is not exhaustion of the IOMMU space, there is
      plenty of free pages. The function sba_alloc_range is called with size
      0x11000, thus the pages_needed variable is 0x11. The function
      sba_search_bitmap is called with bits_wanted 0x11 and boundary size is
      0x10 (because dma_get_seg_boundary(dev) returns 0xffff).
      
      The function sba_search_bitmap attempts to allocate 17 pages that must not
      cross 16-page boundary - it can't satisfy this requirement
      (iommu_is_span_boundary always returns true) and fails even if there are
      many free entries in the IOMMU space.
      
      How did it happen that we try to allocate 17 pages that don't cross
      16-page boundary? The cause is in the function iommu_coalesce_chunks. This
      function tries to coalesce adjacent entries in the scatterlist. The
      function does several checks if it may coalesce one entry with the next,
      one of those checks is this:
      
      	if (startsg->length + dma_len > max_seg_size)
      		break;
      
      When it finishes coalescing adjacent entries, it allocates the mapping:
      
      sg_dma_len(contig_sg) = dma_len;
      dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE);
      sg_dma_address(contig_sg) =
      	PIDE_FLAG
      	| (iommu_alloc_range(ioc, dev, dma_len) << IOVP_SHIFT)
      	| dma_offset;
      
      It is possible that (startsg->length + dma_len > max_seg_size) is false
      (we are just near the 0x10000 max_seg_size boundary), so the funcion
      decides to coalesce this entry with the next entry. When the coalescing
      succeeds, the function performs
      	dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE);
      And now, because of non-zero dma_offset, dma_len is greater than 0x10000.
      iommu_alloc_range (a pointer to sba_alloc_range) is called and it attempts
      to allocate 17 pages for a device that must not cross 16-page boundary.
      
      To fix the bug, we must make sure that dma_len after addition of
      dma_offset and alignment doesn't cross the segment boundary. I.e. change
      	if (startsg->length + dma_len > max_seg_size)
      		break;
      to
      	if (ALIGN(dma_len + dma_offset + startsg->length, IOVP_SIZE) > max_seg_size)
      		break;
      
      This patch makes this change (it precalculates max_seg_boundary at the
      beginning of the function iommu_coalesce_chunks). I also added a check
      that the mapping length doesn't exceed dma_get_seg_boundary(dev) (it is
      not needed for Promise TX2+ SATA, but it may be needed for other devices
      that have dma_get_seg_boundary lower than dma_get_max_seg_size).
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      01bdbe95
    • Will Deacon's avatar
      arm64: mm: ensure that the zero page is visible to the page table walker · 13d8053e
      Will Deacon authored
      commit 32d63978 upstream.
      
      In paging_init, we allocate the zero page, memset it to zero and then
      point TTBR0 to it in order to avoid speculative fetches through the
      identity mapping.
      
      In order to guarantee that the freshly zeroed page is indeed visible to
      the page table walker, we need to execute a dsb instruction prior to
      writing the TTBR.
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      13d8053e
    • John Blackwood's avatar
      arm64: Clear out any singlestep state on a ptrace detach operation · 8abc0d5b
      John Blackwood authored
      commit 5db4fd8c upstream.
      
      Make sure to clear out any ptrace singlestep state when a ptrace(2)
      PTRACE_DETACH call is made on arm64 systems.
      
      Otherwise, the previously ptraced task will die off with a SIGTRAP
      signal if the debugger just previously singlestepped the ptraced task.
      Signed-off-by: default avatarJohn Blackwood <john.blackwood@ccur.com>
      [will: added comment to justify why this is in the arch code]
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      8abc0d5b
    • Arnd Bergmann's avatar
      arm64: fix building without CONFIG_UID16 · 4f29293d
      Arnd Bergmann authored
      commit fbc416ff upstream.
      
      As reported by Michal Simek, building an ARM64 kernel with CONFIG_UID16
      disabled currently fails because the system call table still needs to
      reference the individual function entry points that are provided by
      kernel/sys_ni.c in this case, and the declarations are hidden inside
      of #ifdef CONFIG_UID16:
      
      arch/arm64/include/asm/unistd32.h:57:8: error: 'sys_lchown16' undeclared here (not in a function)
       __SYSCALL(__NR_lchown, sys_lchown16)
      
      I believe this problem only exists on ARM64, because older architectures
      tend to not need declarations when their system call table is built
      in assembly code, while newer architectures tend to not need UID16
      support. ARM64 only uses these system calls for compatibility with
      32-bit ARM binaries.
      
      This changes the CONFIG_UID16 check into CONFIG_HAVE_UID16, which is
      set unconditionally on ARM64 with CONFIG_COMPAT, so we see the
      declarations whenever we need them, but otherwise the behavior is
      unchanged.
      
      Fixes: af1839eb ("Kconfig: clean up the long arch list for the UID16 config option")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      4f29293d
    • Marc Zyngier's avatar
      arm64: KVM: Fix AArch32 to AArch64 register mapping · 9bd3d70a
      Marc Zyngier authored
      commit c0f09634 upstream.
      
      When running a 32bit guest under a 64bit hypervisor, the ARMv8
      architecture defines a mapping of the 32bit registers in the 64bit
      space. This includes banked registers that are being demultiplexed
      over the 64bit ones.
      
      On exceptions caused by an operation involving a 32bit register, the
      HW exposes the register number in the ESR_EL2 register. It was so
      far understood that SW had to distinguish between AArch32 and AArch64
      accesses (based on the current AArch32 mode and register number).
      
      It turns out that I misinterpreted the ARM ARM, and the clue is in
      D1.20.1: "For some exceptions, the exception syndrome given in the
      ESR_ELx identifies one or more register numbers from the issued
      instruction that generated the exception. Where the exception is
      taken from an Exception level using AArch32 these register numbers
      give the AArch64 view of the register."
      
      Which means that the HW is already giving us the translated version,
      and that we shouldn't try to interpret it at all (for example, doing
      an MMIO operation from the IRQ mode using the LR register leads to
      very unexpected behaviours).
      
      The fix is thus not to perform a call to vcpu_reg32() at all from
      vcpu_reg(), and use whatever register number is supplied directly.
      The only case we need to find out about the mapping is when we
      actively generate a register access, which only occurs when injecting
      a fault in a guest.
      Reviewed-by: default avatarRobin Murphy <robin.murphy@arm.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      9bd3d70a
    • Ulrich Weigand's avatar
      scripts/recordmcount.pl: support data in text section on powerpc · a69c779e
      Ulrich Weigand authored
      commit 2e50c4be upstream.
      
      If a text section starts out with a data blob before the first
      function start label, disassembly parsing doing in recordmcount.pl
      gets confused on powerpc, leading to creation of corrupted module
      objects.
      
      This was not a problem so far since the compiler would never create
      such text sections.  However, this has changed with a recent change
      in GCC 6 to support distances of > 2GB between a function and its
      assoicated TOC in the ELFv2 ABI, exposing this problem.
      
      There is already code in recordmcount.pl to handle such data blobs
      on the sparc64 platform.  This patch uses the same method to handle
      those on powerpc as well.
      Acked-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarUlrich Weigand <ulrich.weigand@de.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      a69c779e
    • Boqun Feng's avatar
      powerpc: Make {cmp}xchg* and their atomic_ versions fully ordered · 166947a7
      Boqun Feng authored
      commit 81d7a329 upstream.
      
      According to memory-barriers.txt, xchg*, cmpxchg* and their atomic_
      versions all need to be fully ordered, however they are now just
      RELEASE+ACQUIRE, which are not fully ordered.
      
      So also replace PPC_RELEASE_BARRIER and PPC_ACQUIRE_BARRIER with
      PPC_ATOMIC_ENTRY_BARRIER and PPC_ATOMIC_EXIT_BARRIER in
      __{cmp,}xchg_{u32,u64} respectively to guarantee fully ordered semantics
      of atomic{,64}_{cmp,}xchg() and {cmp,}xchg(), as a complement of commit
      b97021f8 ("powerpc: Fix atomic_xxx_return barrier semantics")
      
      This patch depends on patch "powerpc: Make value-returning atomics fully
      ordered" for PPC_ATOMIC_ENTRY_BARRIER definition.
      Signed-off-by: default avatarBoqun Feng <boqun.feng@gmail.com>
      Reviewed-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      166947a7
    • Boqun Feng's avatar
      powerpc: Make value-returning atomics fully ordered · 432c20b3
      Boqun Feng authored
      commit 49e9cf3f upstream.
      
      According to memory-barriers.txt:
      
      > Any atomic operation that modifies some state in memory and returns
      > information about the state (old or new) implies an SMP-conditional
      > general memory barrier (smp_mb()) on each side of the actual
      > operation ...
      
      Which mean these operations should be fully ordered. However on PPC,
      PPC_ATOMIC_ENTRY_BARRIER is the barrier before the actual operation,
      which is currently "lwsync" if SMP=y. The leading "lwsync" can not
      guarantee fully ordered atomics, according to Paul Mckenney:
      
      https://lkml.org/lkml/2015/10/14/970
      
      To fix this, we define PPC_ATOMIC_ENTRY_BARRIER as "sync" to guarantee
      the fully-ordered semantics.
      
      This also makes futex atomics fully ordered, which can avoid possible
      memory ordering problems if userspace code relies on futex system call
      for fully ordered semantics.
      
      Fixes: b97021f8 ("powerpc: Fix atomic_xxx_return barrier semantics")
      Signed-off-by: default avatarBoqun Feng <boqun.feng@gmail.com>
      Reviewed-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      432c20b3
    • Michael Neuling's avatar
      powerpc/tm: Block signal return setting invalid MSR state · e9214d10
      Michael Neuling authored
      commit d2b9d2a5 upstream.
      
      Currently we allow both the MSR T and S bits to be set by userspace on
      a signal return.  Unfortunately this is a reserved configuration and
      will cause a TM Bad Thing exception if attempted (via rfid).
      
      This patch checks for this case in both the 32 and 64 bit signals
      code.  If both T and S are set, we mark the context as invalid.
      
      Found using a syscall fuzzer.
      
      Fixes: 2b0a576d ("powerpc: Add new transactional memory state to the signal context")
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      e9214d10
    • Dan Streetman's avatar
      xfrm: dst_entries_init() per-net dst_ops · 3e29fa5b
      Dan Streetman authored
      [ Upstream commit a8a572a6 ]
      
      Remove the dst_entries_init/destroy calls for xfrm4 and xfrm6 dst_ops
      templates; their dst_entries counters will never be used.  Move the
      xfrm dst_ops initialization from the common xfrm/xfrm_policy.c to
      xfrm4/xfrm4_policy.c and xfrm6/xfrm6_policy.c, and call dst_entries_init
      and dst_entries_destroy for each net namespace.
      
      The ipv4 and ipv6 xfrms each create dst_ops template, and perform
      dst_entries_init on the templates.  The template values are copied to each
      net namespace's xfrm.xfrm*_dst_ops.  The problem there is the dst_ops
      pcpuc_entries field is a percpu counter and cannot be used correctly by
      simply copying it to another object.
      
      The result of this is a very subtle bug; changes to the dst entries
      counter from one net namespace may sometimes get applied to a different
      net namespace dst entries counter.  This is because of how the percpu
      counter works; it has a main count field as well as a pointer to the
      percpu variables.  Each net namespace maintains its own main count
      variable, but all point to one set of percpu variables.  When any net
      namespace happens to change one of the percpu variables to outside its
      small batch range, its count is moved to the net namespace's main count
      variable.  So with multiple net namespaces operating concurrently, the
      dst_ops entries counter can stray from the actual value that it should
      be; if counts are consistently moved from one net namespace to another
      (which my testing showed is likely), then one net namespace winds up
      with a negative dst_ops count while another winds up with a continually
      increasing count, eventually reaching its gc_thresh limit, which causes
      all new traffic on the net namespace to fail with -ENOBUFS.
      Signed-off-by: default avatarDan Streetman <dan.streetman@canonical.com>
      Signed-off-by: default avatarDan Streetman <ddstreet@ieee.org>
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      3e29fa5b
    • Ido Schimmel's avatar
      team: Replace rcu_read_lock with a mutex in team_vlan_rx_kill_vid · a319d186
      Ido Schimmel authored
      [ Upstream commit 60a6531b ]
      
      We can't be within an RCU read-side critical section when deleting
      VLANs, as underlying drivers might sleep during the hardware operation.
      Therefore, replace the RCU critical section with a mutex. This is
      consistent with team_vlan_rx_add_vid.
      
      Fixes: 3d249d4c ("net: introduce ethernet teaming device")
      Acked-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      a319d186
    • Eric Dumazet's avatar
      ipv6: update skb->csum when CE mark is propagated · 28882454
      Eric Dumazet authored
      [ Upstream commit 34ae6a1a ]
      
      When a tunnel decapsulates the outer header, it has to comply
      with RFC 6080 and eventually propagate CE mark into inner header.
      
      It turns out IP6_ECN_set_ce() does not correctly update skb->csum
      for CHECKSUM_COMPLETE packets, triggering infamous "hw csum failure"
      messages and stack traces.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      28882454
    • Eric Dumazet's avatar
      phonet: properly unshare skbs in phonet_rcv() · 2f962373
      Eric Dumazet authored
      [ Upstream commit 7aaed57c ]
      
      Ivaylo Dimitrov reported a regression caused by commit 7866a621
      ("dev: add per net_device packet type chains").
      
      skb->dev becomes NULL and we crash in __netif_receive_skb_core().
      
      Before above commit, different kind of bugs or corruptions could happen
      without major crash.
      
      But the root cause is that phonet_rcv() can queue skb without checking
      if skb is shared or not.
      
      Many thanks to Ivaylo Dimitrov for his help, diagnosis and tests.
      Reported-by: default avatarIvaylo Dimitrov <ivo.g.dimitrov.75@gmail.com>
      Tested-by: default avatarIvaylo Dimitrov <ivo.g.dimitrov.75@gmail.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Remi Denis-Courmont <courmisch@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      2f962373
    • Neal Cardwell's avatar
      tcp_yeah: don't set ssthresh below 2 · 113cc80b
      Neal Cardwell authored
      [ Upstream commit 83d15e70 ]
      
      For tcp_yeah, use an ssthresh floor of 2, the same floor used by Reno
      and CUBIC, per RFC 5681 (equation 4).
      
      tcp_yeah_ssthresh() was sometimes returning a 0 or negative ssthresh
      value if the intended reduction is as big or bigger than the current
      cwnd. Congestion control modules should never return a zero or
      negative ssthresh. A zero ssthresh generally results in a zero cwnd,
      causing the connection to stall. A negative ssthresh value will be
      interpreted as a u32 and will set a target cwnd for PRR near 4
      billion.
      
      Oleksandr Natalenko reported that a system using tcp_yeah with ECN
      could see a warning about a prior_cwnd of 0 in
      tcp_cwnd_reduction(). Testing verified that this was due to
      tcp_yeah_ssthresh() misbehaving in this way.
      Reported-by: default avatarOleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      113cc80b