1. 17 Sep, 2014 5 commits
    • Btrfs: fix crash on endio of reading corrupted block · 8f543e81
      Liu Bo authored
      commit 38c1c2e4 upstream.
      
      The crash is
      
      ------------[ cut here ]------------
      kernel BUG at fs/btrfs/extent_io.c:2124!
      [...]
      Workqueue: btrfs-endio normal_work_helper [btrfs]
      RIP: 0010:[<ffffffffa02d6055>]  [<ffffffffa02d6055>] end_bio_extent_readpage+0xb45/0xcd0 [btrfs]
      
      This is in fact a regression.
      
      It is because we forgot to increase @offset properly when reading a
      corrupted block, so @offset stays the same; this leads to checksum errors
      while reading the remaining blocks queued up in the same bio, and we end
      up hitting the above BUG_ON.
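
      A minimal sketch of the idea (illustrative, not the verbatim upstream
      diff): every path in the per-bvec loop of end_bio_extent_readpage()
      that is finished with the current bvec must advance @offset before
      continuing:

      	if (ret) {
      		/*
      		 * The corrupted-block path must skip past this bvec too,
      		 * otherwise the next bvec in the same bio is checksummed
      		 * against a stale @offset.
      		 */
      		offset += len;
      		continue;
      	}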
      Reported-by: Chris Murphy <lists@colorremedies.com>
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      8f543e81
    • Btrfs: fix compressed write corruption on enospc · 28758933
      Liu Bo authored
      commit ce62003f upstream.
      
      When failing to allocate space for the whole compressed extent, we fall
      back to uncompressed IO, but we've forgotten to redirty the pages which
      belong to this compressed extent, so these 'clean' pages simply skip the
      'submit' part and go straight to endio, and we end up with data
      corruption because nothing is actually written.
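
      A hedged sketch of the fallback path (the helper and field names follow
      what I believe the mainline fix uses, but treat them as illustrative):

      	/* enospc on the compressed path: before falling back to regular
      	 * uncompressed COW, re-dirty the pages that were cleaned earlier
      	 * so writeback actually submits them
      	 */
      	extent_range_redirty_for_io(inode, async_extent->start,
      				    async_extent->start +
      				    async_extent->ram_size - 1);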
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Tested-By: Martin Steigerwald <martin@lichtvoll.de>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      28758933
    • Btrfs: read lock extent buffer while walking backrefs · 308d30d1
      Filipe Manana authored
      commit 6f7ff6d7 upstream.
      
      Before processing the extent buffer, acquire a read lock on it, so
      that we're safe against concurrent updates on the extent buffer.
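
      Roughly, the extent buffer walk becomes (a sketch based on the btrfs
      locking helpers, not the exact diff):

      	btrfs_tree_read_lock(eb);
      	btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
      	ret = find_extent_in_eb(eb, bytenr, *extent_item_pos, &eie);
      	btrfs_tree_read_unlock_blocking(eb);
      	free_extent_buffer(eb);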
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      308d30d1
    • Btrfs: fix csum tree corruption, duplicate and outdated checksums · e2e8bf61
      Filipe Manana authored
      commit 27b9a812 upstream.
      
      Under rare circumstances we can end up leaving 2 versions of a checksum
      for the same file extent range.
      
      The reason for this is that after calling btrfs_next_leaf we process
      slot 0 of the leaf it returns, instead of processing the slot set in
      path->slots[0]. Most of the time (by far) path->slots[0] is 0, but after
      btrfs_next_leaf() releases the path and before it searches for the next
      leaf, another task might cause a split of the next leaf, which migrates
      some of its keys to the leaf we were processing before calling
      btrfs_next_leaf(). In this case btrfs_next_leaf() returns again the
      same leaf but with path->slots[0] having a slot number corresponding
      to the first new key it got, that is, a slot number that didn't exist
      before calling btrfs_next_leaf(), as the leaf now has more keys than
      it had before. So we must really process the returned leaf starting at
      path->slots[0] always, as it isn't always 0, and the key at slot 0 can
      have an offset much lower than our search offset/bytenr.
      
      For example, consider the following scenario, where we have:
      
      sums->bytenr: 40157184, sums->len: 16384, sums end: 40173568
      four 4kb file data blocks with offsets 40157184, 40161280, 40165376, 40169472
      
        Leaf N:
      
          slot = 0                           slot = btrfs_header_nritems() - 1
        |-------------------------------------------------------------------|
        | [(CSUM CSUM 39239680), size 8] ... [(CSUM CSUM 40116224), size 4] |
        |-------------------------------------------------------------------|
      
        Leaf N + 1:
      
            slot = 0                          slot = btrfs_header_nritems() - 1
        |--------------------------------------------------------------------|
        | [(CSUM CSUM 40161280), size 32] ... [(CSUM CSUM 40615936), size 8] |
        |--------------------------------------------------------------------|
      
      Because we are at the last slot of leaf N, we call btrfs_next_leaf() to
      find the next highest key, which releases the current path and then searches
      for that next key. However after releasing the path and before finding that
      next key, the item at slot 0 of leaf N + 1 gets moved to leaf N, due to a call
      to ctree.c:push_leaf_left() (via ctree.c:split_leaf()), and therefore
      btrfs_next_leaf() will return us a path again with leaf N but with the slot
      pointing to its new last key (CSUM CSUM 40161280). This new version of leaf N
      is then:
      
          slot = 0                        slot = btrfs_header_nritems() - 2  slot = btrfs_header_nritems() - 1
        |----------------------------------------------------------------------------------------------------|
        | [(CSUM CSUM 39239680), size 8] ... [(CSUM CSUM 40116224), size 4]  [(CSUM CSUM 40161280), size 32] |
        |----------------------------------------------------------------------------------------------------|
      
      And incorrectly using slot 0 makes us set next_offset to 39239680, and we jump
      into the "insert:" label, which will set tmp to:
      
          tmp = min((sums->len - total_bytes) >> blocksize_bits,
                    (next_offset - file_key.offset) >> blocksize_bits)
              = min((16384 - 0) >> 12, ((u64)(39239680 - 40157184)) >> 12)
              = min(4, 18446744073708634112 >> 12) = 4
      
      and
      
         ins_size = csum_size * tmp = 4 * 4 = 16 bytes.
      
      In other words, we insert a new csum item in the tree with key
      (CSUM_OBJECTID CSUM_KEY 40157184 = sums->bytenr) that contains the checksums
      for all the data (4 blocks of 4096 bytes each = sums->len). Which is wrong,
      because the item with key (CSUM CSUM 40161280) (the one that was moved from
      leaf N + 1 to the end of leaf N) contains the old checksums of the last 12288
      bytes of our data and won't get those old checksums removed.
      
      So this leaves us with 2 different checksums for 3 of the 4kb blocks of data
      in the tree, and breaks the logical rule:
      
         Key_N+1.offset >= Key_N.offset + length_of_data_its_checksums_cover
      
      An obvious bad effect of this is that a subsequent csum tree lookup to get
      the checksum of any of the blocks with logical offset of 40161280, 40165376
      or 40169472 (the last 3 4kb blocks of file data), will get the old checksums.
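
      The fix itself is small: after btrfs_next_leaf() returns 0, process the
      leaf starting at the slot it reports instead of assuming slot 0
      (conceptual sketch, not the verbatim diff):

      	ret = btrfs_next_leaf(root, path);
      	if (ret < 0)
      		goto fail_unlock;
      	if (ret > 0) {
      		found_next = 1;
      		goto insert;
      	}
      	slot = path->slots[0];	/* not 0: the leaf may have gained keys */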
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      e2e8bf61
    • Btrfs: Fix memory corruption by ulist_add_merge() on 32bit arch · e7b4f1fa
      Takashi Iwai authored
      commit 4eb1f66d upstream.
      
      We've got bug reports that btrfs crashes when quota is enabled on a
      32bit kernel, typically with an Oops like the one below:
       BUG: unable to handle kernel NULL pointer dereference at 00000004
       IP: [<f9234590>] find_parent_nodes+0x360/0x1380 [btrfs]
       *pde = 00000000
       Oops: 0000 [#1] SMP
       CPU: 0 PID: 151 Comm: kworker/u8:2 Tainted: G S      W 3.15.2-1.gd43d97e-default #1
       Workqueue: btrfs-qgroup-rescan normal_work_helper [btrfs]
       task: f1478130 ti: f147c000 task.ti: f147c000
       EIP: 0060:[<f9234590>] EFLAGS: 00010213 CPU: 0
       EIP is at find_parent_nodes+0x360/0x1380 [btrfs]
       EAX: f147dda8 EBX: f147ddb0 ECX: 00000011 EDX: 00000000
       ESI: 00000000 EDI: f147dda4 EBP: f147ddf8 ESP: f147dd38
        DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
       CR0: 8005003b CR2: 00000004 CR3: 00bf3000 CR4: 00000690
       Stack:
        00000000 00000000 f147dda4 00000050 00000001 00000000 00000001 00000050
        00000001 00000000 d3059000 00000001 00000022 000000a8 00000000 00000000
        00000000 000000a1 00000000 00000000 00000001 00000000 00000000 11800000
       Call Trace:
        [<f923564d>] __btrfs_find_all_roots+0x9d/0xf0 [btrfs]
        [<f9237bb1>] btrfs_qgroup_rescan_worker+0x401/0x760 [btrfs]
        [<f9206148>] normal_work_helper+0xc8/0x270 [btrfs]
        [<c025e38b>] process_one_work+0x11b/0x390
        [<c025eea1>] worker_thread+0x101/0x340
        [<c026432b>] kthread+0x9b/0xb0
        [<c0712a71>] ret_from_kernel_thread+0x21/0x30
        [<c0264290>] kthread_create_on_node+0x110/0x110
      
      This indicates a NULL corruption in the prefs_delayed list.  Further
      investigation and bisection pointed to the call of ulist_add_merge()
      as the source of the corruption.
      
      ulist_add_merge() takes a u64 as aux and writes a 64bit value into
      old_aux.  The callers of this function in backref.c, however, pass a
      pointer to a pointer as old_aux.  That is, the function overwrites a
      64bit value through a 32bit pointer.  This clobbers the adjacent
      variable, in this case, prefs_delayed.
      
      Here is a quick attempt to band-aid over this: a new function,
      ulist_add_merge_ptr(), is introduced to properly pass and store a
      pointer value instead of a u64.  Ugly void ** casts remain in the
      callers because void ** cannot be converted implicitly, but that is
      still safer than an explicit cast to u64.
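
      The new helper is roughly of this shape (a sketch of the wrapper
      described above; on 32bit it bounces the value through a local u64 so
      only pointer-sized data is ever stored back):

      	static inline int ulist_add_merge_ptr(struct ulist *ulist, u64 val,
      					      void *aux, void **old_aux,
      					      gfp_t gfp_mask)
      	{
      	#if BITS_PER_LONG == 32
      		u64 old64 = (uintptr_t)*old_aux;
      		int ret = ulist_add_merge(ulist, val, (uintptr_t)aux,
      					  &old64, gfp_mask);
      		*old_aux = (void *)((uintptr_t)old64);
      		return ret;
      	#else
      		return ulist_add_merge(ulist, val, (u64)aux,
      				       (u64 *)old_aux, gfp_mask);
      	#endif
      	}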
      
      Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=887046
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      e7b4f1fa
  2. 15 Sep, 2014 4 commits
    • vfs: fix bad hashing of dentries · be2378cb
      Linus Torvalds authored
      commit 99d263d4 upstream.
      
      Josef Bacik found a performance regression between 3.2 and 3.10 and
      narrowed it down to commit bfcfaa77 ("vfs: use 'unsigned long'
      accesses for dcache name comparison and hashing"). He reports:
      
       "The test case is essentially
      
            for (i = 0; i < 1000000; i++)
                    mkdir("a$i");
      
        On xfs on a fio card this goes at about 20k dir/sec with 3.2, and 12k
        dir/sec with 3.10.  This is because we spend waaaaay more time in
        __d_lookup on 3.10 than in 3.2.
      
        The new hashing function for strings is suboptimal for <
        sizeof(unsigned long) string names (and hell even > sizeof(unsigned
        long) string names that I've tested).  I broke out the old hashing
        function and the new one into a userspace helper to get real numbers
        and this is what I'm getting:
      
            Old hash table had 1000000 entries, 0 dupes, 0 max dupes
            New hash table had 12628 entries, 987372 dupes, 900 max dupes
            We had 11400 buckets with a p50 of 30 dupes, p90 of 240 dupes, p99 of 567 dupes for the new hash
      
        My test does the hash, and then does the d_hash into a integer pointer
        array the same size as the dentry hash table on my system, and then
        just increments the value at the address we got to see how many
        entries we overlap with.
      
        As you can see the old hash function ended up with all 1 million
        entries in their own bucket, whereas the new one they are only
        distributed among ~12.5k buckets, which is why we're using so much
        more CPU in __d_lookup".
      
      The reason for this hash regression is two-fold:
      
       - On 64-bit architectures the down-mixing of the original 64-bit
         word-at-a-time hash into the final 32-bit hash value is very
         simplistic and suboptimal, and just adds the two 32-bit parts
         together.
      
         In particular, because there is no bit shuffling and the mixing
         boundary is also a byte boundary, similar character patterns in the
         low and high word easily end up just canceling each other out.
      
       - the old byte-at-a-time hash mixed each byte into the final hash as it
         hashed the path component name, resulting in the low bits of the hash
         generally being a good source of hash data.  That is not true for the
         word-at-a-time case, and the hash data is distributed among all the
         bits.
      
      The fix is the same in both cases: do a better job of mixing the bits up
      and using as much of the hash data as possible.  We already have the
      "hash_32|64()" functions to do that.
      Reported-by: Josef Bacik <jbacik@fb.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: linux-fsdevel@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      be2378cb
    • dcache.c: get rid of pointless macros · c9258b20
      Al Viro authored
      commit 482db906 upstream.
      
      D_HASH{MASK,BITS} are used once each, both in the same function (d_hash()).
      At this point they are actively misleading - they imply that the values are
      compile-time constants, which is no longer true.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      c9258b20
    • futex: Unlock hb->lock in futex_wait_requeue_pi() error path · 70279f38
      Thomas Gleixner authored
      commit 13c42c2f upstream.
      
      futex_wait_requeue_pi() calls futex_wait_setup(). If
      futex_wait_setup() succeeds it returns with hb->lock held and
      preemption disabled. Now the sanity check after this does:
      
              if (match_futex(&q.key, &key2)) {
                      ret = -EINVAL;
                      goto out_put_keys;
              }
      
      which releases the keys but does not release hb->lock.
      
      So we happily return to user space with hb->lock held and therefore
      preemption disabled.
      
      Unlock hb->lock before taking the exit route.
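
      In other words, the check becomes (sketch; queue_unlock() is the helper
      that pairs with futex_wait_setup() here, and in older kernels it also
      takes the futex_q):

      	if (match_futex(&q.key, &key2)) {
      		queue_unlock(hb);	/* drop hb->lock before bailing out */
      		ret = -EINVAL;
      		goto out_put_keys;
      	}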
      Reported-by: Dave "Trinity" Jones <davej@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Darren Hart <dvhart@linux.intel.com>
      Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1409112318500.4178@nanos
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      70279f38
    • ext4: fix BUG_ON in mb_free_blocks() · bbcd86fa
      Theodore Ts'o authored
      commit c99d1e6e upstream.
      
      If we suffer a block allocation failure (for example due to a memory
      allocation failure), it's possible that we will call
      ext4_discard_allocated_blocks() before we've actually allocated any
      blocks.  In that case, fe_len and fe_start in ac->ac_f_ex will still
      be zero, and this will result in mb_free_blocks(inode, e4b, 0, 0)
      triggering the BUG_ON in mb_free_blocks():
      
      	BUG_ON(last >= (sb->s_blocksize << 3));
      
      Fix this by bailing out of ext4_discard_allocated_blocks() if fe_len
      is zero.
      
      Also fix a missing ext4_mb_unload_buddy() call in
      ext4_discard_allocated_blocks().
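
      A hedged sketch of what the discard path ends up looking like (based on
      the description above, not the verbatim diff):

      	if (pa == NULL) {
      		if (ac->ac_f_ex.fe_len == 0)
      			return;		/* nothing was allocated yet */
      		err = ext4_mb_load_buddy(ac->ac_sb,
      					 ac->ac_f_ex.fe_group, &e4b);
      		if (err) {
      			WARN(1, "mb_load_buddy failed (%d)", err);
      			return;
      		}
      		ext4_lock_group(ac->ac_sb, ac->ac_f_ex.fe_group);
      		mb_free_blocks(ac->ac_inode, &e4b, ac->ac_f_ex.fe_start,
      			       ac->ac_f_ex.fe_len);
      		ext4_unlock_group(ac->ac_sb, ac->ac_f_ex.fe_group);
      		ext4_mb_unload_buddy(&e4b);	/* the missing unload */
      		return;
      	}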
      
      Google-Bug-Id: 16844242
      
      Fixes: 86f0afd4
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      bbcd86fa
  3. 12 Sep, 2014 2 commits
    • x86, cpu hotplug: Fix stack frame warning in check_irq_vectors_for_cpu_disable() · 66acfbf7
      Prarit Bhargava authored
      commit 39424e89 upstream.
      
      Further discussion here: http://marc.info/?l=linux-kernel&m=139073901101034&w=2
      
      kbuild, the 0day kernel build service, outputs the warning:
      
      arch/x86/kernel/irq.c:333:1: warning: the frame size of 2056 bytes
      is larger than 2048 bytes [-Wframe-larger-than=]
      
      because check_irq_vectors_for_cpu_disable() allocates two cpumasks on the
      stack.  Fix this by moving the two cpumasks to file scope instead.
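
      The shape of the fix is simply to make the masks file-scope statics
      instead of two stack objects (sketch; callers are serialized by the CPU
      hotplug path):

      	/*
      	 * Used by check_irq_vectors_for_cpu_disable(); kept at file scope
      	 * because two struct cpumask on the stack blow the 2048-byte
      	 * frame limit with large NR_CPUS.
      	 */
      	static struct cpumask affinity_new, online_new;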
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Tested-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Prarit Bhargava <prarit@redhat.com>
      Link: http://lkml.kernel.org/r/1390915331-27375-1-git-send-email-prarit@redhat.com
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: Yang Zhang <yang.z.zhang@Intel.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Janet Morgan <janet.morgan@intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Ruiv Wang <ruiv.wang@gmail.com>
      Cc: Gong Chen <gong.chen@linux.intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      66acfbf7
    • x86: Add check for number of available vectors before CPU down · 26a01cb8
      Prarit Bhargava authored
      commit da6139e4 upstream.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64791
      
      When a cpu is downed on a system, the irqs on the cpu are assigned to
      other cpus.  It is possible, however, that when a cpu is downed there
      aren't enough free vectors on the remaining cpus to account for the
      vectors from the cpu that is being downed.
      
      This results in an interesting "overflow" condition where irqs are
      "assigned" to a CPU but are not handled.
      
      For example, when downing cpus on a 1-64 logical processor system:
      
      <snip>
      [  232.021745] smpboot: CPU 61 is now offline
      [  238.480275] smpboot: CPU 62 is now offline
      [  245.991080] ------------[ cut here ]------------
      [  245.996270] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x246/0x250()
      [  246.005688] NETDEV WATCHDOG: p786p1 (ixgbe): transmit queue 0 timed out
      [  246.013070] Modules linked in: lockd sunrpc iTCO_wdt iTCO_vendor_support sb_edac ixgbe microcode e1000e pcspkr joydev edac_core lpc_ich ioatdma ptp mdio mfd_core i2c_i801 dca pps_core i2c_core wmi acpi_cpufreq isci libsas scsi_transport_sas
      [  246.037633] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.12.0+ #14
      [  246.044451] Hardware name: Intel Corporation S4600LH ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
      [  246.057371]  0000000000000009 ffff88081fa03d40 ffffffff8164fbf6 ffff88081fa0ee48
      [  246.065728]  ffff88081fa03d90 ffff88081fa03d80 ffffffff81054ecc ffff88081fa13040
      [  246.074073]  0000000000000000 ffff88200cce0000 0000000000000040 0000000000000000
      [  246.082430] Call Trace:
      [  246.085174]  <IRQ>  [<ffffffff8164fbf6>] dump_stack+0x46/0x58
      [  246.091633]  [<ffffffff81054ecc>] warn_slowpath_common+0x8c/0xc0
      [  246.098352]  [<ffffffff81054fb6>] warn_slowpath_fmt+0x46/0x50
      [  246.104786]  [<ffffffff815710d6>] dev_watchdog+0x246/0x250
      [  246.110923]  [<ffffffff81570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80
      [  246.119097]  [<ffffffff8106092a>] call_timer_fn+0x3a/0x110
      [  246.125224]  [<ffffffff8106280f>] ? update_process_times+0x6f/0x80
      [  246.132137]  [<ffffffff81570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80
      [  246.140308]  [<ffffffff81061db0>] run_timer_softirq+0x1f0/0x2a0
      [  246.146933]  [<ffffffff81059a80>] __do_softirq+0xe0/0x220
      [  246.152976]  [<ffffffff8165fedc>] call_softirq+0x1c/0x30
      [  246.158920]  [<ffffffff810045f5>] do_softirq+0x55/0x90
      [  246.164670]  [<ffffffff81059d35>] irq_exit+0xa5/0xb0
      [  246.170227]  [<ffffffff8166062a>] smp_apic_timer_interrupt+0x4a/0x60
      [  246.177324]  [<ffffffff8165f40a>] apic_timer_interrupt+0x6a/0x70
      [  246.184041]  <EOI>  [<ffffffff81505a1b>] ? cpuidle_enter_state+0x5b/0xe0
      [  246.191559]  [<ffffffff81505a17>] ? cpuidle_enter_state+0x57/0xe0
      [  246.198374]  [<ffffffff81505b5d>] cpuidle_idle_call+0xbd/0x200
      [  246.204900]  [<ffffffff8100b7ae>] arch_cpu_idle+0xe/0x30
      [  246.210846]  [<ffffffff810a47b0>] cpu_startup_entry+0xd0/0x250
      [  246.217371]  [<ffffffff81646b47>] rest_init+0x77/0x80
      [  246.223028]  [<ffffffff81d09e8e>] start_kernel+0x3ee/0x3fb
      [  246.229165]  [<ffffffff81d0989f>] ? repair_env_string+0x5e/0x5e
      [  246.235787]  [<ffffffff81d095a5>] x86_64_start_reservations+0x2a/0x2c
      [  246.242990]  [<ffffffff81d0969f>] x86_64_start_kernel+0xf8/0xfc
      [  246.249610] ---[ end trace fb74fdef54d79039 ]---
      [  246.254807] ixgbe 0000:c2:00.0 p786p1: initiating reset due to tx timeout
      [  246.262489] ixgbe 0000:c2:00.0 p786p1: Reset adapter
      Last login: Mon Nov 11 08:35:14 from 10.18.17.119
      [root@(none) ~]# [  246.792676] ixgbe 0000:c2:00.0 p786p1: detected SFP+: 5
      [  249.231598] ixgbe 0000:c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
      [  246.792676] ixgbe 0000:c2:00.0 p786p1: detected SFP+: 5
      [  249.231598] ixgbe 0000:c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
      
      (last lines keep repeating.  ixgbe driver is dead until module reload.)
      
      If the downed cpu has more vectors than are free on the remaining cpus on the
      system, it is possible that some vectors are "orphaned" even though they are
      assigned to a cpu.  In this case, since the ixgbe driver had a watchdog, the
      watchdog fired and notified that something was wrong.
      
      This patch adds a function, check_vectors(), which compares the number of
      vectors in use on the CPU going down with the number of free vectors
      available on the rest of the system.  If there aren't enough vectors for
      the CPU to go down, an error is returned and propagated back to userspace.
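
      Conceptually the check looks like this (a simplified sketch that ignores
      the affinity handling mentioned in the v3 note below; a negative
      vector_irq entry is treated as a free vector):

      	/* vectors that must be migrated off the CPU being offlined */
      	this_count = 0;
      	for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++)
      		if (per_cpu(vector_irq, this_cpu)[vector] >= 0)
      			this_count++;

      	/* free vectors on the CPUs that stay online */
      	count = 0;
      	for_each_online_cpu(cpu) {
      		if (cpu == this_cpu)
      			continue;
      		for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS;
      		     vector++)
      			if (per_cpu(vector_irq, cpu)[vector] < 0)
      				count++;
      	}

      	if (count < this_count) {
      		pr_warn("CPU %d disable failed: %d vectors needed, %d available\n",
      			this_cpu, this_count, count);
      		return -ERANGE;
      	}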
      
      v2: Do not need to look at percpu irqs
      v3: Need to check affinity to prevent counting of MSIs in IOAPIC Lowest
          Priority Mode
      v4: Additional changes suggested by Gong Chen.
      v5/v6/v7/v8: Updated comment text
      Signed-off-by: Prarit Bhargava <prarit@redhat.com>
      Link: http://lkml.kernel.org/r/1389613861-3853-1-git-send-email-prarit@redhat.com
      Reviewed-by: Gong Chen <gong.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: Yang Zhang <yang.z.zhang@Intel.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Janet Morgan <janet.morgan@intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Ruiv Wang <ruiv.wang@gmail.com>
      Cc: Gong Chen <gong.chen@linux.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      26a01cb8
  4. 09 Sep, 2014 2 commits
  5. 03 Sep, 2014 27 commits
    • Linux 3.12.28 · 3b89848d
      Jiri Slaby authored
      3b89848d
    • hpsa: fix bad -ENOMEM return value in hpsa_big_passthru_ioctl · 085f7174
      Stephen M. Cameron authored
      commit 0758f4f7 upstream.
      
      When copy_from_user fails, return -EFAULT, not -ENOMEM
      Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
      Reported-by: Robert Elliott <elliott@hp.com>
      Reviewed-by: Joe Handzik <joseph.t.handzik@hp.com>
      Reviewed-by: Scott Teel <scott.teel@hp.com>
      Reviewed-by: Mike Miller <michael.miller@canonical.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      085f7174
    • x86/xen: resume timer irqs early · 82908a4a
      David Vrabel authored
      commit 8d5999df upstream.
      
      If the timer irqs are resumed during device resume it is possible in
      certain circumstances for the resume to hang early on, before device
      interrupts are resumed.  For an Ubuntu 14.04 PVHVM guest this would
      occur in ~0.5% of resume attempts.
      
      It is not entirely clear what is occurring at the point of the hang, but I
      think a task necessary for the resume calls schedule_timeout(),
      waiting for a timer interrupt (which never arrives).  This failure may
      require specific tasks to be running on the other VCPUs to trigger
      (processes are not frozen during a suspend/resume if PREEMPT is
      disabled).
      
      Add IRQF_EARLY_RESUME to the timer interrupts so they are resumed in
      syscore_resume().
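
      A sketch of the resulting timer-IRQ setup in xen_setup_timer() (the exact
      flag set is from memory and should be treated as illustrative):

      	irq = bind_virq_to_irqhandler(VIRQ_TIMER, cpu, xen_timer_interrupt,
      				      IRQF_PERCPU | IRQF_TIMER |
      				      IRQF_FORCE_RESUME | IRQF_EARLY_RESUME,
      				      name, NULL);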
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      82908a4a
    • x86/efi: Enforce CONFIG_RELOCATABLE for EFI boot stub · 465502e7
      Matt Fleming authored
      commit 7b2a583a upstream.
      
      Without CONFIG_RELOCATABLE the early boot code will decompress the
      kernel to LOAD_PHYSICAL_ADDR. While this may have been fine in the BIOS
      days, that isn't going to fly with UEFI since parts of the firmware
      code/data may be located at LOAD_PHYSICAL_ADDR.
      
      Straying outside of the bounds of the regions we've explicitly requested
      from the firmware will cause all sorts of trouble. Bruno reports that
      his machine resets while trying to decompress the kernel image.
      
      We already go to great pains to ensure the kernel is loaded into a
      suitably aligned buffer, it's just that the address isn't necessarily
      LOAD_PHYSICAL_ADDR, because we can't guarantee that address isn't in-use
      by the firmware.
      
      Explicitly enforce CONFIG_RELOCATABLE for the EFI boot stub, so that we
      can load the kernel at any address with the correct alignment.
      Reported-by: Bruno Prémont <bonbons@linux-vserver.org>
      Tested-by: Bruno Prémont <bonbons@linux-vserver.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      465502e7
    • x86_64/vsyscall: Fix warn_bad_vsyscall log output · de17f3c9
      Andy Lutomirski authored
      commit 53b884ac upstream.
      
      This commit in Linux 3.6:
      
          commit c767a54b
          Author: Joe Perches <joe@perches.com>
          Date:   Mon May 21 19:50:07 2012 -0700
      
              x86/debug: Add KERN_<LEVEL> to bare printks, convert printks to pr_<level>
      
      caused warn_bad_vsyscall to output garbage in the middle of the
      line.  Revert the bad part of it.
      
      The printk in question isn't actually bare; the level is "%s".
      
      The bug this fixes is purely cosmetic; backports are optional.
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Link: http://lkml.kernel.org/r/03eac1f24110bbe496ecc12a4df467e0d88466d4.1406330947.git.luto@amacapital.net
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      de17f3c9
    • x86: don't exclude low BIOS area when allocating address space for non-PCI cards · 4804ca79
      Christoph Schulz authored
      commit cbace46a upstream.
      
      Commit 30919b0b ("x86: avoid low BIOS area when allocating address
      space") moved the test for resource allocations that fall within the first
      1MB of address space from the PCI-specific path to a generic path, such
      that all resource allocations will avoid this area.  However, this breaks
      ISA cards which need to allocate a memory region within the first 1MB.  An
      example is the i82365 PCMCIA controller and derivatives like the Ricoh
      RF5C296/396 which map part of the PCMCIA socket memory address space into
      the first 1MB of system memory address space.  They do not work anymore as
      no usable memory region exists due to this change:
      
        Intel ISA PCIC probe: Ricoh RF5C296/396 ISA-to-PCMCIA at port 0x3e0 ofs 0x00, 2 sockets
        host opts [0]: none
        host opts [1]: none
        ISA irqs (scanned) = 3,4,5,9,10 status change on irq 10
        pcmcia_socket pcmcia_socket1: pccard: PCMCIA card inserted into slot 1
        pcmcia_socket pcmcia_socket0: cs: IO port probe 0xc00-0xcff: excluding 0xcf8-0xcff
        pcmcia_socket pcmcia_socket0: cs: IO port probe 0xa00-0xaff: clean.
        pcmcia_socket pcmcia_socket0: cs: IO port probe 0x100-0x3ff: excluding 0x170-0x177 0x1f0-0x1f7 0x2f8-0x2ff 0x370-0x37f 0x3c0-0x3e7 0x3f0-0x3ff
        pcmcia_socket pcmcia_socket0: cs: memory probe 0x0a0000-0x0affff: excluding 0xa0000-0xaffff
        pcmcia_socket pcmcia_socket0: cs: memory probe 0x0b0000-0x0bffff: excluding 0xb0000-0xbffff
        pcmcia_socket pcmcia_socket0: cs: memory probe 0x0c0000-0x0cffff: excluding 0xc0000-0xcbfff
        pcmcia_socket pcmcia_socket0: cs: memory probe 0x0d0000-0x0dffff: clean.
        pcmcia_socket pcmcia_socket0: cs: memory probe 0x0e0000-0x0effff: clean.
        pcmcia_socket pcmcia_socket0: cs: memory probe 0x60000000-0x60ffffff: clean.
        pcmcia_socket pcmcia_socket0: cs: memory probe 0xa0000000-0xa0ffffff: clean.
        pcmcia_socket pcmcia_socket1: cs: IO port probe 0xc00-0xcff: excluding 0xcf8-0xcff
        pcmcia_socket pcmcia_socket1: cs: IO port probe 0xa00-0xaff: clean.
        pcmcia_socket pcmcia_socket1: cs: IO port probe 0x100-0x3ff: excluding 0x170-0x177 0x1f0-0x1f7 0x2f8-0x2ff 0x370-0x37f 0x3c0-0x3e7 0x3f0-0x3ff
        pcmcia_socket pcmcia_socket1: cs: memory probe 0x0a0000-0x0affff: excluding 0xa0000-0xaffff
        pcmcia_socket pcmcia_socket1: cs: memory probe 0x0b0000-0x0bffff: excluding 0xb0000-0xbffff
        pcmcia_socket pcmcia_socket1: cs: memory probe 0x0c0000-0x0cffff: excluding 0xc0000-0xcbfff
        pcmcia_socket pcmcia_socket1: cs: memory probe 0x0d0000-0x0dffff: clean.
        pcmcia_socket pcmcia_socket1: cs: memory probe 0x0e0000-0x0effff: clean.
        pcmcia_socket pcmcia_socket1: cs: memory probe 0x60000000-0x60ffffff: clean.
        pcmcia_socket pcmcia_socket1: cs: memory probe 0xa0000000-0xa0ffffff: clean.
        pcmcia_socket pcmcia_socket1: cs: memory probe 0x0cc000-0x0effff: excluding 0xe0000-0xeffff
        pcmcia_socket pcmcia_socket1: cs: unable to map card memory!
      
      If filtering out the first 1MB is reverted, everything works as expected.
      Tested-by: Robert Resch <fli4l@robert.reschpara.de>
      Signed-off-by: Christoph Schulz <develop@kristov.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      4804ca79
    • PCI: Configure ASPM when enabling device · b7be971e
      Vidya Sagar authored
      commit 1f6ae47e upstream.
      
      We can't do ASPM configuration at enumeration-time because enabling it
      makes some defective hardware unresponsive, even if ASPM is disabled later
      (see 41cd766b ("PCI: Don't enable aspm before drivers have had a chance
      to veto it")).  Therefore, we have to do it after a driver claims the
      device.
      
      We previously configured ASPM in pci_set_power_state(), but that's not a
      very good place because it's not really related to setting the PCI device
      power state, and doing it there means:
      
        - We incorrectly skipped ASPM config when setting a device that's
          already in D0 to D0.
      
        - We unnecessarily configured ASPM when setting a device to a low-power
          state (the ASPM feature only applies when the device is in D0).
      
        - We unnecessarily configured ASPM when called from a .resume() method
          (ASPM configuration needs to be restored during resume, but
          pci_restore_pcie_state() should already do this).
      
      Move ASPM configuration from pci_set_power_state() to
      do_pci_enable_device() so we do it when a driver enables a device.
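
      The move amounts to something like this at the end of
      do_pci_enable_device() (sketch, not the verbatim diff):

      	err = pcibios_enable_device(dev, bars);
      	if (err < 0)
      		return err;
      	pci_fixup_device(pci_fixup_enable, dev);

      	/* ASPM is configured here, once a driver enables the device,
      	 * rather than in pci_set_power_state()
      	 */
      	pcie_aspm_powersave_config_link(dev->bus->self);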
      
      [bhelgaas: changelog]
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=79621
      Fixes: db288c9c ("PCI / PM: restore the original behavior of pci_set_power_state()")
      Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Vidya Sagar <sagar.tv@gmail.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      b7be971e
    • drm/radeon: add additional SI pci ids · 0d99de50
      Alex Deucher authored
      commit 37dbeab7 upstream.
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      0d99de50
    • drm/radeon: add new bonaire pci ids · 77b2045e
      Alex Deucher authored
      commit 5fc540ed upstream.
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      77b2045e
    • 6eaabaa6
    • kvm: iommu: fix the third parameter of kvm_iommu_put_pages (CVE-2014-3601) · e35b1e9f
      Michael S. Tsirkin authored
      commit 350b8bdd upstream.
      
      The third parameter of kvm_iommu_put_pages is wrong;
      it should be 'gfn - slot->base_gfn'.
      
      By making gfn very large, a malicious guest or userspace can cause kvm to
      go to this error path, and subsequently to pass a huge value as size.
      Alternatively, if gfn is small, then pages would be pinned but never
      unpinned, causing a host memory leak and local DoS.
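
      The fix is a one-argument change at the error-path call site (sketch):

      	/* unmap only what was mapped so far: the third argument is a page
      	 * count relative to the slot, not an absolute gfn
      	 */
      	kvm_iommu_put_pages(kvm, slot->base_gfn, gfn - slot->base_gfn);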
      
      Passing a reasonable but large value could be the most dangerous case,
      because it would unpin a page that should have stayed pinned, and thus
      allow the device to DMA into arbitrary memory.  However, this cannot
      happen because of the condition that can trigger the error:
      
      - out of memory (where you can't allocate even a single page)
        should not be possible for the attacker to trigger
      
      - when exceeding the iommu's address space, guest pages after gfn
        will also exceed the iommu's address space, and inside
        kvm_iommu_put_pages() the iommu_iova_to_phys() will fail.  The
        page thus would not be unpinned at all.
      Reported-by: Jack Morgenstein <jackm@mellanox.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      e35b1e9f
    • Revert "KVM: x86: Increase the number of fixed MTRR regs to 10" · 1cd31e1b
      Paolo Bonzini authored
      commit 0d234daf upstream.
      
      This reverts commit 682367c4,
      which causes 32-bit SMP Windows 7 guests to panic.
      
      SeaBIOS has a limit on the number of MTRRs that it can handle,
      and this patch exceeded the limit.  Better revert it.
      Thanks to Nadav Amit for debugging the cause.
      Reported-by: Wanpeng Li <wanpeng.li@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      1cd31e1b
    • KVM: x86: always exit on EOIs for interrupts listed in the IOAPIC redir table · 6c9866ee
      Paolo Bonzini authored
      commit 0f6c0a74 upstream.
      
      Currently, the EOI exit bitmap (used for APICv) does not include
      interrupts that are masked.  However, this can cause a bug that manifests
      as an interrupt storm inside the guest.  Alex Williamson reported the
      bug and is the one who really debugged this; I only wrote the patch. :)
      
      The scenario involves a multi-function PCI device with OHCI and EHCI
      USB functions and an audio function, all assigned to the guest, where
      both USB functions use legacy INTx interrupts.
      
      As soon as the guest boots, interrupts for these devices turn into an
      interrupt storm in the guest; the host does not see the interrupt storm.
      Basically the EOI path does not work, and the guest continues to see the
      interrupt over and over, even after it attempts to mask it at the APIC.
      The bug is only visible with older kernels (RHEL6.5, based on 2.6.32
      with not many changes in the area of APIC/IOAPIC handling).
      
      Alex then tried forcing bit 59 (corresponding to the USB functions' IRQ)
      on in the eoi_exit_bitmap and TMR, and things then work.  What happens
      is that VFIO asserts IRQ11, then KVM recomputes the EOI exit bitmap.
      It does not have set bit 59 because the RTE was masked, so the IOAPIC
      never sees the EOI and the interrupt continues to fire in the guest.
      
      My guess was that the guest is masking the interrupt in the redirection
      table in the interrupt routine, i.e. while the interrupt is set in a
      LAPIC's ISR.  The simplest fix is to ignore the masking state; we would
      rather have an unnecessary exit than a missed IRQ ACK, and anyway IOAPIC
      interrupts are not as performance-sensitive as, for example, MSIs.
      Alex tested this patch and it fixed his bug.
      
      [Thanks to Alex for his precise description of the problem
       and initial debugging effort.  A lot of the text above is
       based on emails exchanged with him.]
      Reported-by: Alex Williamson <alex.williamson@redhat.com>
      Tested-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      6c9866ee
    • KVM: x86: Inter-privilege level ret emulation is not implemeneted · 06508c9b
      Nadav Amit authored
      commit 9e8919ae upstream.
      
      Return an unhandlable error on inter-privilege level ret instructions.  This
      is because the current emulation does not check the privilege level correctly
      when loading the CS, and does not pop RSP/SS as needed.
      Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      06508c9b
    • debugfs: Fix corrupted loop in debugfs_remove_recursive · 3501de9d
      Steven Rostedt authored
      commit 485d4402 upstream.
      
      [ I'm currently running my tests on it now, and so far, after a few
       hours it has yet to blow up. I'll run it for 24 hours which it never
       succeeded in the past. ]
      
      The tracing code has a way to make directories within the debugfs file
      system as well as deleting them using mkdir/rmdir in the instance
      directory. This is very limited in functionality; for example, there are
      no renames, and the parent directory "instance" can not be modified.
      The tracing code creates the instance directory from the debugfs code
      and then replaces the dentry->d_inode->i_op with its own to allow
      for mkdir/rmdir to work.
      
      When these are called, the d_entry and inode locks need to be released
      to call the instance creation and deletion code. That code has its own
      accounting and locking to serialize everything to prevent multiple
      users from causing harm. As the parent "instance" directory can not
      be modified this simplifies things.
      
      I created a stress test that creates several threads that randomly
      creates and deletes directories thousands of times a second. The code
      stood up to this test and I submitted it a while ago.
      
      Recently I added a new test that adds readers to the mix. While the
      instance directories were being added and deleted, readers would read
      from these directories and even enable tracing within them. This test
      was able to trigger a bug:
      
       general protection fault: 0000 [#1] PREEMPT SMP
       Modules linked in: ...
       CPU: 3 PID: 17789 Comm: rmdir Tainted: G        W     3.15.0-rc2-test+ #41
       Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
       task: ffff88003786ca60 ti: ffff880077018000 task.ti: ffff880077018000
       RIP: 0010:[<ffffffff811ed5eb>]  [<ffffffff811ed5eb>] debugfs_remove_recursive+0x1bd/0x367
       RSP: 0018:ffff880077019df8  EFLAGS: 00010246
       RAX: 0000000000000002 RBX: ffff88006f0fe490 RCX: 0000000000000000
       RDX: dead000000100058 RSI: 0000000000000246 RDI: ffff88003786d454
       RBP: ffff88006f0fe640 R08: 0000000000000628 R09: 0000000000000000
       R10: 0000000000000628 R11: ffff8800795110a0 R12: ffff88006f0fe640
       R13: ffff88006f0fe640 R14: ffffffff81817d0b R15: ffffffff818188b7
       FS:  00007ff13ae24700(0000) GS:ffff88007d580000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: 0000003054ec7be0 CR3: 0000000076d51000 CR4: 00000000000007e0
       Stack:
        ffff88007a41ebe0 dead000000100058 00000000fffffffe ffff88006f0fe640
        0000000000000000 ffff88006f0fe678 ffff88007a41ebe0 ffff88003793a000
        00000000fffffffe ffffffff810bde82 ffff88006f0fe640 ffff88007a41eb28
       Call Trace:
        [<ffffffff810bde82>] ? instance_rmdir+0x15b/0x1de
        [<ffffffff81132e2d>] ? vfs_rmdir+0x80/0xd3
        [<ffffffff81132f51>] ? do_rmdir+0xd1/0x139
        [<ffffffff8124ad9e>] ? trace_hardirqs_on_thunk+0x3a/0x3c
        [<ffffffff814fea62>] ? system_call_fastpath+0x16/0x1b
       Code: fe ff ff 48 8d 75 30 48 89 df e8 c9 fd ff ff 85 c0 75 13 48 c7 c6 b8 cc d2 81 48 c7 c7 b0 cc d2 81 e8 8c 7a f5 ff 48 8b 54 24 08 <48> 8b 82 a8 00 00 00 48 89 d3 48 2d a8 00 00 00 48 89 44 24 08
       RIP  [<ffffffff811ed5eb>] debugfs_remove_recursive+0x1bd/0x367
        RSP <ffff880077019df8>
      
      It took a while, but every time it triggered, it was always in the
      same place:
      
      	list_for_each_entry_safe(child, next, &parent->d_subdirs, d_u.d_child) {
      
      Where the child->d_u.d_child seemed to be corrupted.  I added lots of
      trace_printk()s to see what was wrong, and sure enough, it was always
      the child's d_u.d_child field. I looked around to see what touches
      it and noticed that in __dentry_kill() which calls dentry_free():
      
      static void dentry_free(struct dentry *dentry)
      {
      	/* if dentry was never visible to RCU, immediate free is OK */
      	if (!(dentry->d_flags & DCACHE_RCUACCESS))
      		__d_free(&dentry->d_u.d_rcu);
      	else
      		call_rcu(&dentry->d_u.d_rcu, __d_free);
      }
      
      I also noticed that __dentry_kill() unlinks the child->d_u.d_child
      under the parent->d_lock spin_lock.
      
      Looking back at the loop in debugfs_remove_recursive() it never takes the
      parent->d_lock to do the list walk. Adding more tracing, I was able to
      prove this was the issue:
      
       ftrace-t-15385   1.... 246662024us : dentry_kill <ffffffff81138b91>: free ffff88006d573600
          rmdir-15409   2.... 246662024us : debugfs_remove_recursive <ffffffff811ec7e5>: child=ffff88006d573600 next=dead000000100058
      
      The dentry_kill freed ffff88006d573600 just as the remove recursive was walking
      it.
      
      In order to fix this, the list walk needs to be modified a bit to take
      the parent->d_lock. The safe version is no longer necessary, as every
      time we remove a child, the parent->d_lock must be released and the
      list walk must start over. Each time a child is removed, even though it
      may still be on the list, it should be skipped by the first check
      in the loop:
      
      		if (!debugfs_positive(child))
      			continue;
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      3501de9d
    • crypto: ux500 - make interrupt mode plausible · 9cf85b64
      Arnd Bergmann authored
      commit e1f8859e upstream.
      
      The interrupt handler in the ux500 crypto driver has an obviously
      incorrect way to access the data buffer, which for a while has
      caused this build warning:
      
      ../ux500/cryp/cryp_core.c: In function 'cryp_interrupt_handler':
      ../ux500/cryp/cryp_core.c:234:5: warning: passing argument 1 of '__fswab32' makes integer from pointer without a cast [enabled by default]
           writel_relaxed(ctx->indata,
           ^
      In file included from ../include/linux/swab.h:4:0,
                       from ../include/uapi/linux/byteorder/big_endian.h:12,
                       from ../include/linux/byteorder/big_endian.h:4,
                       from ../arch/arm/include/uapi/asm/byteorder.h:19,
                       from ../include/asm-generic/bitops/le.h:5,
                       from ../arch/arm/include/asm/bitops.h:340,
                       from ../include/linux/bitops.h:33,
                       from ../include/linux/kernel.h:10,
                       from ../include/linux/clk.h:16,
                       from ../drivers/crypto/ux500/cryp/cryp_core.c:12:
      ../include/uapi/linux/swab.h:57:119: note: expected '__u32' but argument is of type 'const u8 *'
       static inline __attribute_const__ __u32 __fswab32(__u32 val)
      
      There are at least two, possibly three problems here:
      a) when writing into the FIFO, we copy the pointer rather than the
         actual data we want to give to the hardware
      b) the data pointer is an array of 8-bit values, while the FIFO
         is 32-bit wide, so both the read and write access fail to do
         a proper type conversion
      c) this seems incorrect for big-endian kernels, on which we need to
         byte-swap register accesses but normally not FIFO accesses;
         at least the DMA case doesn't do it either.
      
      This converts the bogus loop to use the same readsl/writesl pair
      that we use for the two other modes (DMA and polling). This is
      more efficient and consistent, and probably correct for endianness.
      
      The bug has existed since the driver was first merged, and was
      probably never detected because nobody tried to use interrupt mode.
      It might make sense to backport this fix to stable kernels, depending
      on how the crypto maintainers feel about that.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: linux-crypto@vger.kernel.org
      Cc: Fabio Baltieri <fabio.baltieri@linaro.org>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      9cf85b64
    • serial: core: Preserve termios c_cflag for console resume · c2a16794
      Peter Hurley authored
      commit ae84db96 upstream.
      
      When a tty is opened for the serial console, the termios c_cflag
      settings are inherited from the console line settings.
      However, if the tty is subsequently closed, the termios settings
      are lost. This results in a garbled console if the console is later
      suspended and resumed.
      
      Preserve the termios c_cflag for the serial console when the tty
      is shut down; this reflects the most recent line settings.
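
      The preserved state is essentially this in the shutdown path (sketch,
      assuming the uart_console() helper and the console cflag field as in
      mainline):

      	/* remember the console's last line settings so a later
      	 * suspend/resume can restore them
      	 */
      	if (uart_console(uport) && tty)
      		uport->cons->cflag = tty->termios.c_cflag;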
      
      Fixes: Bugzilla #69751, 'serial console does not wake from S3'
      Reported-by: Valerio Vanni <valerio.vanni@inwind.it>
      Acked-by: Alan Cox <alan@linux.intel.com>
      Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      c2a16794
    • ext4: fix ext4_discard_allocated_blocks() if we can't allocate the pa struct · 7b641785
      Theodore Ts'o authored
      commit 86f0afd4 upstream.
      
      If there is a failure while allocating the preallocation structure, a
      number of blocks can end up getting marked in the in-memory buddy
      bitmap, and then not getting released.  This can result in the
      following corruption getting reported by the kernel:
      
      EXT4-fs error (device sda3): ext4_mb_generate_buddy:758: group 1126,
      12793 clusters in bitmap, 12729 in gd
      
      In that case, we need to release the blocks using mb_free_blocks().
      
      Tested: fs smoke test; also demonstrated that with injected errors,
      	the file system is no longer getting corrupted
      
      Google-Bug-Id: 16657874
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      7b641785
    • drivers/i2c/busses: use correct type for dma_map/unmap · bbe7db75
      Wolfram Sang authored
      commit 28772ac8 upstream.
      
      dma_{un}map_* uses 'enum dma_data_direction' not 'enum dma_transfer_direction'.
      Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
      Acked-by: Ludovic Desroches <ludovic.desroches@atmel.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      bbe7db75
    • tpm: Add missing tpm_do_selftest to ST33 I2C driver · f735c9db
      Jason Gunthorpe authored
      commit f07a5e9a upstream.
      
      Most device drivers do call 'tpm_do_selftest', which executes a
      TPM_ContinueSelfTest.  tpm_i2c_stm_st33 is just pointlessly different;
      I think it is a bug.
      
      These days we have the general assumption that the TPM is usable by
      the kernel immediately after the driver is finished, so we can no
      longer defer the mandatory self test to userspace.
      Reported-by: Richard Marciel <rmaciel@linux.vnet.ibm.com>
      Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      f735c9db
    • hwmon: (dme1737) Prevent overflow problem when writing large limits · f5962d66
      Axel Lin authored
      commit d58e47d7 upstream.
      
      On platforms with sizeof(int) < sizeof(long), writing a temperature
      limit larger than MAXINT will result in unpredictable limit values
      written to the chip. Avoid auto-conversion from long to int to fix
      the problem.
      
      Voltage limits, fan minimum speed, pwm frequency, pwm ramp rate, and
      other attributes have the same problem, fix them as well.
      
      Zone temperature limits are signed, but were cached as u8, causing
      unexpected values to be reported for negative temperatures. Cache as
      s8 to fix the problem.
      
      vrm is a u8, so the written value needs to be limited to [0, 255].
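
      The usual shape of these fixes (an illustrative sketch; the limits, macro
      and field names are placeholders rather than the dme1737 code) is to
      parse into a long and clamp before any conversion:

      	long val;
      	int err = kstrtol(buf, 10, &val);
      	if (err)
      		return err;

      	/* clamp in millidegrees before converting to the register value */
      	val = clamp_val(val, -128000, 127000);
      	data->temp_min[ix] = TEMP_TO_REG(val);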
      Signed-off-by: Axel Lin <axel.lin@ingics.com>
      [Guenter Roeck: Fix zone temperature cache]
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      f5962d66
    • hwmon: (ads1015) Fix out-of-bounds array access · 64e6fad5
      Axel Lin authored
      commit e9814295 upstream.
      
      Current code uses data_rate as array index in ads1015_read_adc() and uses pga
      as array index in ads1015_reg_to_mv, so we must make sure both the data_rate
      and pga settings are within the valid value range.
      Return -EINVAL if a setting is out of range.
      Signed-off-by: Axel Lin <axel.lin@ingics.com>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      64e6fad5
    • hwmon: (lm85) Fix various errors on attribute writes · cd072b3e
      Guenter Roeck authored
      commit 3248c3b7 upstream.
      
      Temperature limit register writes did not account for negative numbers.
      As a result, writing -127000 resulted in -126000 written into the
      temperature limit register. This problem affected temp[1-3]_min,
      temp[1-3]_max, temp[1-3]_auto_temp_crit, and temp[1-3]_auto_temp_min.
      
      When writing pwm[1-3]_freq, a long variable was auto-converted into an int
      without a range check. Writing values larger than MAXINT resulted in
      unexpected register values.

      When writing temp[1-3]_auto_temp_max, an unsigned long variable was
      auto-converted into an int without a range check. Writing values larger than
      MAXINT resulted in unexpected register values.
      
      vrm is a u8, so the written value needs to be limited to [0, 255].
      
      Cc: Axel Lin <axel.lin@ingics.com>
      Reviewed-by: Axel Lin <axel.lin@ingics.com>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      cd072b3e
    • hwmon: (ads1015) Fix off-by-one for valid channel index checking · 497bba1c
      Axel Lin authored
      commit 56de1377 upstream.
      
      Current code uses channel as array index, so the valid channel value is
      0 .. ADS1015_CHANNELS - 1.
      Signed-off-by: Axel Lin <axel.lin@ingics.com>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      497bba1c
    • hwmon: (gpio-fan) Prevent overflow problem when writing large limits · b800e2f8
      Axel Lin authored
      commit 2565fb05 upstream.
      
      On platforms with sizeof(int) < sizeof(unsigned long), writing an rpm value
      larger than MAXINT will result in unpredictable limit values written to the
      chip. Avoid auto-conversion from unsigned long to int to fix the problem.
      Signed-off-by: Axel Lin <axel.lin@ingics.com>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      b800e2f8
    • hwmon: (lm78) Fix overflow problems seen when writing large temperature limits · 80dc972c
      Guenter Roeck authored
      commit 1074d683 upstream.
      
      On platforms with sizeof(int) < sizeof(long), writing a temperature
      limit larger than MAXINT will result in unpredictable limit values
      written to the chip. Avoid auto-conversion from long to int to fix
      the problem.
      
      Cc: Axel Lin <axel.lin@ingics.com>
      Reviewed-by: Axel Lin <axel.lin@ingics.com>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      80dc972c
    • hwmon: (amc6821) Fix possible race condition bug · 801d26bb
      Axel Lin authored
      commit cf44819c upstream.
      
      Ensure the mutex lock protects the read-modify-write period to prevent a
      possible race condition.
      In addition, updating data->valid should also be protected by the mutex lock.
      Signed-off-by: Axel Lin <axel.lin@ingics.com>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      801d26bb