1. 08 May, 2013 40 commits
    • Lars-Peter Clausen's avatar
      mfd: adp5520: Restore mode bits on resume · d82f013b
      Lars-Peter Clausen authored
      commit c6cc25fd upstream.
      
      The adp5520 unfortunately also clears the BL_EN bit when the nSTNDBY bit is
      cleared. So we need to make sure to restore it during resume if it was set
      before suspend.
      Signed-off-by: default avatarLars-Peter Clausen <lars@metafoo.de>
      Acked-by: default avatarMichael Hennerich <michael.hennerich@analog.com>
      Signed-off-by: default avatarSamuel Ortiz <sameo@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d82f013b
    • Terry Barnaby's avatar
      mmc: atmel-mci: pio hang on block errors · 37f84175
      Terry Barnaby authored
      commit bdbc5d0c upstream.
      
      The driver is doing, by default, multi-block reads. When a block error
      occurs, card/block.c instigates a single block read: "mmcblk0: retrying
      using single block read".  It leaves the sg chain intact and just changes
      the length attribute for the first sg entry and the overall sg_len
      parameter.  When atmci_read_data_pio is called to read the single block
      of data it ignores the sg_len and expects to read more than 512 bytes as
      it sees there are multiple items in the sg list. No more data comes as
      the controller has only been commanded to get one block.
      Signed-off-by: default avatarTerry Barnaby <terry@beam.ltd.uk>
      Acked-by: default avatarLudovic Desroches <ludovic.desroches@atmel.com>
      Signed-off-by: default avatarChris Ball <cjb@laptop.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      37f84175
    • Philip Rakity's avatar
      mmc: core: Fix bit width test failing on old eMMC cards · 217fd3af
      Philip Rakity authored
      commit 836dc2fe upstream.
      
      PARTITION_SUPPORT needs to be set before doing the compare on version
      number so the bit width test does not get invalid data.  Before this
      patch, a Sandisk iNAND eMMC card would detect 1-bit width although
      the hardware supports 4-bit.
      
      Only affects old emmc devices - pre 4.4 devices.
      Reported-by: default avatarElad Yi <elad.yi@gmail.com>
      Signed-off-by: default avatarPhilip Rakity <prakity@yahoo.com>
      Signed-off-by: default avatarChris Ball <cjb@laptop.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      217fd3af
    • Li Fei's avatar
      x86: Eliminate irq_mis_count counted in arch_irq_stat · 4a70589f
      Li Fei authored
      commit f7b0e105 upstream.
      
      With the current implementation, kstat_cpu(cpu).irqs_sum is also
      increased in case of irq_mis_count increment.
      
      So there is no need to count irq_mis_count in arch_irq_stat,
      otherwise irq_mis_count will be counted twice in the sum of
      /proc/stat.
      Reported-by: default avatarLiu Chuansheng <chuansheng.liu@intel.com>
      Signed-off-by: default avatarLi Fei <fei.li@intel.com>
      Acked-by: default avatarLiu Chuansheng <chuansheng.liu@intel.com>
      Cc: tomoki.sekiyama.qu@hitachi.com
      Cc: joe@perches.com
      Link: http://lkml.kernel.org/r/1366980611.32469.7.camel@fli24-HP-Compaq-8100-Elite-CMT-PCSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4a70589f
    • Gleb Natapov's avatar
      KVM: X86 emulator: fix source operand decoding for 8bit mov[zs]x instructions · 5b5b3058
      Gleb Natapov authored
      commit 660696d1 upstream.
      
      Source operand for one byte mov[zs]x is decoded incorrectly if it is in
      high byte register. Fix that.
      Signed-off-by: default avatarGleb Natapov <gleb@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5b5b3058
    • Johan Hovold's avatar
      mmc: at91/avr32/atmel-mci: fix DMA-channel leak on module unload · bb878b30
      Johan Hovold authored
      commit 91cf54fe upstream.
      
      Fix regression introduced by commit 796211b7 ("mmc: atmel-mci: add
      pdc support and runtime capabilities detection") which removed the need
      for CONFIG_MMC_ATMELMCI_DMA but kept the Kconfig-entry as well as the
      compile guards around dma_release_channel() in remove(). Consequently,
      DMA is always enabled (if supported), but the DMA-channel is not
      released on module unload unless the DMA-config option is selected.
      
      Remove the no longer used CONFIG_MMC_ATMELMCI_DMA option completely.
      Signed-off-by: default avatarJohan Hovold <jhovold@gmail.com>
      Acked-by: default avatarLudovic Desroches <ludovic.desroches@atmel.com>
      Signed-off-by: default avatarChris Ball <cjb@laptop.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bb878b30
    • Theodore Ts'o's avatar
      ext4: fix Kconfig documentation for CONFIG_EXT4_DEBUG · 165628d1
      Theodore Ts'o authored
      commit 7f3e3c7c upstream.
      
      Fox the Kconfig documentation for CONFIG_EXT4_DEBUG to match the
      change made by commit a0b30c12: ext4: use module parameters instead
      of debugfs for mballoc_debug
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      165628d1
    • Theodore Ts'o's avatar
      ext4: fix online resizing for ext3-compat file systems · 7e30abf7
      Theodore Ts'o authored
      commit c5c72d81 upstream.
      
      Commit fb0a387d restricts block allocations for indirect-mapped
      files to block groups less than s_blockfile_groups.  However, the
      online resizing code wasn't setting s_blockfile_groups, so the newly
      added block groups were not available for non-extent mapped files.
      Reported-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7e30abf7
    • Dmitry Monakhov's avatar
      ext4: fix journal callback list traversal · f5b36426
      Dmitry Monakhov authored
      commit 5d3ee208 upstream.
      
      It is incorrect to use list_for_each_entry_safe() for journal callback
      traversial because ->next may be removed by other task:
      ->ext4_mb_free_metadata()
        ->ext4_mb_free_metadata()
          ->ext4_journal_callback_del()
      
      This results in the following issue:
      
      WARNING: at lib/list_debug.c:62 __list_del_entry+0x1c0/0x250()
      Hardware name:
      list_del corruption. prev->next should be ffff88019a4ec198, but was 6b6b6b6b6b6b6b6b
      Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
      Pid: 16400, comm: jbd2/dm-1-8 Tainted: G        W    3.8.0-rc3+ #107
      Call Trace:
       [<ffffffff8106fb0d>] warn_slowpath_common+0xad/0xf0
       [<ffffffff8106fc06>] warn_slowpath_fmt+0x46/0x50
       [<ffffffff813637e9>] ? ext4_journal_commit_callback+0x99/0xc0
       [<ffffffff8148cae0>] __list_del_entry+0x1c0/0x250
       [<ffffffff813637bf>] ext4_journal_commit_callback+0x6f/0xc0
       [<ffffffff813ca336>] jbd2_journal_commit_transaction+0x23a6/0x2570
       [<ffffffff8108aa42>] ? try_to_del_timer_sync+0x82/0xa0
       [<ffffffff8108b491>] ? del_timer_sync+0x91/0x1e0
       [<ffffffff813d3ecf>] kjournald2+0x19f/0x6a0
       [<ffffffff810ad630>] ? wake_up_bit+0x40/0x40
       [<ffffffff813d3d30>] ? bit_spin_lock+0x80/0x80
       [<ffffffff810ac6be>] kthread+0x10e/0x120
       [<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70
       [<ffffffff818ff6ac>] ret_from_fork+0x7c/0xb0
       [<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70
      
      This patch fix the issue as follows:
      - ext4_journal_commit_callback() make list truly traversial safe
        simply by always starting from list_head
      - fix race between two ext4_journal_callback_del() and
        ext4_journal_callback_try_del()
      Signed-off-by: default avatarDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f5b36426
    • Dmitry Monakhov's avatar
      jbd2: fix race between jbd2_journal_remove_checkpoint and ->j_commit_callback · 213116e5
      Dmitry Monakhov authored
      commit 794446c6 upstream.
      
      The following race is possible:
      
      [kjournald2]                              other_task
      jbd2_journal_commit_transaction()
        j_state = T_FINISHED;
        spin_unlock(&journal->j_list_lock);
                                               ->jbd2_journal_remove_checkpoint()
      					   ->jbd2_journal_free_transaction();
      					     ->kmem_cache_free(transaction)
        ->j_commit_callback(journal, transaction);
          -> USE_AFTER_FREE
      
      WARNING: at lib/list_debug.c:62 __list_del_entry+0x1c0/0x250()
      Hardware name:
      list_del corruption. prev->next should be ffff88019a4ec198, but was 6b6b6b6b6b6b6b6b
      Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
      Pid: 16400, comm: jbd2/dm-1-8 Tainted: G        W    3.8.0-rc3+ #107
      Call Trace:
       [<ffffffff8106fb0d>] warn_slowpath_common+0xad/0xf0
       [<ffffffff8106fc06>] warn_slowpath_fmt+0x46/0x50
       [<ffffffff813637e9>] ? ext4_journal_commit_callback+0x99/0xc0
       [<ffffffff8148cae0>] __list_del_entry+0x1c0/0x250
       [<ffffffff813637bf>] ext4_journal_commit_callback+0x6f/0xc0
       [<ffffffff813ca336>] jbd2_journal_commit_transaction+0x23a6/0x2570
       [<ffffffff8108aa42>] ? try_to_del_timer_sync+0x82/0xa0
       [<ffffffff8108b491>] ? del_timer_sync+0x91/0x1e0
       [<ffffffff813d3ecf>] kjournald2+0x19f/0x6a0
       [<ffffffff810ad630>] ? wake_up_bit+0x40/0x40
       [<ffffffff813d3d30>] ? bit_spin_lock+0x80/0x80
       [<ffffffff810ac6be>] kthread+0x10e/0x120
       [<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70
       [<ffffffff818ff6ac>] ret_from_fork+0x7c/0xb0
       [<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70
      
      In order to demonstrace this issue one should mount ext4 with mount -o
      discard option on SSD disk.  This makes callback longer and race
      window becomes wider.
      
      In order to fix this we should mark transaction as finished only after
      callbacks have completed
      Signed-off-by: default avatarDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      213116e5
    • Jacob Keller's avatar
      ixgbe: fix EICR write in ixgbe_msix_other · e99e7562
      Jacob Keller authored
      commit d87d8307 upstream.
      
      Previously, the ixgbe_msix_other was writing the full 32bits of the set
      interrupts, instead of only the ones which the ixgbe_msix_other is
      handling. This resulted in a loss of performance when the X540's PPS feature is
      enabled due to sometimes clearing queue interrupts which resulted in the driver
      not getting the interrupt for cleaning the q_vector rings often enough. The fix
      is to simply mask the lower 16bits off so that this handler does not write them
      in the EICR, which causes them to remain high and be properly handled by the
      clean_rings interrupt routine as normal.
      Signed-off-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Tested-by: default avatarPhil Schmitt <phillip.j.schmitt@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e99e7562
    • Robin Holt's avatar
      ipc: sysv shared memory limited to 8TiB · b7d885f2
      Robin Holt authored
      commit d69f3bad upstream.
      
      Trying to run an application which was trying to put data into half of
      memory using shmget(), we found that having a shmall value below 8EiB-8TiB
      would prevent us from using anything more than 8TiB.  By setting
      kernel.shmall greater than 8EiB-8TiB would make the job work.
      
      In the newseg() function, ns->shm_tot which, at 8TiB is INT_MAX.
      
      ipc/shm.c:
       458 static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
       459 {
      ...
       465         int numpages = (size + PAGE_SIZE -1) >> PAGE_SHIFT;
      ...
       474         if (ns->shm_tot + numpages > ns->shm_ctlall)
       475                 return -ENOSPC;
      
      [akpm@linux-foundation.org: make ipc/shm.c:newseg()'s numpages size_t, not int]
      Signed-off-by: default avatarRobin Holt <holt@sgi.com>
      Reported-by: default avatarAlex Thorlton <athorlton@sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b7d885f2
    • Johannes Berg's avatar
      wireless: regulatory: fix channel disabling race condition · 131e3afd
      Johannes Berg authored
      commit 990de49f upstream.
      
      When a full scan 2.4 and 5 GHz scan is scheduled, but then the 2.4 GHz
      part of the scan disables a 5.2 GHz channel due to, e.g. receiving
      country or frequency information, that 5.2 GHz channel might already
      be in the list of channels to scan next. Then, when the driver checks
      if it should do a passive scan, that will return false and attempt an
      active scan. This is not only wrong but can also lead to the iwlwifi
      device firmware crashing since it checks regulatory as well.
      
      Fix this by not setting the channel flags to just disabled but rather
      OR'ing in the disabled flag. That way, even if the race happens, the
      channel will be scanned passively which is still (mostly) correct.
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      131e3afd
    • Bryan Schumaker's avatar
      nfsd: Decode and send 64bit time values · 2111a770
      Bryan Schumaker authored
      commit bf8d9097 upstream.
      
      The seconds field of an nfstime4 structure is 64bit, but we are assuming
      that the first 32bits are zero-filled.  So if the client tries to set
      atime to a value before the epoch (touch -t 196001010101), then the
      server will save the wrong value on disk.
      Signed-off-by: default avatarBryan Schumaker <bjschuma@netapp.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2111a770
    • J. Bruce Fields's avatar
      nfsd4: don't close read-write opens too soon · f71ce17f
      J. Bruce Fields authored
      commit 0c7c3e67 upstream.
      
      Don't actually close any opens until we don't need them at all.
      
      This means being left with write access when it's not really necessary,
      but that's better than putting a file that might still have posix locks
      held on it, as we have been.
      Reported-by: default avatarToralf Förster <toralf.foerster@gmx.de>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f71ce17f
    • Trond Myklebust's avatar
      NFSv4: Handle NFS4ERR_DELAY and NFS4ERR_GRACE in nfs4_open_delegation_recall · ed9a34c5
      Trond Myklebust authored
      commit 8b6cc4d6 upstream.
      
      A server shouldn't normally return NFS4ERR_GRACE if the client holds a
      delegation, since no conflicting lock reclaims can be granted, however
      the spec does not require the server to grant the open in this
      instance
      Signed-off-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ed9a34c5
    • NeilBrown's avatar
      md: bad block list should default to disabled. · 6cd670f0
      NeilBrown authored
      commit 486adf72 upstream.
      
      Maintenance of a bad-block-list currently defaults to 'enabled'
      and is then disabled when it cannot be supported.
      This is backwards and causes problem for dm-raid which didn't know
      to disable it.
      
      So fix the defaults, and only enabled for v1.x metadata which
      explicitly has bad blocks enabled.
      
      The problem with dm-raid has been present since badblock support was
      added in v3.1, so this patch is suitable for any -stable from 3.1
      onwards.
      Reported-by: default avatarJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6cd670f0
    • Trond Myklebust's avatar
      LOCKD: Ensure that nlmclnt_block resets block->b_status after a server reboot · 02d1a16d
      Trond Myklebust authored
      commit 1dfd89af upstream.
      
      After a server reboot, the reclaimer thread will recover all the existing
      locks. For locks that are blocked, however, it will change the value
      of block->b_status to nlm_lck_denied_grace_period in order to signal that
      they need to wake up and resend the original blocking lock request.
      
      Due to a bug, however, the block->b_status never gets reset after the
      blocked locks have been woken up, and so the process goes into an
      infinite loop of resends until the blocked lock is satisfied.
      Reported-by: default avatarMarc Eshel <eshel@us.ibm.com>
      Signed-off-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      02d1a16d
    • Greg Thelen's avatar
      fs/dcache.c: add cond_resched() to shrink_dcache_parent() · b51c8db5
      Greg Thelen authored
      commit 421348f1 upstream.
      
      Call cond_resched() in shrink_dcache_parent() to maintain interactivity.
      
      Before this patch:
      
      	void shrink_dcache_parent(struct dentry * parent)
      	{
      		while ((found = select_parent(parent, &dispose)) != 0)
      			shrink_dentry_list(&dispose);
      	}
      
      select_parent() populates the dispose list with dentries which
      shrink_dentry_list() then deletes.  select_parent() carefully uses
      need_resched() to avoid doing too much work at once.  But neither
      shrink_dcache_parent() nor its called functions call cond_resched().  So
      once need_resched() is set select_parent() will return single dentry
      dispose list which is then deleted by shrink_dentry_list().  This is
      inefficient when there are a lot of dentry to process.  This can cause
      softlockup and hurts interactivity on non preemptable kernels.
      
      This change adds cond_resched() in shrink_dcache_parent().  The benefit
      of this is that need_resched() is quickly cleared so that future calls
      to select_parent() are able to efficiently return a big batch of dentry.
      
      These additional cond_resched() do not seem to impact performance, at
      least for the workload below.
      
      Here is a program which can cause soft lockup if other system activity
      sets need_resched().
      
      	int main()
      	{
      	        struct rlimit rlim;
      	        int i;
      	        int f[100000];
      	        char buf[20];
      	        struct timeval t1, t2;
      	        double diff;
      
      	        /* cleanup past run */
      	        system("rm -rf x");
      
      	        /* boost nfile rlimit */
      	        rlim.rlim_cur = 200000;
      	        rlim.rlim_max = 200000;
      	        if (setrlimit(RLIMIT_NOFILE, &rlim))
      	                err(1, "setrlimit");
      
      	        /* make directory for files */
      	        if (mkdir("x", 0700))
      	                err(1, "mkdir");
      
      	        if (gettimeofday(&t1, NULL))
      	                err(1, "gettimeofday");
      
      	        /* populate directory with open files */
      	        for (i = 0; i < 100000; i++) {
      	                snprintf(buf, sizeof(buf), "x/%d", i);
      	                f[i] = open(buf, O_CREAT);
      	                if (f[i] == -1)
      	                        err(1, "open");
      	        }
      
      	        /* close some of the files */
      	        for (i = 0; i < 85000; i++)
      	                close(f[i]);
      
      	        /* unlink all files, even open ones */
      	        system("rm -rf x");
      
      	        if (gettimeofday(&t2, NULL))
      	                err(1, "gettimeofday");
      
      	        diff = (((double)t2.tv_sec * 1000000 + t2.tv_usec) -
      	                ((double)t1.tv_sec * 1000000 + t1.tv_usec));
      
      	        printf("done: %g elapsed\n", diff/1e6);
      	        return 0;
      	}
      Signed-off-by: default avatarGreg Thelen <gthelen@google.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b51c8db5
    • Thomas Gleixner's avatar
      clockevents: Set dummy handler on CPU_DEAD shutdown · 357093a8
      Thomas Gleixner authored
      commit 6f7a05d7 upstream.
      
      Vitaliy reported that a per cpu HPET timer interrupt crashes the
      system during hibernation. What happens is that the per cpu HPET timer
      gets shut down when the nonboot cpus are stopped. When the nonboot
      cpus are onlined again the HPET code sets up the MSI interrupt which
      fires before the clock event device is registered. The event handler
      is still set to hrtimer_interrupt, which then crashes the machine due
      to highres mode not being active.
      
      See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=700333
      
      There is no real good way to avoid that in the HPET code. The HPET
      code alrady has a mechanism to detect spurious interrupts when event
      handler == NULL for a similar reason.
      
      We can handle that in the clockevent/tick layer and replace the
      previous functional handler with a dummy handler like we do in
      tick_setup_new_device().
      
      The original clockevents code did this in clockevents_exchange_device(),
      but that got removed by commit 7c1e7689 (clockevents: prevent
      clockevent event_handler ending up handler_noop) which forgot to fix
      it up in tick_shutdown(). Same issue with the broadcast device.
      Reported-by: default avatarVitaliy Fillipov <vitalif@yourcmc.ru>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: 700333@bugs.debian.org
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      357093a8
    • Li Zefan's avatar
      cgroup: fix an off-by-one bug which may trigger BUG_ON() · 97630ecd
      Li Zefan authored
      commit 3ac1707a upstream.
      
      The 3rd parameter of flex_array_prealloc() is the number of elements,
      not the index of the last element.
      
      The effect of the bug is, when opening cgroup.procs, a flex array will
      be allocated and all elements of the array is allocated with
      GFP_KERNEL flag, but the last one is GFP_ATOMIC, and if we fail to
      allocate memory for it, it'll trigger a BUG_ON().
      Signed-off-by: default avatarLi Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      97630ecd
    • Derek Basehore's avatar
      drivers/rtc/rtc-cmos.c: don't disable hpet emulation on suspend · b593b4dc
      Derek Basehore authored
      commit e005715e upstream.
      
      There's a bug where rtc alarms are ignored after the rtc cmos suspends
      but before the system finishes suspend.  Since hpet emulation is
      disabled and it still handles the interrupts, a wake event is never
      registered which is done from the rtc layer.
      
      This patch reverts commit d1b2efa8 ("rtc: disable hpet emulation on
      suspend") which disabled hpet emulation.  To fix the problem mentioned
      in that commit, hpet_rtc_timer_init() is called directly on resume.
      Signed-off-by: default avatarDerek Basehore <dbasehore@chromium.org>
      Cc: Maxim Levitsky <maximlevitsky@gmail.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b593b4dc
    • Prarit Bhargava's avatar
      hrtimer: Add expiry time overflow check in hrtimer_interrupt · 194d30b3
      Prarit Bhargava authored
      commit 8f294b5a upstream.
      
      The settimeofday01 test in the LTP testsuite effectively does
      
              gettimeofday(current time);
              settimeofday(Jan 1, 1970 + 100 seconds);
              settimeofday(current time);
      
      This test causes a stack trace to be displayed on the console during the
      setting of timeofday to Jan 1, 1970 + 100 seconds:
      
      [  131.066751] ------------[ cut here ]------------
      [  131.096448] WARNING: at kernel/time/clockevents.c:209 clockevents_program_event+0x135/0x140()
      [  131.104935] Hardware name: Dinar
      [  131.108150] Modules linked in: sg nfsv3 nfs_acl nfsv4 auth_rpcgss nfs dns_resolver fscache lockd sunrpc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables kvm_amd kvm sp5100_tco bnx2 i2c_piix4 crc32c_intel k10temp fam15h_power ghash_clmulni_intel amd64_edac_mod pcspkr serio_raw edac_mce_amd edac_core microcode xfs libcrc32c sr_mod sd_mod cdrom ata_generic crc_t10dif pata_acpi radeon i2c_algo_bit drm_kms_helper ttm drm ahci pata_atiixp libahci libata usb_storage i2c_core dm_mirror dm_region_hash dm_log dm_mod
      [  131.176784] Pid: 0, comm: swapper/28 Not tainted 3.8.0+ #6
      [  131.182248] Call Trace:
      [  131.184684]  <IRQ>  [<ffffffff810612af>] warn_slowpath_common+0x7f/0xc0
      [  131.191312]  [<ffffffff8106130a>] warn_slowpath_null+0x1a/0x20
      [  131.197131]  [<ffffffff810b9fd5>] clockevents_program_event+0x135/0x140
      [  131.203721]  [<ffffffff810bb584>] tick_program_event+0x24/0x30
      [  131.209534]  [<ffffffff81089ab1>] hrtimer_interrupt+0x131/0x230
      [  131.215437]  [<ffffffff814b9600>] ? cpufreq_p4_target+0x130/0x130
      [  131.221509]  [<ffffffff81619119>] smp_apic_timer_interrupt+0x69/0x99
      [  131.227839]  [<ffffffff8161805d>] apic_timer_interrupt+0x6d/0x80
      [  131.233816]  <EOI>  [<ffffffff81099745>] ? sched_clock_cpu+0xc5/0x120
      [  131.240267]  [<ffffffff814b9ff0>] ? cpuidle_wrap_enter+0x50/0xa0
      [  131.246252]  [<ffffffff814b9fe9>] ? cpuidle_wrap_enter+0x49/0xa0
      [  131.252238]  [<ffffffff814ba050>] cpuidle_enter_tk+0x10/0x20
      [  131.257877]  [<ffffffff814b9c89>] cpuidle_idle_call+0xa9/0x260
      [  131.263692]  [<ffffffff8101c42f>] cpu_idle+0xaf/0x120
      [  131.268727]  [<ffffffff815f8971>] start_secondary+0x255/0x257
      [  131.274449] ---[ end trace 1151a50552231615 ]---
      
      When we change the system time to a low value like this, the value of
      timekeeper->offs_real will be a negative value.
      
      It seems that the WARN occurs because an hrtimer has been started in the time
      between the releasing of the timekeeper lock and the IPI call (via a call to
      on_each_cpu) in clock_was_set() in the do_settimeofday() code.  The end result
      is that a REALTIME_CLOCK timer has been added with softexpires = expires =
      KTIME_MAX.  The hrtimer_interrupt() fires/is called and the loop at
      kernel/hrtimer.c:1289 is executed.  In this loop the code subtracts the
      clock base's offset (which was set to timekeeper->offs_real in
      do_settimeofday()) from the current hrtimer_cpu_base->expiry value (which
      was KTIME_MAX):
      
      	KTIME_MAX - (a negative value) = overflow
      
      A simple check for an overflow can resolve this problem.  Using KTIME_MAX
      instead of the overflow value will result in the hrtimer function being run,
      and the reprogramming of the timer after that.
      Reviewed-by: default avatarRik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarPrarit Bhargava <prarit@redhat.com>
      [jstultz: Tweaked commit subject]
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      194d30b3
    • David Engraf's avatar
      hrtimer: Fix ktime_add_ns() overflow on 32bit architectures · 86a640fa
      David Engraf authored
      commit 51fd36f3 upstream.
      
      One can trigger an overflow when using ktime_add_ns() on a 32bit
      architecture not supporting CONFIG_KTIME_SCALAR.
      
      When passing a very high value for u64 nsec, e.g. 7881299347898368000
      the do_div() function converts this value to seconds (7881299347) which
      is still to high to pass to the ktime_set() function as long. The result
      in is a negative value.
      
      The problem on my system occurs in the tick-sched.c,
      tick_nohz_stop_sched_tick() when time_delta is set to
      timekeeping_max_deferment(). The check for time_delta < KTIME_MAX is
      valid, thus ktime_add_ns() is called with a too large value resulting in
      a negative expire value. This leads to an endless loop in the ticker code:
      
      time_delta: 7881299347898368000
      expires = ktime_add_ns(last_update, time_delta)
      expires: negative value
      
      This fix caps the value to KTIME_MAX.
      
      This error doesn't occurs on 64bit or architectures supporting
      CONFIG_KTIME_SCALAR (e.g. ARM, x86-32).
      Signed-off-by: default avatarDavid Engraf <david.engraf@sysgo.com>
      [jstultz: Minor tweaks to commit message & header]
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      86a640fa
    • Dylan Reid's avatar
      ASoC: max98088: Fix logging of hardware revision. · a91a9f1e
      Dylan Reid authored
      commit 98682063 upstream.
      
      The hardware revision of the codec is based at 0x40.  Subtract that
      before convering to ASCII.  The same as it is done for 98095.
      Signed-off-by: default avatarDylan Reid <dgreid@chromium.org>
      Signed-off-by: default avatarMark Brown <broonie@opensource.wolfsonmicro.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a91a9f1e
    • Takashi Iwai's avatar
      ALSA: usb-audio: Fix autopm error during probing · 8a70ddb0
      Takashi Iwai authored
      commit 60af3d03 upstream.
      
      We've got strange errors in get_ctl_value() in mixer.c during
      probing, e.g. on Hercules RMX2 DJ Controller:
      
        ALSA mixer.c:352 cannot get ctl value: req = 0x83, wValue = 0x201, wIndex = 0xa00, type = 4
        ALSA mixer.c:352 cannot get ctl value: req = 0x83, wValue = 0x200, wIndex = 0xa00, type = 4
        ....
      
      It turned out that the culprit is autopm: snd_usb_autoresume() returns
      -ENODEV when called during card->probing = 1.
      
      Since the call itself during card->probing = 1 is valid, let's fix the
      return value of snd_usb_autoresume() as success.
      Reported-and-tested-by: default avatarDaniel Schürmann <daschuer@mixxx.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8a70ddb0
    • Clemens Ladisch's avatar
      ALSA: usb-audio: disable autopm for MIDI devices · b01ae289
      Clemens Ladisch authored
      commit cbc200bc upstream.
      
      Commit 88a8516a (ALSA: usbaudio: implement USB autosuspend)
      introduced autopm for all USB audio/MIDI devices.  However, many MIDI
      devices, such as synthesizers, do not merely transmit MIDI messages but
      use their MIDI inputs to control other functions.  With autopm, these
      devices would get powered down as soon as the last MIDI port device is
      closed on the host.
      
      Even some plain MIDI interfaces could get broken: they automatically
      send Active Sensing messages while powered up, but as soon as these
      messages cease, the receiving device would interpret this as an
      accidental disconnection.
      
      Commit f5f16541 (ALSA: usb-audio: Fix missing autopm for MIDI input)
      introduced another regression: some devices (e.g. the Roland GAIA SH-01)
      are self-powered but do a reset whenever the USB interface's power state
      changes.
      
      To work around all this, just disable autopm for all USB MIDI devices.
      
      Reported-by: Laurens Holst
      Signed-off-by: default avatarClemens Ladisch <clemens@ladisch.de>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b01ae289
    • Daniel Mack's avatar
      ALSA: snd-usb: try harder to find USB_DT_CS_ENDPOINT · 919fa1da
      Daniel Mack authored
      commit ebfc594c upstream.
      
      The USB_DT_CS_ENDPOINT class-specific endpoint descriptor is usually
      stuffed directly after the standard USB endpoint descriptor, and this is
      where the driver currently expects it to be.
      
      There are, however, devices in the wild that have it the other way
      around in their descriptor sets, so the USB_DT_CS_ENDPOINT comes
      *before* the standard enpoint. Devices known to implement it that way
      are "Sennheiser BTD-500" and Plantronics USB headsets.
      
      When the driver can't find the USB_DT_CS_ENDPOINT, it won't be able to
      change sample rates, as the bitmask for the validity of this command is
      storen in bmAttributes of that descriptor.
      
      Fix this by searching the entire interface instead of just the extra
      bytes of the first endpoint, in case the latter fails.
      Signed-off-by: default avatarDaniel Mack <zonque@gmail.com>
      Reported-and-tested-by: default avatarTorstein Hegge <hegge@resisty.net>
      Reported-and-tested-by: default avatarYves G <alsa-user@vivigatt.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      919fa1da
    • Hugh Dickins's avatar
      mm: allow arch code to control the user page table ceiling · 273a82be
      Hugh Dickins authored
      commit 6ee8630e upstream.
      
      On architectures where a pgd entry may be shared between user and kernel
      (e.g.  ARM+LPAE), freeing page tables needs a ceiling other than 0.
      This patch introduces a generic USER_PGTABLES_CEILING that arch code can
      override.  It is the responsibility of the arch code setting the ceiling
      to ensure the complete freeing of the page tables (usually in
      pgd_free()).
      
      [catalin.marinas@arm.com: commit log; shift_arg_pages(), asm-generic/pgtables.h changes]
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      273a82be
    • Anurup m's avatar
      fs/fscache/stats.c: fix memory leak · ede49f36
      Anurup m authored
      commit ec686c92 upstream.
      
      There is a kernel memory leak observed when the proc file
      /proc/fs/fscache/stats is read.
      
      The reason is that in fscache_stats_open, single_open is called and the
      respective release function is not called during release.  Hence fix
      with correct release function - single_release().
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=57101Signed-off-by: default avatarAnurup m <anurup.m@huawei.com>
      Cc: shyju pv <shyju.pv@huawei.com>
      Cc: Sanil kumar <sanil.kumar@huawei.com>
      Cc: Nataraj m <nataraj.m@huawei.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ede49f36
    • Stephan Schreiber's avatar
      Wrong asm register contraints in the kvm implementation · f9a0a8cd
      Stephan Schreiber authored
      commit de53e9ca upstream.
      
      The Linux Kernel contains some inline assembly source code which has
      wrong asm register constraints in arch/ia64/kvm/vtlb.c.
      
      I observed this on Kernel 3.2.35 but it is also true on the most
      recent Kernel 3.9-rc1.
      
      File arch/ia64/kvm/vtlb.c:
      
      u64 guest_vhpt_lookup(u64 iha, u64 *pte)
      {
      	u64 ret;
      	struct thash_data *data;
      
      	data = __vtr_lookup(current_vcpu, iha, D_TLB);
      	if (data != NULL)
      		thash_vhpt_insert(current_vcpu, data->page_flags,
      			data->itir, iha, D_TLB);
      
      	asm volatile (
      			"rsm psr.ic|psr.i;;"
      			"srlz.d;;"
      			"ld8.s r9=[%1];;"
      			"tnat.nz p6,p7=r9;;"
      			"(p6) mov %0=1;"
      			"(p6) mov r9=r0;"
      			"(p7) extr.u r9=r9,0,53;;"
      			"(p7) mov %0=r0;"
      			"(p7) st8 [%2]=r9;;"
      			"ssm psr.ic;;"
      			"srlz.d;;"
      			"ssm psr.i;;"
      			"srlz.d;;"
      			: "=r"(ret) : "r"(iha), "r"(pte):"memory");
      
      	return ret;
      }
      
      The list of output registers is
      			: "=r"(ret) : "r"(iha), "r"(pte):"memory");
      The constraint "=r" means that the GCC has to maintain that these vars
      are in registers and contain valid info when the program flow leaves
      the assembly block (output registers).
      But "=r" also means that GCC can put them in registers that are used
      as input registers. Input registers are iha, pte on the example.
      If the predicate p7 is true, the 8th assembly instruction
      			"(p7) mov %0=r0;"
      is the first one which writes to a register which is maintained by the
      register constraints; it sets %0. %0 means the first register operand;
      it is ret here.
      This instruction might overwrite the %2 register (pte) which is needed
      by the next instruction:
      			"(p7) st8 [%2]=r9;;"
      Whether it really happens depends on how GCC decides what registers it
      uses and how it optimizes the code.
      
      The attached patch  fixes the register operand constraints in
      arch/ia64/kvm/vtlb.c.
      The register constraints should be
      			: "=&r"(ret) : "r"(iha), "r"(pte):"memory");
      The & means that GCC must not use any of the input registers to place
      this output register in.
      
      This is Debian bug#702639
      (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=702639).
      
      The patch is applicable on Kernel 3.9-rc1, 3.2.35 and many other versions.
      Signed-off-by: default avatarStephan Schreiber <info@fs-driver.org>
      Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f9a0a8cd
    • Stephan Schreiber's avatar
      Wrong asm register contraints in the futex implementation · 4f67f6d6
      Stephan Schreiber authored
      commit 136f39dd upstream.
      
      The Linux Kernel contains some inline assembly source code which has
      wrong asm register constraints in arch/ia64/include/asm/futex.h.
      
      I observed this on Kernel 3.2.23 but it is also true on the most
      recent Kernel 3.9-rc1.
      
      File arch/ia64/include/asm/futex.h:
      
      static inline int
      futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
      			      u32 oldval, u32 newval)
      {
      	if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
      		return -EFAULT;
      
      	{
      		register unsigned long r8 __asm ("r8");
      		unsigned long prev;
      		__asm__ __volatile__(
      			"	mf;;					\n"
      			"	mov %0=r0				\n"
      			"	mov ar.ccv=%4;;				\n"
      			"[1:]	cmpxchg4.acq %1=[%2],%3,ar.ccv		\n"
      			"	.xdata4 \"__ex_table\", 1b-., 2f-.	\n"
      			"[2:]"
      			: "=r" (r8), "=r" (prev)
      			: "r" (uaddr), "r" (newval),
      			  "rO" ((long) (unsigned) oldval)
      			: "memory");
      		*uval = prev;
      		return r8;
      	}
      }
      
      The list of output registers is
      			: "=r" (r8), "=r" (prev)
      The constraint "=r" means that the GCC has to maintain that these vars
      are in registers and contain valid info when the program flow leaves
      the assembly block (output registers).
      But "=r" also means that GCC can put them in registers that are used
      as input registers. Input registers are uaddr, newval, oldval on the
      example.
      The second assembly instruction
      			"	mov %0=r0				\n"
      is the first one which writes to a register; it sets %0 to 0. %0 means
      the first register operand; it is r8 here. (The r0 is read-only and
      always 0 on the Itanium; it can be used if an immediate zero value is
      needed.)
      This instruction might overwrite one of the other registers which are
      still needed.
      Whether it really happens depends on how GCC decides what registers it
      uses and how it optimizes the code.
      
      The objdump utility can give us disassembly.
      The futex_atomic_cmpxchg_inatomic() function is inline, so we have to
      look for a module that uses the funtion. This is the
      cmpxchg_futex_value_locked() function in
      kernel/futex.c:
      
      static int cmpxchg_futex_value_locked(u32 *curval, u32 __user *uaddr,
      				      u32 uval, u32 newval)
      {
      	int ret;
      
      	pagefault_disable();
      	ret = futex_atomic_cmpxchg_inatomic(curval, uaddr, uval, newval);
      	pagefault_enable();
      
      	return ret;
      }
      
      Now the disassembly. At first from the Kernel package 3.2.23 which has
      been compiled with GCC 4.4, remeber this Kernel seemed to work:
      objdump -d linux-3.2.23/debian/build/build_ia64_none_mckinley/kernel/futex.o
      
      0000000000000230 <cmpxchg_futex_value_locked>:
            230:	0b 18 80 1b 18 21 	[MMI]       adds r3=3168,r13;;
            236:	80 40 0d 00 42 00 	            adds r8=40,r3
            23c:	00 00 04 00       	            nop.i 0x0;;
            240:	0b 50 00 10 10 10 	[MMI]       ld4 r10=[r8];;
            246:	90 08 28 00 42 00 	            adds r9=1,r10
            24c:	00 00 04 00       	            nop.i 0x0;;
            250:	09 00 00 00 01 00 	[MMI]       nop.m 0x0
            256:	00 48 20 20 23 00 	            st4 [r8]=r9
            25c:	00 00 04 00       	            nop.i 0x0;;
            260:	08 10 80 06 00 21 	[MMI]       adds r2=32,r3
            266:	00 00 00 02 00 00 	            nop.m 0x0
            26c:	02 08 f1 52       	            extr.u r16=r33,0,61
            270:	05 40 88 00 08 e0 	[MLX]       addp4 r8=r34,r0
            276:	ff ff 0f 00 00 e0 	            movl r15=0xfffffffbfff;;
            27c:	f1 f7 ff 65
            280:	09 70 00 04 18 10 	[MMI]       ld8 r14=[r2]
            286:	00 00 00 02 00 c0 	            nop.m 0x0
            28c:	f0 80 1c d0       	            cmp.ltu p6,p7=r15,r16;;
            290:	08 40 fc 1d 09 3b 	[MMI]       cmp.eq p8,p9=-1,r14
            296:	00 00 00 02 00 40 	            nop.m 0x0
            29c:	e1 08 2d d0       	            cmp.ltu p10,p11=r14,r33
            2a0:	56 01 10 00 40 10 	[BBB] (p10) br.cond.spnt.few 2e0
      <cmpxchg_futex_value_locked+0xb0>
            2a6:	02 08 00 80 21 03 	      (p08) br.cond.dpnt.few 2b0
      <cmpxchg_futex_value_locked+0x80>
            2ac:	40 00 00 41       	      (p06) br.cond.spnt.few 2e0
      <cmpxchg_futex_value_locked+0xb0>
            2b0:	0a 00 00 00 22 00 	[MMI]       mf;;
            2b6:	80 00 00 00 42 00 	            mov r8=r0
            2bc:	00 00 04 00       	            nop.i 0x0
            2c0:	0b 00 20 40 2a 04 	[MMI]       mov.m ar.ccv=r8;;
            2c6:	10 1a 85 22 20 00 	            cmpxchg4.acq r33=[r33],r35,ar.ccv
            2cc:	00 00 04 00       	            nop.i 0x0;;
            2d0:	10 00 84 40 90 11 	[MIB]       st4 [r32]=r33
            2d6:	00 00 00 02 00 00 	            nop.i 0x0
            2dc:	20 00 00 40       	            br.few 2f0
      <cmpxchg_futex_value_locked+0xc0>
            2e0:	09 40 c8 f9 ff 27 	[MMI]       mov r8=-14
            2e6:	00 00 00 02 00 00 	            nop.m 0x0
            2ec:	00 00 04 00       	            nop.i 0x0;;
            2f0:	0b 58 20 1a 19 21 	[MMI]       adds r11=3208,r13;;
            2f6:	20 01 2c 20 20 00 	            ld4 r18=[r11]
            2fc:	00 00 04 00       	            nop.i 0x0;;
            300:	0b 88 fc 25 3f 23 	[MMI]       adds r17=-1,r18;;
            306:	00 88 2c 20 23 00 	            st4 [r11]=r17
            30c:	00 00 04 00       	            nop.i 0x0;;
            310:	11 00 00 00 01 00 	[MIB]       nop.m 0x0
            316:	00 00 00 02 00 80 	            nop.i 0x0
            31c:	08 00 84 00       	            br.ret.sptk.many b0;;
      
      The lines
            2b0:	0a 00 00 00 22 00 	[MMI]       mf;;
            2b6:	80 00 00 00 42 00 	            mov r8=r0
            2bc:	00 00 04 00       	            nop.i 0x0
            2c0:	0b 00 20 40 2a 04 	[MMI]       mov.m ar.ccv=r8;;
            2c6:	10 1a 85 22 20 00 	            cmpxchg4.acq r33=[r33],r35,ar.ccv
            2cc:	00 00 04 00       	            nop.i 0x0;;
      are the instructions of the assembly block.
      The line
            2b6:	80 00 00 00 42 00 	            mov r8=r0
      sets the r8 register to 0 and after that
            2c0:	0b 00 20 40 2a 04 	[MMI]       mov.m ar.ccv=r8;;
      prepares the 'oldvalue' for the cmpxchg but it takes it from r8. This
      is wrong.
      What happened here is what I explained above: An input register is
      overwritten which is still needed.
      The register operand constraints in futex.h are wrong.
      
      (The problem doesn't occur when the Kernel is compiled with GCC 4.6.)
      
      The attached patch fixes the register operand constraints in futex.h.
      The code after patching of it:
      
      static inline int
      futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
      			      u32 oldval, u32 newval)
      {
      	if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
      		return -EFAULT;
      
      	{
      		register unsigned long r8 __asm ("r8") = 0;
      		unsigned long prev;
      		__asm__ __volatile__(
      			"	mf;;					\n"
      			"	mov ar.ccv=%4;;				\n"
      			"[1:]	cmpxchg4.acq %1=[%2],%3,ar.ccv		\n"
      			"	.xdata4 \"__ex_table\", 1b-., 2f-.	\n"
      			"[2:]"
      			: "+r" (r8), "=&r" (prev)
      			: "r" (uaddr), "r" (newval),
      			  "rO" ((long) (unsigned) oldval)
      			: "memory");
      		*uval = prev;
      		return r8;
      	}
      }
      
      I also initialized the 'r8' var with the C programming language.
      The _asm qualifier on the definition of the 'r8' var forces GCC to use
      the r8 processor register for it.
      I don't believe that we should use inline assembly for zeroing out a
      local variable.
      The constraint is
      "+r" (r8)
      what means that it is both an input register and an output register.
      Note that the page fault handler will modify the r8 register which
      will be the return value of the function.
      The real fix is
      "=&r" (prev)
      The & means that GCC must not use any of the input registers to place
      this output register in.
      
      Patched the Kernel 3.2.23 and compiled it with GCC4.4:
      
      0000000000000230 <cmpxchg_futex_value_locked>:
            230:	0b 18 80 1b 18 21 	[MMI]       adds r3=3168,r13;;
            236:	80 40 0d 00 42 00 	            adds r8=40,r3
            23c:	00 00 04 00       	            nop.i 0x0;;
            240:	0b 50 00 10 10 10 	[MMI]       ld4 r10=[r8];;
            246:	90 08 28 00 42 00 	            adds r9=1,r10
            24c:	00 00 04 00       	            nop.i 0x0;;
            250:	09 00 00 00 01 00 	[MMI]       nop.m 0x0
            256:	00 48 20 20 23 00 	            st4 [r8]=r9
            25c:	00 00 04 00       	            nop.i 0x0;;
            260:	08 10 80 06 00 21 	[MMI]       adds r2=32,r3
            266:	20 12 01 10 40 00 	            addp4 r34=r34,r0
            26c:	02 08 f1 52       	            extr.u r16=r33,0,61
            270:	05 40 00 00 00 e1 	[MLX]       mov r8=r0
            276:	ff ff 0f 00 00 e0 	            movl r15=0xfffffffbfff;;
            27c:	f1 f7 ff 65
            280:	09 70 00 04 18 10 	[MMI]       ld8 r14=[r2]
            286:	00 00 00 02 00 c0 	            nop.m 0x0
            28c:	f0 80 1c d0       	            cmp.ltu p6,p7=r15,r16;;
            290:	08 40 fc 1d 09 3b 	[MMI]       cmp.eq p8,p9=-1,r14
            296:	00 00 00 02 00 40 	            nop.m 0x0
            29c:	e1 08 2d d0       	            cmp.ltu p10,p11=r14,r33
            2a0:	56 01 10 00 40 10 	[BBB] (p10) br.cond.spnt.few 2e0
      <cmpxchg_futex_value_locked+0xb0>
            2a6:	02 08 00 80 21 03 	      (p08) br.cond.dpnt.few 2b0
      <cmpxchg_futex_value_locked+0x80>
            2ac:	40 00 00 41       	      (p06) br.cond.spnt.few 2e0
      <cmpxchg_futex_value_locked+0xb0>
            2b0:	0b 00 00 00 22 00 	[MMI]       mf;;
            2b6:	00 10 81 54 08 00 	            mov.m ar.ccv=r34
            2bc:	00 00 04 00       	            nop.i 0x0;;
            2c0:	09 58 8c 42 11 10 	[MMI]       cmpxchg4.acq r11=[r33],r35,ar.ccv
            2c6:	00 00 00 02 00 00 	            nop.m 0x0
            2cc:	00 00 04 00       	            nop.i 0x0;;
            2d0:	10 00 2c 40 90 11 	[MIB]       st4 [r32]=r11
            2d6:	00 00 00 02 00 00 	            nop.i 0x0
            2dc:	20 00 00 40       	            br.few 2f0
      <cmpxchg_futex_value_locked+0xc0>
            2e0:	09 40 c8 f9 ff 27 	[MMI]       mov r8=-14
            2e6:	00 00 00 02 00 00 	            nop.m 0x0
            2ec:	00 00 04 00       	            nop.i 0x0;;
            2f0:	0b 88 20 1a 19 21 	[MMI]       adds r17=3208,r13;;
            2f6:	30 01 44 20 20 00 	            ld4 r19=[r17]
            2fc:	00 00 04 00       	            nop.i 0x0;;
            300:	0b 90 fc 27 3f 23 	[MMI]       adds r18=-1,r19;;
            306:	00 90 44 20 23 00 	            st4 [r17]=r18
            30c:	00 00 04 00       	            nop.i 0x0;;
            310:	11 00 00 00 01 00 	[MIB]       nop.m 0x0
            316:	00 00 00 02 00 80 	            nop.i 0x0
            31c:	08 00 84 00       	            br.ret.sptk.many b0;;
      
      Much better.
      There is a
            270:	05 40 00 00 00 e1 	[MLX]       mov r8=r0
      which was generated by C code r8 = 0. Below
            2b6:	00 10 81 54 08 00 	            mov.m ar.ccv=r34
      what means that oldval is no longer overwritten.
      
      This is Debian bug#702641
      (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=702641).
      
      The patch is applicable on Kernel 3.9-rc1, 3.2.23 and many other versions.
      Signed-off-by: default avatarStephan Schreiber <info@fs-driver.org>
      Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4f67f6d6
    • Rafael J. Wysocki's avatar
      PCI/PM: Fix fallback to PCI_D0 in pci_platform_power_transition() · 076a5dc2
      Rafael J. Wysocki authored
      commit 769ba721 upstream.
      
      Commit b51306c6 (PCI: Set device power state to PCI_D0 for device
      without native PM support) modified pci_platform_power_transition()
      by adding code causing dev->current_state for devices that don't
      support native PCI PM but are power-manageable by the platform to be
      changed to PCI_D0 regardless of the value returned by the preceding
      platform_pci_set_power_state().  In particular, that also is done
      if the platform_pci_set_power_state() has been successful, which
      causes the correct power state of the device set by
      pci_update_current_state() in that case to be overwritten by PCI_D0.
      
      Fix that mistake by making the fallback to PCI_D0 only happen if
      the platform_pci_set_power_state() has returned an error.
      
      [bhelgaas: folded in Yinghai's simplification, added URL & stable info]
      Reference: http://lkml.kernel.org/r/27806FC4E5928A408B78E88BBC67A2306F466BBA@ORSMSX101.amr.corp.intel.comReported-by: default avatarChris J. Benenati <chris.j.benenati@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Acked-by: default avatarYinghai Lu <yinghai@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      076a5dc2
    • Yinghai Lu's avatar
      PCI / ACPI: Don't query OSC support with all possible controls · 7e99b12d
      Yinghai Lu authored
      commit 545d6e18 upstream.
      
      Found problem on system that firmware that could handle pci aer.
      Firmware get error reporting after pci injecting error, before os boots.
      But after os boots, firmware can not get report anymore, even pci=noaer
      is passed.
      
      Root cause: BIOS _OSC has problem with query bit checking.
      It turns out that BIOS vendor is copying example code from ACPI Spec.
      In ACPI Spec 5.0, page 290:
      
      	If (Not(And(CDW1,1))) // Query flag clear?
      	{	// Disable GPEs for features granted native control.
      		If (And(CTRL,0x01)) // Hot plug control granted?
      		{
      			Store(0,HPCE) // clear the hot plug SCI enable bit
      			Store(1,HPCS) // clear the hot plug SCI status bit
      		}
      	...
      	}
      
      When Query flag is set, And(CDW1,1) will be 1, Not(1) will return 0xfffffffe.
      So it will get into code path that should be for control set only.
      BIOS acpi code should be changed to "If (LEqual(And(CDW1,1), 0)))"
      
      Current kernel code is using _OSC query to notify firmware about support
      from OS and then use _OSC to set control bits.
      During query support, current code is using all possible controls.
      So will execute code that should be only for control set stage.
      
      That will have problem when pci=noaer or aer firmware_first is used.
      As firmware have that control set for os aer already in query support stage,
      but later will not os aer handling.
      
      We should avoid passing all possible controls, just use osc_control_set
      instead.
      That should workaround BIOS bugs with affected systems on the field
      as more bios vendors are copying sample code from ACPI spec.
      Signed-off-by: default avatarYinghai Lu <yinghai@kernel.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7e99b12d
    • Tony Luck's avatar
      Fix initialization of CMCI/CMCP interrupts · 5bf457bc
      Tony Luck authored
      commit d303e9e9 upstream.
      
      Back 2010 during a revamp of the irq code some initializations
      were moved from ia64_mca_init() to ia64_mca_late_init() in
      
      	commit c75f2aa1
      	Cannot use register_percpu_irq() from ia64_mca_init()
      
      But this was hideously wrong. First of all these initializations
      are now down far too late. Specifically after all the other cpus
      have been brought up and initialized their own CMC vectors from
      smp_callin(). Also ia64_mca_late_init() may be called from any cpu
      so the line:
      	ia64_mca_cmc_vector_setup();       /* Setup vector on BSP */
      is generally not executed on the BSP, and so the CMC vector isn't
      setup at all on that processor.
      
      Make use of the arch_early_irq_init() hook to get this code executed
      at just the right moment: not too early, not too late.
      Reported-by: default avatarFred Hartnett <fred.hartnett@hp.com>
      Tested-by: default avatarFred Hartnett <fred.hartnett@hp.com>
      Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5bf457bc
    • Ming Lei's avatar
      sysfs: fix use after free in case of concurrent read/write and readdir · 6615e6db
      Ming Lei authored
      commit f7db5e76 upstream.
      
      The inode->i_mutex isn't hold when updating filp->f_pos
      in read()/write(), so the filp->f_pos might be read as
      0 or 1 in readdir() when there is concurrent read()/write()
      on this same file, then may cause use after free in readdir().
      
      The bug can be reproduced with Li Zefan's test code on the
      link:
      
      	https://patchwork.kernel.org/patch/2160771/
      
      This patch fixes the use after free under this situation.
      Reported-by: default avatarLi Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarMing Lei <ming.lei@canonical.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6615e6db
    • Steven A. Falco's avatar
      i2c: xiic: must always write 16-bit words to TX_FIFO · 0e079960
      Steven A. Falco authored
      commit c39e8e43 upstream.
      
      The TX_FIFO register is 10 bits wide.  The lower 8 bits are the data to be
      written, while the upper two bits are flags to indicate stop/start.
      
      The driver apparently attempted to optimize write access, by only writing a
      byte in those cases where the stop/start bits are zero.  However, we have
      seen cases where the lower byte is duplicated onto the upper byte by the
      hardware, which causes inadvertent stop/starts.
      
      This patch changes the write access to the transmit FIFO to always be 16 bits
      wide.
      
      Signed off by: Steven A. Falco <sfalco@harris.com>
      Signed-off-by: default avatarWolfram Sang <wsa@the-dreams.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0e079960
    • Namhyung Kim's avatar
      tracing: Reset ftrace_graph_filter_enabled if count is zero · 761694a0
      Namhyung Kim authored
      commit 9f50afcc upstream.
      
      The ftrace_graph_count can be decreased with a "!" pattern, so that
      the enabled flag should be updated too.
      
      Link: http://lkml.kernel.org/r/1365663698-2413-1-git-send-email-namhyung@kernel.orgSigned-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      761694a0
    • Namhyung Kim's avatar
      tracing: Check return value of tracing_init_dentry() · e1672f4e
      Namhyung Kim authored
      commit ed6f1c99 upstream.
      
      Check return value and bail out if it's NULL.
      
      Link: http://lkml.kernel.org/r/1365553093-10180-2-git-send-email-namhyung@kernel.orgSigned-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e1672f4e
    • Namhyung Kim's avatar
      tracing: Fix off-by-one on allocating stat->pages · 226c8ea6
      Namhyung Kim authored
      commit 39e30cd1 upstream.
      
      The first page was allocated separately, so no need to start from 0.
      
      Link: http://lkml.kernel.org/r/1364820385-32027-2-git-send-email-namhyung@kernel.orgSigned-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      226c8ea6