1. 08 May, 2013 40 commits
    • Shaohua Li's avatar
      MD: ignore discard request for hard disks of hybid raid1/raid10 array · 4e8ff554
      Shaohua Li authored
      commit 32f9f570 upstream.
      
      In SSD/hard disk hybid storage, discard request should be ignored for hard
      disk. We used to be doing this way, but the unplug path forgets it.
      
      This is suitable for stable tree since v3.6.
      Reported-and-tested-by: default avatarMarkus <M4rkusXXL@web.de>
      Signed-off-by: default avatarShaohua Li <shli@fusionio.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4e8ff554
    • NeilBrown's avatar
      md: bad block list should default to disabled. · aaf49388
      NeilBrown authored
      commit 486adf72 upstream.
      
      Maintenance of a bad-block-list currently defaults to 'enabled'
      and is then disabled when it cannot be supported.
      This is backwards and causes problem for dm-raid which didn't know
      to disable it.
      
      So fix the defaults, and only enabled for v1.x metadata which
      explicitly has bad blocks enabled.
      
      The problem with dm-raid has been present since badblock support was
      added in v3.1, so this patch is suitable for any -stable from 3.1
      onwards.
      Reported-by: default avatarJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aaf49388
    • Trond Myklebust's avatar
      LOCKD: Ensure that nlmclnt_block resets block->b_status after a server reboot · 17f978dd
      Trond Myklebust authored
      commit 1dfd89af upstream.
      
      After a server reboot, the reclaimer thread will recover all the existing
      locks. For locks that are blocked, however, it will change the value
      of block->b_status to nlm_lck_denied_grace_period in order to signal that
      they need to wake up and resend the original blocking lock request.
      
      Due to a bug, however, the block->b_status never gets reset after the
      blocked locks have been woken up, and so the process goes into an
      infinite loop of resends until the blocked lock is satisfied.
      Reported-by: default avatarMarc Eshel <eshel@us.ibm.com>
      Signed-off-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      17f978dd
    • Oleg Nesterov's avatar
      exec: do not abuse ->cred_guard_mutex in threadgroup_lock() · 0565d711
      Oleg Nesterov authored
      commit e56fb287 upstream.
      
      threadgroup_lock() takes signal->cred_guard_mutex to ensure that
      thread_group_leader() is stable.  This doesn't look nice, the scope of
      this lock in do_execve() is huge.
      
      And as Dave pointed out this can lead to deadlock, we have the
      following dependencies:
      
      	do_execve:		cred_guard_mutex -> i_mutex
      	cgroup_mount:		i_mutex -> cgroup_mutex
      	attach_task_by_pid:	cgroup_mutex -> cred_guard_mutex
      
      Change de_thread() to take threadgroup_change_begin() around the
      switch-the-leader code and change threadgroup_lock() to avoid
      ->cred_guard_mutex.
      
      Note that de_thread() can't sleep with ->group_rwsem held, this can
      obviously deadlock with the exiting leader if the writer is active, so it
      does threadgroup_change_end() before schedule().
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarLi Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0565d711
    • Greg Thelen's avatar
      fs/dcache.c: add cond_resched() to shrink_dcache_parent() · 9c3d6c10
      Greg Thelen authored
      commit 421348f1 upstream.
      
      Call cond_resched() in shrink_dcache_parent() to maintain interactivity.
      
      Before this patch:
      
      	void shrink_dcache_parent(struct dentry * parent)
      	{
      		while ((found = select_parent(parent, &dispose)) != 0)
      			shrink_dentry_list(&dispose);
      	}
      
      select_parent() populates the dispose list with dentries which
      shrink_dentry_list() then deletes.  select_parent() carefully uses
      need_resched() to avoid doing too much work at once.  But neither
      shrink_dcache_parent() nor its called functions call cond_resched().  So
      once need_resched() is set select_parent() will return single dentry
      dispose list which is then deleted by shrink_dentry_list().  This is
      inefficient when there are a lot of dentry to process.  This can cause
      softlockup and hurts interactivity on non preemptable kernels.
      
      This change adds cond_resched() in shrink_dcache_parent().  The benefit
      of this is that need_resched() is quickly cleared so that future calls
      to select_parent() are able to efficiently return a big batch of dentry.
      
      These additional cond_resched() do not seem to impact performance, at
      least for the workload below.
      
      Here is a program which can cause soft lockup if other system activity
      sets need_resched().
      
      	int main()
      	{
      	        struct rlimit rlim;
      	        int i;
      	        int f[100000];
      	        char buf[20];
      	        struct timeval t1, t2;
      	        double diff;
      
      	        /* cleanup past run */
      	        system("rm -rf x");
      
      	        /* boost nfile rlimit */
      	        rlim.rlim_cur = 200000;
      	        rlim.rlim_max = 200000;
      	        if (setrlimit(RLIMIT_NOFILE, &rlim))
      	                err(1, "setrlimit");
      
      	        /* make directory for files */
      	        if (mkdir("x", 0700))
      	                err(1, "mkdir");
      
      	        if (gettimeofday(&t1, NULL))
      	                err(1, "gettimeofday");
      
      	        /* populate directory with open files */
      	        for (i = 0; i < 100000; i++) {
      	                snprintf(buf, sizeof(buf), "x/%d", i);
      	                f[i] = open(buf, O_CREAT);
      	                if (f[i] == -1)
      	                        err(1, "open");
      	        }
      
      	        /* close some of the files */
      	        for (i = 0; i < 85000; i++)
      	                close(f[i]);
      
      	        /* unlink all files, even open ones */
      	        system("rm -rf x");
      
      	        if (gettimeofday(&t2, NULL))
      	                err(1, "gettimeofday");
      
      	        diff = (((double)t2.tv_sec * 1000000 + t2.tv_usec) -
      	                ((double)t1.tv_sec * 1000000 + t1.tv_usec));
      
      	        printf("done: %g elapsed\n", diff/1e6);
      	        return 0;
      	}
      Signed-off-by: default avatarGreg Thelen <gthelen@google.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9c3d6c10
    • Zhao Hongjiang's avatar
      inotify: invalid mask should return a error number but not set it · 550fbb43
      Zhao Hongjiang authored
      commit 04df32fa upstream.
      
      When we run the crackerjack testsuite, the inotify_add_watch test is
      stalled.
      
      This is caused by the invalid mask 0 - the task is waiting for the event
      but it never comes.  inotify_add_watch() should return -EINVAL as it did
      before commit 676a0675 ("inotify: remove broken mask checks causing
      unmount to be EINVAL").  That commit removes the invalid mask check, but
      that check is needed.
      
      Check the mask's ALL_INOTIFY_BITS before the inotify_arg_to_mask() call.
      If none are set, just return -EINVAL.
      
      Because IN_UNMOUNT is in ALL_INOTIFY_BITS, this change will not trigger
      the problem that above commit fixed.
      
      [akpm@linux-foundation.org: fix build]
      Signed-off-by: default avatarZhao Hongjiang <zhaohongjiang@huawei.com>
      Acked-by: default avatarJim Somerville <Jim.Somerville@windriver.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Eric Paris <eparis@parisplace.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      550fbb43
    • Robert Richter's avatar
      sata_highbank: Rename proc_name to the module name · 4ad8e5d3
      Robert Richter authored
      commit 2cc1144a upstream.
      
      mkinitrd looks at /sys/class/scsi_host/host$hostnum/proc_name to find
      the module name of a disk driver. Current name is "highbank-ahci" but
      the module is "sata_highbank". Rename it to match the module name.
      Signed-off-by: default avatarRobert Richter <robert.richter@calxeda.com>
      Cc: Rob Herring <rob.herring@calxeda.com>
      Cc: Alexander Graf <agraf@suse.de>
      Signed-off-by: default avatarJeff Garzik <jgarzik@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4ad8e5d3
    • Thomas Gleixner's avatar
      clockevents: Set dummy handler on CPU_DEAD shutdown · ee50f837
      Thomas Gleixner authored
      commit 6f7a05d7 upstream.
      
      Vitaliy reported that a per cpu HPET timer interrupt crashes the
      system during hibernation. What happens is that the per cpu HPET timer
      gets shut down when the nonboot cpus are stopped. When the nonboot
      cpus are onlined again the HPET code sets up the MSI interrupt which
      fires before the clock event device is registered. The event handler
      is still set to hrtimer_interrupt, which then crashes the machine due
      to highres mode not being active.
      
      See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=700333
      
      There is no real good way to avoid that in the HPET code. The HPET
      code alrady has a mechanism to detect spurious interrupts when event
      handler == NULL for a similar reason.
      
      We can handle that in the clockevent/tick layer and replace the
      previous functional handler with a dummy handler like we do in
      tick_setup_new_device().
      
      The original clockevents code did this in clockevents_exchange_device(),
      but that got removed by commit 7c1e7689 (clockevents: prevent
      clockevent event_handler ending up handler_noop) which forgot to fix
      it up in tick_shutdown(). Same issue with the broadcast device.
      Reported-by: default avatarVitaliy Fillipov <vitalif@yourcmc.ru>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: 700333@bugs.debian.org
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ee50f837
    • Steven Rostedt's avatar
      localmodconfig: Process source kconfig files as they are found · 3e3745f5
      Steven Rostedt authored
      commit ced9cb1a upstream.
      
      A bug was reported that caused localmodconfig to not keep all the
      dependencies of ATH9K. This was caused by the kconfig file:
      
      In drivers/net/wireless/ath/Kconfig:
      3e3745f5
    • Li Zefan's avatar
      cgroup: fix broken file xattrs · ae373596
      Li Zefan authored
      commit 712317ad upstream.
      
      We should store file xattrs in struct cfent instead of struct cftype,
      because cftype is a type while cfent is object instance of cftype.
      
      For example each cgroup has a tasks file, and each tasks file is
      associated with a uniq cfent, but all those files share the same
      struct cftype.
      
      Alexey Kodanev reported a crash, which can be reproduced:
      
        # mount -t cgroup -o xattr /sys/fs/cgroup
        # mkdir /sys/fs/cgroup/test
        # setfattr -n trusted.value -v test_value /sys/fs/cgroup/tasks
        # rmdir /sys/fs/cgroup/test
        # umount /sys/fs/cgroup
        oops!
      
      In this case, simple_xattrs_free() will free the same struct simple_xattrs
      twice.
      
      tj: Dropped unused local variable @cft from cgroup_diput().
      Reported-by: default avatarAlexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: default avatarLi Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ae373596
    • Li Zefan's avatar
      cgroup: fix an off-by-one bug which may trigger BUG_ON() · d52008e4
      Li Zefan authored
      commit 3ac1707a upstream.
      
      The 3rd parameter of flex_array_prealloc() is the number of elements,
      not the index of the last element.
      
      The effect of the bug is, when opening cgroup.procs, a flex array will
      be allocated and all elements of the array is allocated with
      GFP_KERNEL flag, but the last one is GFP_ATOMIC, and if we fail to
      allocate memory for it, it'll trigger a BUG_ON().
      Signed-off-by: default avatarLi Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d52008e4
    • Zhang Rui's avatar
      ACPI / thermal: do not always return THERMAL_TREND_RAISING for active trip points · 0252cb3c
      Zhang Rui authored
      commit 94a40931 upstream.
      
      Commit 4ae46bef "Thermal: Introduce thermal_zone_trip_update()"
      introduced a regression causing the fan to be always on even when
      the system is idle.
      
      My original idea in that commit is that:
       - when the current temperature is above the trip point,
         keep the fan on, even if the temperature is dropping.
       - when the current temperature is below the trip point,
         turn on the fan when the temperature is raising,
         turn off the fan when the temperature is dropping.
      
      But this is what the code actually does:
       - when the current temperature is above the trip point,
         the fan keeps on.
       - when the current temperature is below the trip point,
         the fan is always on because thermal_get_trend()
         in driver/acpi/thermal.c returns THERMAL_TREND_RAISING.
      Thus the fan keeps running even if the system is idle.
      
      Fix this in drivers/acpi/thermal.c.
      
      [rjw: Changelog]
      References: https://bugzilla.kernel.org/show_bug.cgi?id=56591
      References: https://bugzilla.kernel.org/show_bug.cgi?id=56601
      References: https://bugzilla.kernel.org/show_bug.cgi?id=50041#c45Signed-off-by: default avatarZhang Rui <rui.zhang@intel.com>
      Tested-by: default avatarMatthias <morpheusxyz123@yahoo.de>
      Tested-by: default avatarVille Syrjälä <syrjala@sci.fi>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0252cb3c
    • Wang YanQing's avatar
      ACPI: Fix wrong parameter passed to memblock_reserve · 9c2455ef
      Wang YanQing authored
      commit a6432ded upstream.
      
      Commit 53aac44c (ACPI: Store valid ACPI tables passed via early initrd
      in reserved memblock areas) introduced acpi_initrd_override() that
      passes a wrong value as the second argument to memblock_reserve().
      
      Namely, the second argument of memblock_reserve() is the size of the
      region, not the address of the top of it, so make
      acpi_initrd_override() pass the size in there as appropriate.
      
      [rjw: Changelog]
      Signed-off-by: default avatarWang YanQing <udknight@gmail.com>
      Acked-by: default avatarYinghai Lu <yinghai@kernel.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9c2455ef
    • Aaron Lu's avatar
      libata: acpi: make ata_ap_acpi_handle not block · 98ab042f
      Aaron Lu authored
      commit d66af4df upstream.
      
      Since commit 30dcf76a, ata_ap_acpi_handle will always do a namespace
      walk, which requires acquiring an acpi namespace mutex. This made it
      impossible to be used when calling path has held a spinlock.
      
      For example, it can occur in the following code path for pata_acpi:
      ata_scsi_queuecmd (ap->lock is acquired)
        __ata_scsi_queuecmd
          ata_scsi_translate
            ata_qc_issue
              pacpi_qc_issue
                ata_acpi_stm
                  ata_ap_acpi_handle
                    acpi_get_child
                      acpi_walk_namespace
                        acpi_ut_acquire_mutex (acquire mutex while holding lock)
      This caused scheduling while atomic bug, as reported in bug #56781.
      
      Actually, ata_ap_acpi_handle doesn't have to walk the namespace every
      time it is called, it can simply return the bound acpi handle on the
      corresponding SCSI host. The reason previously it is not done this way
      is, ata_ap_acpi_handle is used in the binding function
      ata_acpi_bind_host by ata_acpi_gtm when the handle is not bound to the
      SCSI host yet. Since we already have the ATA port's handle in its
      binding function, we can simply use it instead of calling
      ata_ap_acpi_handle there. So introduce a new function __ata_acpi_gtm,
      where it will receive an acpi handle param in addition to the ATA port
      which is solely used for debug statement. With this change, we can make
      ata_ap_acpi_handle simply return the bound handle for SCSI host instead
      of walking the acpi namespace now.
      
      Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=56781
      Reported-and-tested-by: <kenzopl@o2.pl>
      Signed-off-by: default avatarAaron Lu <aaron.lu@intel.com>
      Signed-off-by: default avatarJeff Garzik <jgarzik@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      98ab042f
    • Johan Hovold's avatar
      drivers/rtc/rtc-at91rm9200.c: fix missing iounmap · 5b6a8e8e
      Johan Hovold authored
      commit 3427de92 upstream.
      
      Add missing iounmap to probe error path and remove.
      Signed-off-by: default avatarJohan Hovold <jhovold@gmail.com>
      Acked-by: default avatarNicolas Ferre <nicolas.ferre@atmel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5b6a8e8e
    • Derek Basehore's avatar
      drivers/rtc/rtc-cmos.c: don't disable hpet emulation on suspend · ec551567
      Derek Basehore authored
      commit e005715e upstream.
      
      There's a bug where rtc alarms are ignored after the rtc cmos suspends
      but before the system finishes suspend.  Since hpet emulation is
      disabled and it still handles the interrupts, a wake event is never
      registered which is done from the rtc layer.
      
      This patch reverts commit d1b2efa8 ("rtc: disable hpet emulation on
      suspend") which disabled hpet emulation.  To fix the problem mentioned
      in that commit, hpet_rtc_timer_init() is called directly on resume.
      Signed-off-by: default avatarDerek Basehore <dbasehore@chromium.org>
      Cc: Maxim Levitsky <maximlevitsky@gmail.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ec551567
    • Mel Gorman's avatar
      mm: swap: mark swap pages writeback before queueing for direct IO · 9940e550
      Mel Gorman authored
      commit 0cdc444a upstream.
      
      As pointed out by Andrew Morton, the swap-over-NFS writeback is not
      setting PageWriteback before it is queued for direct IO.  While swap
      pages do not participate in BDI or process dirty accounting and the IO
      is synchronous, the writeback bit is still required and not setting it
      in this case was an oversight.  swapoff depends on the page writeback to
      synchronoise all pending writes on a swap page before it is reused.
      Swapcache freeing and reuse depend on checking the PageWriteback under
      lock to ensure the page is safe to reuse.
      
      Direct IO handlers and the direct IO handler for NFS do not deal with
      PageWriteback as they are synchronous writes.  In the case of NFS, it
      schedules pages (or a page in the case of swap) for IO and then waits
      synchronously for IO to complete in nfs_direct_write().  It is
      recognised that this is a slowdown from normal swap handling which is
      asynchronous and uses a completion handler.  Shoving PageWriteback
      handling down into direct IO handlers looks like a bad fit to handle the
      swap case although it may have to be dealt with some day if swap is
      converted to use direct IO in general and bmap is finally done away
      with.  At that point it will be necessary to refit asynchronous direct
      IO with completion handlers onto the swap subsystem.
      
      As swapcache currently depends on PageWriteback to protect against
      races, this patch sets PageWriteback under the page lock before queueing
      it for direct IO.  It is cleared when the direct IO handler returns.  IO
      errors are treated similarly to the direct-to-bio case except PageError
      is not set as in the case of swap-over-NFS, it is likely to be a
      transient error.
      
      It was asked what prevents such a page being reclaimed in parallel.
      With this patch applied, such a page will now be skipped (most of the
      time) or blocked until the writeback completes.  Reclaim checks
      PageWriteback under the page lock before calling try_to_free_swap and
      the page lock should prevent the page being requeued for IO before it is
      freed.
      
      This and Jerome's related patch should considered for -stable as far
      back as 3.6 when swap-over-NFS was introduced.
      
      [akpm@linux-foundation.org: use pr_err_ratelimited()]
      [akpm@linux-foundation.org: remove hopefully-unneeded cast in printk]
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9940e550
    • Jerome Marchand's avatar
      swap: redirty page if page write fails on swap file · 4e5d8307
      Jerome Marchand authored
      commit 2d30d31e upstream.
      
      Since commit 62c230bc ("mm: add support for a filesystem to activate
      swap files and use direct_IO for writing swap pages"), swap_writepage()
      calls direct_IO on swap files.  However, in that case the page isn't
      redirtied if I/O fails, and is therefore handled afterwards as if it has
      been successfully written to the swap file, leading to memory corruption
      when the page is eventually swapped back in.
      
      This patch sets the page dirty when direct_IO() fails.  It fixes a
      memory corruption that happened while using swap-over-NFS.
      Signed-off-by: default avatarJerome Marchand <jmarchan@redhat.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4e5d8307
    • Prarit Bhargava's avatar
      hrtimer: Add expiry time overflow check in hrtimer_interrupt · 66e79283
      Prarit Bhargava authored
      commit 8f294b5a upstream.
      
      The settimeofday01 test in the LTP testsuite effectively does
      
              gettimeofday(current time);
              settimeofday(Jan 1, 1970 + 100 seconds);
              settimeofday(current time);
      
      This test causes a stack trace to be displayed on the console during the
      setting of timeofday to Jan 1, 1970 + 100 seconds:
      
      [  131.066751] ------------[ cut here ]------------
      [  131.096448] WARNING: at kernel/time/clockevents.c:209 clockevents_program_event+0x135/0x140()
      [  131.104935] Hardware name: Dinar
      [  131.108150] Modules linked in: sg nfsv3 nfs_acl nfsv4 auth_rpcgss nfs dns_resolver fscache lockd sunrpc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables kvm_amd kvm sp5100_tco bnx2 i2c_piix4 crc32c_intel k10temp fam15h_power ghash_clmulni_intel amd64_edac_mod pcspkr serio_raw edac_mce_amd edac_core microcode xfs libcrc32c sr_mod sd_mod cdrom ata_generic crc_t10dif pata_acpi radeon i2c_algo_bit drm_kms_helper ttm drm ahci pata_atiixp libahci libata usb_storage i2c_core dm_mirror dm_region_hash dm_log dm_mod
      [  131.176784] Pid: 0, comm: swapper/28 Not tainted 3.8.0+ #6
      [  131.182248] Call Trace:
      [  131.184684]  <IRQ>  [<ffffffff810612af>] warn_slowpath_common+0x7f/0xc0
      [  131.191312]  [<ffffffff8106130a>] warn_slowpath_null+0x1a/0x20
      [  131.197131]  [<ffffffff810b9fd5>] clockevents_program_event+0x135/0x140
      [  131.203721]  [<ffffffff810bb584>] tick_program_event+0x24/0x30
      [  131.209534]  [<ffffffff81089ab1>] hrtimer_interrupt+0x131/0x230
      [  131.215437]  [<ffffffff814b9600>] ? cpufreq_p4_target+0x130/0x130
      [  131.221509]  [<ffffffff81619119>] smp_apic_timer_interrupt+0x69/0x99
      [  131.227839]  [<ffffffff8161805d>] apic_timer_interrupt+0x6d/0x80
      [  131.233816]  <EOI>  [<ffffffff81099745>] ? sched_clock_cpu+0xc5/0x120
      [  131.240267]  [<ffffffff814b9ff0>] ? cpuidle_wrap_enter+0x50/0xa0
      [  131.246252]  [<ffffffff814b9fe9>] ? cpuidle_wrap_enter+0x49/0xa0
      [  131.252238]  [<ffffffff814ba050>] cpuidle_enter_tk+0x10/0x20
      [  131.257877]  [<ffffffff814b9c89>] cpuidle_idle_call+0xa9/0x260
      [  131.263692]  [<ffffffff8101c42f>] cpu_idle+0xaf/0x120
      [  131.268727]  [<ffffffff815f8971>] start_secondary+0x255/0x257
      [  131.274449] ---[ end trace 1151a50552231615 ]---
      
      When we change the system time to a low value like this, the value of
      timekeeper->offs_real will be a negative value.
      
      It seems that the WARN occurs because an hrtimer has been started in the time
      between the releasing of the timekeeper lock and the IPI call (via a call to
      on_each_cpu) in clock_was_set() in the do_settimeofday() code.  The end result
      is that a REALTIME_CLOCK timer has been added with softexpires = expires =
      KTIME_MAX.  The hrtimer_interrupt() fires/is called and the loop at
      kernel/hrtimer.c:1289 is executed.  In this loop the code subtracts the
      clock base's offset (which was set to timekeeper->offs_real in
      do_settimeofday()) from the current hrtimer_cpu_base->expiry value (which
      was KTIME_MAX):
      
      	KTIME_MAX - (a negative value) = overflow
      
      A simple check for an overflow can resolve this problem.  Using KTIME_MAX
      instead of the overflow value will result in the hrtimer function being run,
      and the reprogramming of the timer after that.
      Reviewed-by: default avatarRik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarPrarit Bhargava <prarit@redhat.com>
      [jstultz: Tweaked commit subject]
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      66e79283
    • David Engraf's avatar
      hrtimer: Fix ktime_add_ns() overflow on 32bit architectures · f931d5e4
      David Engraf authored
      commit 51fd36f3 upstream.
      
      One can trigger an overflow when using ktime_add_ns() on a 32bit
      architecture not supporting CONFIG_KTIME_SCALAR.
      
      When passing a very high value for u64 nsec, e.g. 7881299347898368000
      the do_div() function converts this value to seconds (7881299347) which
      is still to high to pass to the ktime_set() function as long. The result
      in is a negative value.
      
      The problem on my system occurs in the tick-sched.c,
      tick_nohz_stop_sched_tick() when time_delta is set to
      timekeeping_max_deferment(). The check for time_delta < KTIME_MAX is
      valid, thus ktime_add_ns() is called with a too large value resulting in
      a negative expire value. This leads to an endless loop in the ticker code:
      
      time_delta: 7881299347898368000
      expires = ktime_add_ns(last_update, time_delta)
      expires: negative value
      
      This fix caps the value to KTIME_MAX.
      
      This error doesn't occurs on 64bit or architectures supporting
      CONFIG_KTIME_SCALAR (e.g. ARM, x86-32).
      Signed-off-by: default avatarDavid Engraf <david.engraf@sysgo.com>
      [jstultz: Minor tweaks to commit message & header]
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f931d5e4
    • Dylan Reid's avatar
      ASoC: max98088: Fix logging of hardware revision. · bee59d68
      Dylan Reid authored
      commit 98682063 upstream.
      
      The hardware revision of the codec is based at 0x40.  Subtract that
      before convering to ASCII.  The same as it is done for 98095.
      Signed-off-by: default avatarDylan Reid <dgreid@chromium.org>
      Signed-off-by: default avatarMark Brown <broonie@opensource.wolfsonmicro.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bee59d68
    • Kailang Yang's avatar
      ALSA: hda - Add the support for ALC286 codec · febafacf
      Kailang Yang authored
      commit 7fc7d047 upstream.
      
      It's yet another ALC269-variant.
      Signed-off-by: default avatarKailang Yang <kailang@realtek.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      febafacf
    • Takashi Iwai's avatar
      ALSA: hda - Fix aamix activation with loopback control on VIA codecs · 02cd3482
      Takashi Iwai authored
      commit 65033cc8 upstream.
      
      When we have a loopback mixer control, this should manage the state
      whether the output paths include the aamix or not.  But the current
      code blindly initializes the output paths with aamix = true, thus the
      aamix is enabled unless the loopback mixer control is changed.
      
      Also, update_aamix_paths() called by the loopback mixer control put
      callback invokes snd_hda_activate_path() with aamix = true even for
      disabling the mixing.  This leaves the aamix path even though the
      loopback control is turned off.
      
      This patch fixes these issues:
      - Introduced aamix_default() helper to indicate whether with_aamix is
        true or false as default
      - Fix the argument in update_aamix_paths() for disabling loopback
      Reported-by: default avatarLydia Wang <LydiaWang@viatech.com.cn>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      02cd3482
    • Clemens Ladisch's avatar
      ALSA: USB: adjust for changed 3.8 USB API · ff865cae
      Clemens Ladisch authored
      commit c75c5ab5 upstream.
      
      The recent changes in the USB API ("implement new semantics for
      URB_ISO_ASAP") made the former meaning of the URB_ISO_ASAP flag the
      default, and changed this flag to mean that URBs can be delayed.
      This is not the behaviour wanted by any of the audio drivers because
      it leads to discontinuous playback with very small period sizes.
      Therefore, our URBs need to be submitted without this flag.
      Reported-by: default avatarJoe Rayhawk <jrayhawk@fairlystable.org>
      Signed-off-by: default avatarClemens Ladisch <clemens@ladisch.de>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ff865cae
    • Takashi Iwai's avatar
      ALSA: usb-audio: Fix autopm error during probing · 62d585f3
      Takashi Iwai authored
      commit 60af3d03 upstream.
      
      We've got strange errors in get_ctl_value() in mixer.c during
      probing, e.g. on Hercules RMX2 DJ Controller:
      
        ALSA mixer.c:352 cannot get ctl value: req = 0x83, wValue = 0x201, wIndex = 0xa00, type = 4
        ALSA mixer.c:352 cannot get ctl value: req = 0x83, wValue = 0x200, wIndex = 0xa00, type = 4
        ....
      
      It turned out that the culprit is autopm: snd_usb_autoresume() returns
      -ENODEV when called during card->probing = 1.
      
      Since the call itself during card->probing = 1 is valid, let's fix the
      return value of snd_usb_autoresume() as success.
      Reported-and-tested-by: default avatarDaniel Schürmann <daschuer@mixxx.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      62d585f3
    • Clemens Ladisch's avatar
      ALSA: usb-audio: disable autopm for MIDI devices · 166662b8
      Clemens Ladisch authored
      commit cbc200bc upstream.
      
      Commit 88a8516a (ALSA: usbaudio: implement USB autosuspend)
      introduced autopm for all USB audio/MIDI devices.  However, many MIDI
      devices, such as synthesizers, do not merely transmit MIDI messages but
      use their MIDI inputs to control other functions.  With autopm, these
      devices would get powered down as soon as the last MIDI port device is
      closed on the host.
      
      Even some plain MIDI interfaces could get broken: they automatically
      send Active Sensing messages while powered up, but as soon as these
      messages cease, the receiving device would interpret this as an
      accidental disconnection.
      
      Commit f5f16541 (ALSA: usb-audio: Fix missing autopm for MIDI input)
      introduced another regression: some devices (e.g. the Roland GAIA SH-01)
      are self-powered but do a reset whenever the USB interface's power state
      changes.
      
      To work around all this, just disable autopm for all USB MIDI devices.
      
      Reported-by: Laurens Holst
      Signed-off-by: default avatarClemens Ladisch <clemens@ladisch.de>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      166662b8
    • Calvin Owens's avatar
      ALSA: usb: Add quirk for 192KHz recording on E-Mu devices · 3279e17e
      Calvin Owens authored
      commit 1539d4f8 upstream.
      
      When recording at 176.2KHz or 192Khz, the device adds a 32-bit length
      header to the capture packets, which obviously needs to be ignored for
      recording to work properly.
      
      Userspace expected:  L0 L1 L2 R0 R1 R2
      ...but actually got: R2 L0 L1 L2 R0 R1
      
      Also, the last byte of the length header being interpreted as L0 of
      the first sample caused spikes every 0.5ms, resulting in a loud 16KHz
      tone (about the highest 'B' on a piano) being present throughout
      captures.
      
      Tested at all sample rates on an E-Mu 0404USB, and tested for
      regressions on a generic USB headset.
      Signed-off-by: default avatarCalvin Owens <jcalvinowens@gmail.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3279e17e
    • Daniel Mack's avatar
      ALSA: snd-usb: try harder to find USB_DT_CS_ENDPOINT · 2e481f00
      Daniel Mack authored
      commit ebfc594c upstream.
      
      The USB_DT_CS_ENDPOINT class-specific endpoint descriptor is usually
      stuffed directly after the standard USB endpoint descriptor, and this is
      where the driver currently expects it to be.
      
      There are, however, devices in the wild that have it the other way
      around in their descriptor sets, so the USB_DT_CS_ENDPOINT comes
      *before* the standard enpoint. Devices known to implement it that way
      are "Sennheiser BTD-500" and Plantronics USB headsets.
      
      When the driver can't find the USB_DT_CS_ENDPOINT, it won't be able to
      change sample rates, as the bitmask for the validity of this command is
      storen in bmAttributes of that descriptor.
      
      Fix this by searching the entire interface instead of just the extra
      bytes of the first endpoint, in case the latter fails.
      Signed-off-by: default avatarDaniel Mack <zonque@gmail.com>
      Reported-and-tested-by: default avatarTorstein Hegge <hegge@resisty.net>
      Reported-and-tested-by: default avatarYves G <alsa-user@vivigatt.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2e481f00
    • Takashi Iwai's avatar
      ALSA: emu10k1: Fix dock firmware loading · 53fca514
      Takashi Iwai authored
      commit e08b34e8 upstream.
      
      The commit [b209c4df: ALSA: emu10k1: cache emu1010 firmware] broke the
      firmware loading of the dock, just (mistakenly) ignoring a different
      firmware for docks on some models.  This patch revives them again.
      
      Bugzilla: https://bugs.archlinux.org/task/34865Reported-and-tested-by: default avatarTobias Powalowski <tobias.powalowski@googlemail.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      53fca514
    • Duncan Laurie's avatar
      TPM: Retry SaveState command in suspend path · 9bcb7573
      Duncan Laurie authored
      commit 32d33b29 upstream.
      
      If the TPM has already been sent a SaveState command before the driver
      is loaded it may have problems sending that same command again later.
      
      This issue is seen with the Chromebook Pixel due to a firmware bug in
      the legacy mode boot path which is sending the SaveState command
      before booting the kernel.  More information is available at
      http://crbug.com/203524
      
      This change introduces a retry of the SaveState command in the suspend
      path in order to work around this issue.  A future firmware update
      should fix this but this is also a trivial workaround in the driver
      that has no effect on systems that do not show this problem.
      
      When this does happen the TPM responds with a non-fatal TPM_RETRY code
      that is defined in the specification:
      
        The TPM is too busy to respond to the command immediately, but the
        command could be resubmitted at a later time.  The TPM MAY return
        TPM_RETRY for any command at any time.
      
      It can take several seconds before the TPM will respond again.  I
      measured a typical time between 3 and 4 seconds and the timeout is set
      at a safe 5 seconds.
      
      It is also possible to reproduce this with commands via /dev/tpm0.
      The bug linked above has a python script attached which can be used to
      test for this problem.  I tested a variety of TPMs from Infineon,
      Nuvoton, Atmel, and STMicro but was only able to reproduce this with
      LPC and I2C TPMs from Infineon.
      
      The TPM specification only loosely defines this behavior:
      
        TPM Main Level 2 Part 3 v1.2 r116, section 3.3. TPM_SaveState:
        The TPM MAY declare all preserved values invalid in response to any
        command other than TPM_Init.
      
        TCG PC Client BIOS Spec 1.21 section 8.3.1.
        After issuing a TPM_SaveState command, the OS SHOULD NOT issue TPM
        commands before transitioning to S3 without issuing another
        TPM_SaveState command.
      
        TCG PC Client TIS 1.21, section 4. Power Management:
        The TPM_SaveState command allows a Static OS to indicate to the TPM
        that the platform may enter a low power state where the TPM will be
        required to enter into the D3 power state.  The use of the term "may"
        is significant in that there is no requirement for the platform to
        actually enter the low power state after sending the TPM_SaveState
        command.  The software may, in fact, send subsequent commands after
        sending the TPM_SaveState command.
      
      Change-Id: I52b41e826412688e5b6c8ddd3bb16409939704e9
      Signed-off-by: default avatarDuncan Laurie <dlaurie@chromium.org>
      Signed-off-by: default avatarKent Yoder <key@linux.vnet.ibm.com>
      Cc: Dirk Hohndel <dirk@hohndel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9bcb7573
    • Hugh Dickins's avatar
      mm: allow arch code to control the user page table ceiling · 258497d1
      Hugh Dickins authored
      commit 6ee8630e upstream.
      
      On architectures where a pgd entry may be shared between user and kernel
      (e.g.  ARM+LPAE), freeing page tables needs a ceiling other than 0.
      This patch introduces a generic USER_PGTABLES_CEILING that arch code can
      override.  It is the responsibility of the arch code setting the ceiling
      to ensure the complete freeing of the page tables (usually in
      pgd_free()).
      
      [catalin.marinas@arm.com: commit log; shift_arg_pages(), asm-generic/pgtables.h changes]
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      258497d1
    • Anurup m's avatar
      fs/fscache/stats.c: fix memory leak · d340c737
      Anurup m authored
      commit ec686c92 upstream.
      
      There is a kernel memory leak observed when the proc file
      /proc/fs/fscache/stats is read.
      
      The reason is that in fscache_stats_open, single_open is called and the
      respective release function is not called during release.  Hence fix
      with correct release function - single_release().
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=57101Signed-off-by: default avatarAnurup m <anurup.m@huawei.com>
      Cc: shyju pv <shyju.pv@huawei.com>
      Cc: Sanil kumar <sanil.kumar@huawei.com>
      Cc: Nataraj m <nataraj.m@huawei.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d340c737
    • Stephan Schreiber's avatar
      Wrong asm register contraints in the kvm implementation · ed34a287
      Stephan Schreiber authored
      commit de53e9ca upstream.
      
      The Linux Kernel contains some inline assembly source code which has
      wrong asm register constraints in arch/ia64/kvm/vtlb.c.
      
      I observed this on Kernel 3.2.35 but it is also true on the most
      recent Kernel 3.9-rc1.
      
      File arch/ia64/kvm/vtlb.c:
      
      u64 guest_vhpt_lookup(u64 iha, u64 *pte)
      {
      	u64 ret;
      	struct thash_data *data;
      
      	data = __vtr_lookup(current_vcpu, iha, D_TLB);
      	if (data != NULL)
      		thash_vhpt_insert(current_vcpu, data->page_flags,
      			data->itir, iha, D_TLB);
      
      	asm volatile (
      			"rsm psr.ic|psr.i;;"
      			"srlz.d;;"
      			"ld8.s r9=[%1];;"
      			"tnat.nz p6,p7=r9;;"
      			"(p6) mov %0=1;"
      			"(p6) mov r9=r0;"
      			"(p7) extr.u r9=r9,0,53;;"
      			"(p7) mov %0=r0;"
      			"(p7) st8 [%2]=r9;;"
      			"ssm psr.ic;;"
      			"srlz.d;;"
      			"ssm psr.i;;"
      			"srlz.d;;"
      			: "=r"(ret) : "r"(iha), "r"(pte):"memory");
      
      	return ret;
      }
      
      The list of output registers is
      			: "=r"(ret) : "r"(iha), "r"(pte):"memory");
      The constraint "=r" means that the GCC has to maintain that these vars
      are in registers and contain valid info when the program flow leaves
      the assembly block (output registers).
      But "=r" also means that GCC can put them in registers that are used
      as input registers. Input registers are iha, pte on the example.
      If the predicate p7 is true, the 8th assembly instruction
      			"(p7) mov %0=r0;"
      is the first one which writes to a register which is maintained by the
      register constraints; it sets %0. %0 means the first register operand;
      it is ret here.
      This instruction might overwrite the %2 register (pte) which is needed
      by the next instruction:
      			"(p7) st8 [%2]=r9;;"
      Whether it really happens depends on how GCC decides what registers it
      uses and how it optimizes the code.
      
      The attached patch  fixes the register operand constraints in
      arch/ia64/kvm/vtlb.c.
      The register constraints should be
      			: "=&r"(ret) : "r"(iha), "r"(pte):"memory");
      The & means that GCC must not use any of the input registers to place
      this output register in.
      
      This is Debian bug#702639
      (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=702639).
      
      The patch is applicable on Kernel 3.9-rc1, 3.2.35 and many other versions.
      Signed-off-by: default avatarStephan Schreiber <info@fs-driver.org>
      Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ed34a287
    • Stephan Schreiber's avatar
      Wrong asm register contraints in the futex implementation · 7b5b8170
      Stephan Schreiber authored
      commit 136f39dd upstream.
      
      The Linux Kernel contains some inline assembly source code which has
      wrong asm register constraints in arch/ia64/include/asm/futex.h.
      
      I observed this on Kernel 3.2.23 but it is also true on the most
      recent Kernel 3.9-rc1.
      
      File arch/ia64/include/asm/futex.h:
      
      static inline int
      futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
      			      u32 oldval, u32 newval)
      {
      	if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
      		return -EFAULT;
      
      	{
      		register unsigned long r8 __asm ("r8");
      		unsigned long prev;
      		__asm__ __volatile__(
      			"	mf;;					\n"
      			"	mov %0=r0				\n"
      			"	mov ar.ccv=%4;;				\n"
      			"[1:]	cmpxchg4.acq %1=[%2],%3,ar.ccv		\n"
      			"	.xdata4 \"__ex_table\", 1b-., 2f-.	\n"
      			"[2:]"
      			: "=r" (r8), "=r" (prev)
      			: "r" (uaddr), "r" (newval),
      			  "rO" ((long) (unsigned) oldval)
      			: "memory");
      		*uval = prev;
      		return r8;
      	}
      }
      
      The list of output registers is
      			: "=r" (r8), "=r" (prev)
      The constraint "=r" means that the GCC has to maintain that these vars
      are in registers and contain valid info when the program flow leaves
      the assembly block (output registers).
      But "=r" also means that GCC can put them in registers that are used
      as input registers. Input registers are uaddr, newval, oldval on the
      example.
      The second assembly instruction
      			"	mov %0=r0				\n"
      is the first one which writes to a register; it sets %0 to 0. %0 means
      the first register operand; it is r8 here. (The r0 is read-only and
      always 0 on the Itanium; it can be used if an immediate zero value is
      needed.)
      This instruction might overwrite one of the other registers which are
      still needed.
      Whether it really happens depends on how GCC decides what registers it
      uses and how it optimizes the code.
      
      The objdump utility can give us disassembly.
      The futex_atomic_cmpxchg_inatomic() function is inline, so we have to
      look for a module that uses the funtion. This is the
      cmpxchg_futex_value_locked() function in
      kernel/futex.c:
      
      static int cmpxchg_futex_value_locked(u32 *curval, u32 __user *uaddr,
      				      u32 uval, u32 newval)
      {
      	int ret;
      
      	pagefault_disable();
      	ret = futex_atomic_cmpxchg_inatomic(curval, uaddr, uval, newval);
      	pagefault_enable();
      
      	return ret;
      }
      
      Now the disassembly. At first from the Kernel package 3.2.23 which has
      been compiled with GCC 4.4, remeber this Kernel seemed to work:
      objdump -d linux-3.2.23/debian/build/build_ia64_none_mckinley/kernel/futex.o
      
      0000000000000230 <cmpxchg_futex_value_locked>:
            230:	0b 18 80 1b 18 21 	[MMI]       adds r3=3168,r13;;
            236:	80 40 0d 00 42 00 	            adds r8=40,r3
            23c:	00 00 04 00       	            nop.i 0x0;;
            240:	0b 50 00 10 10 10 	[MMI]       ld4 r10=[r8];;
            246:	90 08 28 00 42 00 	            adds r9=1,r10
            24c:	00 00 04 00       	            nop.i 0x0;;
            250:	09 00 00 00 01 00 	[MMI]       nop.m 0x0
            256:	00 48 20 20 23 00 	            st4 [r8]=r9
            25c:	00 00 04 00       	            nop.i 0x0;;
            260:	08 10 80 06 00 21 	[MMI]       adds r2=32,r3
            266:	00 00 00 02 00 00 	            nop.m 0x0
            26c:	02 08 f1 52       	            extr.u r16=r33,0,61
            270:	05 40 88 00 08 e0 	[MLX]       addp4 r8=r34,r0
            276:	ff ff 0f 00 00 e0 	            movl r15=0xfffffffbfff;;
            27c:	f1 f7 ff 65
            280:	09 70 00 04 18 10 	[MMI]       ld8 r14=[r2]
            286:	00 00 00 02 00 c0 	            nop.m 0x0
            28c:	f0 80 1c d0       	            cmp.ltu p6,p7=r15,r16;;
            290:	08 40 fc 1d 09 3b 	[MMI]       cmp.eq p8,p9=-1,r14
            296:	00 00 00 02 00 40 	            nop.m 0x0
            29c:	e1 08 2d d0       	            cmp.ltu p10,p11=r14,r33
            2a0:	56 01 10 00 40 10 	[BBB] (p10) br.cond.spnt.few 2e0
      <cmpxchg_futex_value_locked+0xb0>
            2a6:	02 08 00 80 21 03 	      (p08) br.cond.dpnt.few 2b0
      <cmpxchg_futex_value_locked+0x80>
            2ac:	40 00 00 41       	      (p06) br.cond.spnt.few 2e0
      <cmpxchg_futex_value_locked+0xb0>
            2b0:	0a 00 00 00 22 00 	[MMI]       mf;;
            2b6:	80 00 00 00 42 00 	            mov r8=r0
            2bc:	00 00 04 00       	            nop.i 0x0
            2c0:	0b 00 20 40 2a 04 	[MMI]       mov.m ar.ccv=r8;;
            2c6:	10 1a 85 22 20 00 	            cmpxchg4.acq r33=[r33],r35,ar.ccv
            2cc:	00 00 04 00       	            nop.i 0x0;;
            2d0:	10 00 84 40 90 11 	[MIB]       st4 [r32]=r33
            2d6:	00 00 00 02 00 00 	            nop.i 0x0
            2dc:	20 00 00 40       	            br.few 2f0
      <cmpxchg_futex_value_locked+0xc0>
            2e0:	09 40 c8 f9 ff 27 	[MMI]       mov r8=-14
            2e6:	00 00 00 02 00 00 	            nop.m 0x0
            2ec:	00 00 04 00       	            nop.i 0x0;;
            2f0:	0b 58 20 1a 19 21 	[MMI]       adds r11=3208,r13;;
            2f6:	20 01 2c 20 20 00 	            ld4 r18=[r11]
            2fc:	00 00 04 00       	            nop.i 0x0;;
            300:	0b 88 fc 25 3f 23 	[MMI]       adds r17=-1,r18;;
            306:	00 88 2c 20 23 00 	            st4 [r11]=r17
            30c:	00 00 04 00       	            nop.i 0x0;;
            310:	11 00 00 00 01 00 	[MIB]       nop.m 0x0
            316:	00 00 00 02 00 80 	            nop.i 0x0
            31c:	08 00 84 00       	            br.ret.sptk.many b0;;
      
      The lines
            2b0:	0a 00 00 00 22 00 	[MMI]       mf;;
            2b6:	80 00 00 00 42 00 	            mov r8=r0
            2bc:	00 00 04 00       	            nop.i 0x0
            2c0:	0b 00 20 40 2a 04 	[MMI]       mov.m ar.ccv=r8;;
            2c6:	10 1a 85 22 20 00 	            cmpxchg4.acq r33=[r33],r35,ar.ccv
            2cc:	00 00 04 00       	            nop.i 0x0;;
      are the instructions of the assembly block.
      The line
            2b6:	80 00 00 00 42 00 	            mov r8=r0
      sets the r8 register to 0 and after that
            2c0:	0b 00 20 40 2a 04 	[MMI]       mov.m ar.ccv=r8;;
      prepares the 'oldvalue' for the cmpxchg but it takes it from r8. This
      is wrong.
      What happened here is what I explained above: An input register is
      overwritten which is still needed.
      The register operand constraints in futex.h are wrong.
      
      (The problem doesn't occur when the Kernel is compiled with GCC 4.6.)
      
      The attached patch fixes the register operand constraints in futex.h.
      The code after patching of it:
      
      static inline int
      futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
      			      u32 oldval, u32 newval)
      {
      	if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
      		return -EFAULT;
      
      	{
      		register unsigned long r8 __asm ("r8") = 0;
      		unsigned long prev;
      		__asm__ __volatile__(
      			"	mf;;					\n"
      			"	mov ar.ccv=%4;;				\n"
      			"[1:]	cmpxchg4.acq %1=[%2],%3,ar.ccv		\n"
      			"	.xdata4 \"__ex_table\", 1b-., 2f-.	\n"
      			"[2:]"
      			: "+r" (r8), "=&r" (prev)
      			: "r" (uaddr), "r" (newval),
      			  "rO" ((long) (unsigned) oldval)
      			: "memory");
      		*uval = prev;
      		return r8;
      	}
      }
      
      I also initialized the 'r8' var with the C programming language.
      The _asm qualifier on the definition of the 'r8' var forces GCC to use
      the r8 processor register for it.
      I don't believe that we should use inline assembly for zeroing out a
      local variable.
      The constraint is
      "+r" (r8)
      what means that it is both an input register and an output register.
      Note that the page fault handler will modify the r8 register which
      will be the return value of the function.
      The real fix is
      "=&r" (prev)
      The & means that GCC must not use any of the input registers to place
      this output register in.
      
      Patched the Kernel 3.2.23 and compiled it with GCC4.4:
      
      0000000000000230 <cmpxchg_futex_value_locked>:
            230:	0b 18 80 1b 18 21 	[MMI]       adds r3=3168,r13;;
            236:	80 40 0d 00 42 00 	            adds r8=40,r3
            23c:	00 00 04 00       	            nop.i 0x0;;
            240:	0b 50 00 10 10 10 	[MMI]       ld4 r10=[r8];;
            246:	90 08 28 00 42 00 	            adds r9=1,r10
            24c:	00 00 04 00       	            nop.i 0x0;;
            250:	09 00 00 00 01 00 	[MMI]       nop.m 0x0
            256:	00 48 20 20 23 00 	            st4 [r8]=r9
            25c:	00 00 04 00       	            nop.i 0x0;;
            260:	08 10 80 06 00 21 	[MMI]       adds r2=32,r3
            266:	20 12 01 10 40 00 	            addp4 r34=r34,r0
            26c:	02 08 f1 52       	            extr.u r16=r33,0,61
            270:	05 40 00 00 00 e1 	[MLX]       mov r8=r0
            276:	ff ff 0f 00 00 e0 	            movl r15=0xfffffffbfff;;
            27c:	f1 f7 ff 65
            280:	09 70 00 04 18 10 	[MMI]       ld8 r14=[r2]
            286:	00 00 00 02 00 c0 	            nop.m 0x0
            28c:	f0 80 1c d0       	            cmp.ltu p6,p7=r15,r16;;
            290:	08 40 fc 1d 09 3b 	[MMI]       cmp.eq p8,p9=-1,r14
            296:	00 00 00 02 00 40 	            nop.m 0x0
            29c:	e1 08 2d d0       	            cmp.ltu p10,p11=r14,r33
            2a0:	56 01 10 00 40 10 	[BBB] (p10) br.cond.spnt.few 2e0
      <cmpxchg_futex_value_locked+0xb0>
            2a6:	02 08 00 80 21 03 	      (p08) br.cond.dpnt.few 2b0
      <cmpxchg_futex_value_locked+0x80>
            2ac:	40 00 00 41       	      (p06) br.cond.spnt.few 2e0
      <cmpxchg_futex_value_locked+0xb0>
            2b0:	0b 00 00 00 22 00 	[MMI]       mf;;
            2b6:	00 10 81 54 08 00 	            mov.m ar.ccv=r34
            2bc:	00 00 04 00       	            nop.i 0x0;;
            2c0:	09 58 8c 42 11 10 	[MMI]       cmpxchg4.acq r11=[r33],r35,ar.ccv
            2c6:	00 00 00 02 00 00 	            nop.m 0x0
            2cc:	00 00 04 00       	            nop.i 0x0;;
            2d0:	10 00 2c 40 90 11 	[MIB]       st4 [r32]=r11
            2d6:	00 00 00 02 00 00 	            nop.i 0x0
            2dc:	20 00 00 40       	            br.few 2f0
      <cmpxchg_futex_value_locked+0xc0>
            2e0:	09 40 c8 f9 ff 27 	[MMI]       mov r8=-14
            2e6:	00 00 00 02 00 00 	            nop.m 0x0
            2ec:	00 00 04 00       	            nop.i 0x0;;
            2f0:	0b 88 20 1a 19 21 	[MMI]       adds r17=3208,r13;;
            2f6:	30 01 44 20 20 00 	            ld4 r19=[r17]
            2fc:	00 00 04 00       	            nop.i 0x0;;
            300:	0b 90 fc 27 3f 23 	[MMI]       adds r18=-1,r19;;
            306:	00 90 44 20 23 00 	            st4 [r17]=r18
            30c:	00 00 04 00       	            nop.i 0x0;;
            310:	11 00 00 00 01 00 	[MIB]       nop.m 0x0
            316:	00 00 00 02 00 80 	            nop.i 0x0
            31c:	08 00 84 00       	            br.ret.sptk.many b0;;
      
      Much better.
      There is a
            270:	05 40 00 00 00 e1 	[MLX]       mov r8=r0
      which was generated by C code r8 = 0. Below
            2b6:	00 10 81 54 08 00 	            mov.m ar.ccv=r34
      what means that oldval is no longer overwritten.
      
      This is Debian bug#702641
      (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=702641).
      
      The patch is applicable on Kernel 3.9-rc1, 3.2.23 and many other versions.
      Signed-off-by: default avatarStephan Schreiber <info@fs-driver.org>
      Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7b5b8170
    • Alex A. Mihaylov's avatar
      rt2x00: Fix transmit power troubles on some Ralink RT30xx cards · 4a277ad6
      Alex A. Mihaylov authored
      commit 7e9dafd8 upstream.
      
      Some cards on Ralink RT30xx chipset not have correctly TX_MIXER_GAIN
      value in them EEPROM/EFUSE. In this case, we must use default value,
      but always used EEPROM/EFUSE value. As result we have tranmitt power
      range from -10dBm to +6dBm instead 0dBm to +16dBm.
      
      Correctly value in EEPROM/EFUSE is one or more for RT3070 and two or
      more for other RT30xx chips.
      
      Tested on Canyon CNP-WF518N1 usb Wi-Fi dongle and Jorjin WN8020 usb
      embedded Wi-Fi module.
      Signed-off-by: default avatarAlex A. Mihaylov <minimumlaw@rambler.ru>
      Acked-by: default avatarGertjan van Wingerde <gwingerde@gmail.com>
      Acked-by: default avatarStanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: default avatarJohn W. Linville <linville@tuxdriver.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4a277ad6
    • Rafael J. Wysocki's avatar
      PCI/PM: Fix fallback to PCI_D0 in pci_platform_power_transition() · a91a151f
      Rafael J. Wysocki authored
      commit 769ba721 upstream.
      
      Commit b51306c6 (PCI: Set device power state to PCI_D0 for device
      without native PM support) modified pci_platform_power_transition()
      by adding code causing dev->current_state for devices that don't
      support native PCI PM but are power-manageable by the platform to be
      changed to PCI_D0 regardless of the value returned by the preceding
      platform_pci_set_power_state().  In particular, that also is done
      if the platform_pci_set_power_state() has been successful, which
      causes the correct power state of the device set by
      pci_update_current_state() in that case to be overwritten by PCI_D0.
      
      Fix that mistake by making the fallback to PCI_D0 only happen if
      the platform_pci_set_power_state() has returned an error.
      
      [bhelgaas: folded in Yinghai's simplification, added URL & stable info]
      Reference: http://lkml.kernel.org/r/27806FC4E5928A408B78E88BBC67A2306F466BBA@ORSMSX101.amr.corp.intel.comReported-by: default avatarChris J. Benenati <chris.j.benenati@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Acked-by: default avatarYinghai Lu <yinghai@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a91a151f
    • Yinghai Lu's avatar
      PCI / ACPI: Don't query OSC support with all possible controls · aebbd5d3
      Yinghai Lu authored
      commit 545d6e18 upstream.
      
      Found problem on system that firmware that could handle pci aer.
      Firmware get error reporting after pci injecting error, before os boots.
      But after os boots, firmware can not get report anymore, even pci=noaer
      is passed.
      
      Root cause: BIOS _OSC has problem with query bit checking.
      It turns out that BIOS vendor is copying example code from ACPI Spec.
      In ACPI Spec 5.0, page 290:
      
      	If (Not(And(CDW1,1))) // Query flag clear?
      	{	// Disable GPEs for features granted native control.
      		If (And(CTRL,0x01)) // Hot plug control granted?
      		{
      			Store(0,HPCE) // clear the hot plug SCI enable bit
      			Store(1,HPCS) // clear the hot plug SCI status bit
      		}
      	...
      	}
      
      When Query flag is set, And(CDW1,1) will be 1, Not(1) will return 0xfffffffe.
      So it will get into code path that should be for control set only.
      BIOS acpi code should be changed to "If (LEqual(And(CDW1,1), 0)))"
      
      Current kernel code is using _OSC query to notify firmware about support
      from OS and then use _OSC to set control bits.
      During query support, current code is using all possible controls.
      So will execute code that should be only for control set stage.
      
      That will have problem when pci=noaer or aer firmware_first is used.
      As firmware have that control set for os aer already in query support stage,
      but later will not os aer handling.
      
      We should avoid passing all possible controls, just use osc_control_set
      instead.
      That should workaround BIOS bugs with affected systems on the field
      as more bios vendors are copying sample code from ACPI spec.
      Signed-off-by: default avatarYinghai Lu <yinghai@kernel.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aebbd5d3
    • Tony Luck's avatar
      Fix initialization of CMCI/CMCP interrupts · 8ca53c6c
      Tony Luck authored
      commit d303e9e9 upstream.
      
      Back 2010 during a revamp of the irq code some initializations
      were moved from ia64_mca_init() to ia64_mca_late_init() in
      
      	commit c75f2aa1
      	Cannot use register_percpu_irq() from ia64_mca_init()
      
      But this was hideously wrong. First of all these initializations
      are now down far too late. Specifically after all the other cpus
      have been brought up and initialized their own CMC vectors from
      smp_callin(). Also ia64_mca_late_init() may be called from any cpu
      so the line:
      	ia64_mca_cmc_vector_setup();       /* Setup vector on BSP */
      is generally not executed on the BSP, and so the CMC vector isn't
      setup at all on that processor.
      
      Make use of the arch_early_irq_init() hook to get this code executed
      at just the right moment: not too early, not too late.
      Reported-by: default avatarFred Hartnett <fred.hartnett@hp.com>
      Tested-by: default avatarFred Hartnett <fred.hartnett@hp.com>
      Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8ca53c6c
    • Ming Lei's avatar
      sysfs: fix use after free in case of concurrent read/write and readdir · 38f05ab3
      Ming Lei authored
      commit f7db5e76 upstream.
      
      The inode->i_mutex isn't hold when updating filp->f_pos
      in read()/write(), so the filp->f_pos might be read as
      0 or 1 in readdir() when there is concurrent read()/write()
      on this same file, then may cause use after free in readdir().
      
      The bug can be reproduced with Li Zefan's test code on the
      link:
      
      	https://patchwork.kernel.org/patch/2160771/
      
      This patch fixes the use after free under this situation.
      Reported-by: default avatarLi Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarMing Lei <ming.lei@canonical.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      38f05ab3
    • K. Y. Srinivasan's avatar
      Drivers: hv: vmbus: Fix a bug in hv_need_to_signal() · 8c7fc2f0
      K. Y. Srinivasan authored
      commit 288fa3e0 upstream.
      
      As part of updating the vmbus protocol, the function hv_need_to_signal()
      was introduced. This functions helps optimize signalling from guest to
      host. The newly added memory barrier is needed to ensure that we correctly
      decide when to signal the host.
      Signed-off-by: default avatarK. Y. Srinivasan <kys@microsoft.com>
      Reviewed-by: default avatarHaiyang Zhang <haiyangz@microsoft.com>
      Reported-by: default avatarOlaf Hering <olh@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8c7fc2f0