1. 13 May, 2013 25 commits
    • Bjørn Mork's avatar
      USB: option: add a D-Link DWM-156 variant · bd2ff9f2
      Bjørn Mork authored
      commit a2a2d6c7 upstream.
      
      Adding support for a Mediatek based device labelled as
      D-Link Model: DWM-156, H/W Ver: A7
      
      Also adding two other device IDs found in the Debian(!)
      packages included on the embedded device driver CD.
      
      This is a composite MBIM + serial ports + card reader device:
      
      T:  Bus=04 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 14 Spd=480  MxCh= 0
      D:  Ver= 2.00 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs=  1
      P:  Vendor=2001 ProdID=7d01 Rev= 3.00
      S:  Manufacturer=D-Link,Inc
      S:  Product=D-Link DWM-156
      C:* #Ifs= 7 Cfg#= 1 Atr=a0 MxPwr=500mA
      A:  FirstIf#= 0 IfCount= 2 Cls=02(comm.) Sub=0e Prot=00
      I:* If#= 0 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=0e Prot=00 Driver=cdc_mbim
      E:  Ad=88(I) Atr=03(Int.) MxPS=  64 Ivl=125us
      I:  If#= 1 Alt= 0 #EPs= 0 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim
      I:* If#= 1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim
      E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=02 Prot=01 Driver=option
      E:  Ad=87(I) Atr=03(Int.) MxPS=  64 Ivl=500us
      E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
      E:  Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
      E:  Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
      E:  Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 6 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage
      E:  Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      Signed-off-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      bd2ff9f2
    • Namhyung Kim's avatar
      tracing: Fix off-by-one on allocating stat->pages · 223288e3
      Namhyung Kim authored
      commit 39e30cd1 upstream.
      
      The first page was allocated separately, so no need to start from 0.
      
      Link: http://lkml.kernel.org/r/1364820385-32027-2-git-send-email-namhyung@kernel.org
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      223288e3
    • J. Bruce Fields's avatar
      nfsd4: don't close read-write opens too soon · be80ab09
      J. Bruce Fields authored
      commit 0c7c3e67 upstream.
      
      Don't actually close any opens until we don't need them at all.
      
      This means being left with write access when it's not really necessary,
      but that's better than putting a file that might still have posix locks
      held on it, as we have been.
      Reported-by: default avatarToralf Förster <toralf.foerster@gmx.de>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      be80ab09
    • David Engraf's avatar
      hrtimer: Fix ktime_add_ns() overflow on 32bit architectures · 75ebfaad
      David Engraf authored
      commit 51fd36f3 upstream.
      
      One can trigger an overflow when using ktime_add_ns() on a 32bit
      architecture not supporting CONFIG_KTIME_SCALAR.
      
      When passing a very high value for u64 nsec, e.g. 7881299347898368000
      the do_div() function converts this value to seconds (7881299347) which
      is still to high to pass to the ktime_set() function as long. The result
      in is a negative value.
      
      The problem on my system occurs in the tick-sched.c,
      tick_nohz_stop_sched_tick() when time_delta is set to
      timekeeping_max_deferment(). The check for time_delta < KTIME_MAX is
      valid, thus ktime_add_ns() is called with a too large value resulting in
      a negative expire value. This leads to an endless loop in the ticker code:
      
      time_delta: 7881299347898368000
      expires = ktime_add_ns(last_update, time_delta)
      expires: negative value
      
      This fix caps the value to KTIME_MAX.
      
      This error doesn't occurs on 64bit or architectures supporting
      CONFIG_KTIME_SCALAR (e.g. ARM, x86-32).
      Signed-off-by: default avatarDavid Engraf <david.engraf@sysgo.com>
      [jstultz: Minor tweaks to commit message & header]
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      75ebfaad
    • Prarit Bhargava's avatar
      hrtimer: Add expiry time overflow check in hrtimer_interrupt · 79d8ce89
      Prarit Bhargava authored
      commit 8f294b5a upstream.
      
      The settimeofday01 test in the LTP testsuite effectively does
      
              gettimeofday(current time);
              settimeofday(Jan 1, 1970 + 100 seconds);
              settimeofday(current time);
      
      This test causes a stack trace to be displayed on the console during the
      setting of timeofday to Jan 1, 1970 + 100 seconds:
      
      [  131.066751] ------------[ cut here ]------------
      [  131.096448] WARNING: at kernel/time/clockevents.c:209 clockevents_program_event+0x135/0x140()
      [  131.104935] Hardware name: Dinar
      [  131.108150] Modules linked in: sg nfsv3 nfs_acl nfsv4 auth_rpcgss nfs dns_resolver fscache lockd sunrpc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables kvm_amd kvm sp5100_tco bnx2 i2c_piix4 crc32c_intel k10temp fam15h_power ghash_clmulni_intel amd64_edac_mod pcspkr serio_raw edac_mce_amd edac_core microcode xfs libcrc32c sr_mod sd_mod cdrom ata_generic crc_t10dif pata_acpi radeon i2c_algo_bit drm_kms_helper ttm drm ahci pata_atiixp libahci libata usb_storage i2c_core dm_mirror dm_region_hash dm_log dm_mod
      [  131.176784] Pid: 0, comm: swapper/28 Not tainted 3.8.0+ #6
      [  131.182248] Call Trace:
      [  131.184684]  <IRQ>  [<ffffffff810612af>] warn_slowpath_common+0x7f/0xc0
      [  131.191312]  [<ffffffff8106130a>] warn_slowpath_null+0x1a/0x20
      [  131.197131]  [<ffffffff810b9fd5>] clockevents_program_event+0x135/0x140
      [  131.203721]  [<ffffffff810bb584>] tick_program_event+0x24/0x30
      [  131.209534]  [<ffffffff81089ab1>] hrtimer_interrupt+0x131/0x230
      [  131.215437]  [<ffffffff814b9600>] ? cpufreq_p4_target+0x130/0x130
      [  131.221509]  [<ffffffff81619119>] smp_apic_timer_interrupt+0x69/0x99
      [  131.227839]  [<ffffffff8161805d>] apic_timer_interrupt+0x6d/0x80
      [  131.233816]  <EOI>  [<ffffffff81099745>] ? sched_clock_cpu+0xc5/0x120
      [  131.240267]  [<ffffffff814b9ff0>] ? cpuidle_wrap_enter+0x50/0xa0
      [  131.246252]  [<ffffffff814b9fe9>] ? cpuidle_wrap_enter+0x49/0xa0
      [  131.252238]  [<ffffffff814ba050>] cpuidle_enter_tk+0x10/0x20
      [  131.257877]  [<ffffffff814b9c89>] cpuidle_idle_call+0xa9/0x260
      [  131.263692]  [<ffffffff8101c42f>] cpu_idle+0xaf/0x120
      [  131.268727]  [<ffffffff815f8971>] start_secondary+0x255/0x257
      [  131.274449] ---[ end trace 1151a50552231615 ]---
      
      When we change the system time to a low value like this, the value of
      timekeeper->offs_real will be a negative value.
      
      It seems that the WARN occurs because an hrtimer has been started in the time
      between the releasing of the timekeeper lock and the IPI call (via a call to
      on_each_cpu) in clock_was_set() in the do_settimeofday() code.  The end result
      is that a REALTIME_CLOCK timer has been added with softexpires = expires =
      KTIME_MAX.  The hrtimer_interrupt() fires/is called and the loop at
      kernel/hrtimer.c:1289 is executed.  In this loop the code subtracts the
      clock base's offset (which was set to timekeeper->offs_real in
      do_settimeofday()) from the current hrtimer_cpu_base->expiry value (which
      was KTIME_MAX):
      
      	KTIME_MAX - (a negative value) = overflow
      
      A simple check for an overflow can resolve this problem.  Using KTIME_MAX
      instead of the overflow value will result in the hrtimer function being run,
      and the reprogramming of the timer after that.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarRik van Riel <riel@redhat.com>
      Signed-off-by: default avatarPrarit Bhargava <prarit@redhat.com>
      [jstultz: Tweaked commit subject]
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      79d8ce89
    • Stefani Seibold's avatar
      USB: add ftdi_sio USB ID for GDM Boost V1.x · 41d3bb1f
      Stefani Seibold authored
      commit 58f8b6c4 upstream.
      
      This patch add a missing usb device id for the GDMBoost V1.x device
      
      The patch is against 3.9-rc5
      Signed-off-by: default avatarStefani Seibold <stefani@seibold.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      41d3bb1f
    • Christian Lamparter's avatar
      drm/i915: Add no-lvds quirk for Fujitsu Esprimo Q900 · d05fd48a
      Christian Lamparter authored
      commit 9e9dd0e8 upstream.
      
      The "Mobile Sandy Bridge CPUs" in the Fujitsu Esprimo Q900
      mini desktop PCs are probably misleading the LVDS detection
      code in intel_lvds_supported. Nothing is connected to the
      LVDS ports in these systems.
      Signed-off-by: default avatarChristian Lamparter <chunkeey@googlemail.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      d05fd48a
    • Dmitry Monakhov's avatar
      jbd2: fix race between jbd2_journal_remove_checkpoint and ->j_commit_callback · 3464fc47
      Dmitry Monakhov authored
      commit 794446c6 upstream.
      
      The following race is possible:
      
      [kjournald2]                              other_task
      jbd2_journal_commit_transaction()
        j_state = T_FINISHED;
        spin_unlock(&journal->j_list_lock);
                                               ->jbd2_journal_remove_checkpoint()
      					   ->jbd2_journal_free_transaction();
      					     ->kmem_cache_free(transaction)
        ->j_commit_callback(journal, transaction);
          -> USE_AFTER_FREE
      
      WARNING: at lib/list_debug.c:62 __list_del_entry+0x1c0/0x250()
      Hardware name:
      list_del corruption. prev->next should be ffff88019a4ec198, but was 6b6b6b6b6b6b6b6b
      Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
      Pid: 16400, comm: jbd2/dm-1-8 Tainted: G        W    3.8.0-rc3+ #107
      Call Trace:
       [<ffffffff8106fb0d>] warn_slowpath_common+0xad/0xf0
       [<ffffffff8106fc06>] warn_slowpath_fmt+0x46/0x50
       [<ffffffff813637e9>] ? ext4_journal_commit_callback+0x99/0xc0
       [<ffffffff8148cae0>] __list_del_entry+0x1c0/0x250
       [<ffffffff813637bf>] ext4_journal_commit_callback+0x6f/0xc0
       [<ffffffff813ca336>] jbd2_journal_commit_transaction+0x23a6/0x2570
       [<ffffffff8108aa42>] ? try_to_del_timer_sync+0x82/0xa0
       [<ffffffff8108b491>] ? del_timer_sync+0x91/0x1e0
       [<ffffffff813d3ecf>] kjournald2+0x19f/0x6a0
       [<ffffffff810ad630>] ? wake_up_bit+0x40/0x40
       [<ffffffff813d3d30>] ? bit_spin_lock+0x80/0x80
       [<ffffffff810ac6be>] kthread+0x10e/0x120
       [<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70
       [<ffffffff818ff6ac>] ret_from_fork+0x7c/0xb0
       [<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70
      
      In order to demonstrace this issue one should mount ext4 with mount -o
      discard option on SSD disk.  This makes callback longer and race
      window becomes wider.
      
      In order to fix this we should mark transaction as finished only after
      callbacks have completed
      Signed-off-by: default avatarDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      [bwh: Backported to 3.2: s/jbd2_journal_free_transaction/kfree/]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      3464fc47
    • Theodore Ts'o's avatar
      ext4/jbd2: don't wait (forever) for stale tid caused by wraparound · 164ed438
      Theodore Ts'o authored
      commit d76a3a77 upstream.
      
      In the case where an inode has a very stale transaction id (tid) in
      i_datasync_tid or i_sync_tid, it's possible that after a very large
      (2**31) number of transactions, that the tid number space might wrap,
      causing tid_geq()'s calculations to fail.
      
      Commit deeeaf13 "jbd2: fix fsync() tid wraparound bug", later modified
      by commit e7b04ac0 "jbd2: don't wake kjournald unnecessarily",
      attempted to fix this problem, but it only avoided kjournald spinning
      forever by fixing the logic in jbd2_log_start_commit().
      
      Unfortunately, in the codepaths in fs/ext4/fsync.c and fs/ext4/inode.c
      that might call jbd2_log_start_commit() with a stale tid, those
      functions will subsequently call jbd2_log_wait_commit() with the same
      stale tid, and then wait for a very long time.  To fix this, we
      replace the calls to jbd2_log_start_commit() and
      jbd2_log_wait_commit() with a call to a new function,
      jbd2_complete_transaction(), which will correctly handle stale tid's.
      
      As a bonus, jbd2_complete_transaction() will avoid locking
      j_state_lock for writing unless a commit needs to be started.  This
      should have a small (but probably not measurable) improvement for
      ext4's scalability.
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Reported-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Reported-by: default avatarGeorge Barnett <gbarnett@atlassian.com>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      164ed438
    • fanchaoting's avatar
      nfsd: don't run get_file if nfs4_preprocess_stateid_op return error · 7e6c247f
      fanchaoting authored
      commit b022032e upstream.
      
      we should return error status directly when nfs4_preprocess_stateid_op
      return error.
      Signed-off-by: default avatarfanchaoting <fanchaoting@cn.fujitsu.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      7e6c247f
    • Ben Jencks's avatar
      usb/misc/appledisplay: Add 24" LED Cinema display · 792c71ab
      Ben Jencks authored
      commit e7d3b6e2 upstream.
      
      Add the Apple 24" LED Cinema display to the supported devices.
      Signed-off-by: default avatarBen Jencks <ben@bjencks.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      792c71ab
    • Ming Lei's avatar
      sysfs: fix use after free in case of concurrent read/write and readdir · 3a8aad02
      Ming Lei authored
      commit f7db5e76 upstream.
      
      The inode->i_mutex isn't hold when updating filp->f_pos
      in read()/write(), so the filp->f_pos might be read as
      0 or 1 in readdir() when there is concurrent read()/write()
      on this same file, then may cause use after free in readdir().
      
      The bug can be reproduced with Li Zefan's test code on the
      link:
      
      	https://patchwork.kernel.org/patch/2160771/
      
      This patch fixes the use after free under this situation.
      Reported-by: default avatarLi Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarMing Lei <ming.lei@canonical.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      [bwh: Backported to 3.2: file position is child inode number, not hash]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      3a8aad02
    • Tony Luck's avatar
      Fix initialization of CMCI/CMCP interrupts · 893bee37
      Tony Luck authored
      commit d303e9e9 upstream.
      
      Back 2010 during a revamp of the irq code some initializations
      were moved from ia64_mca_init() to ia64_mca_late_init() in
      
      	commit c75f2aa1
      	Cannot use register_percpu_irq() from ia64_mca_init()
      
      But this was hideously wrong. First of all these initializations
      are now down far too late. Specifically after all the other cpus
      have been brought up and initialized their own CMC vectors from
      smp_callin(). Also ia64_mca_late_init() may be called from any cpu
      so the line:
      	ia64_mca_cmc_vector_setup();       /* Setup vector on BSP */
      is generally not executed on the BSP, and so the CMC vector isn't
      setup at all on that processor.
      
      Make use of the arch_early_irq_init() hook to get this code executed
      at just the right moment: not too early, not too late.
      Reported-by: default avatarFred Hartnett <fred.hartnett@hp.com>
      Tested-by: default avatarFred Hartnett <fred.hartnett@hp.com>
      Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      893bee37
    • Alex Deucher's avatar
      drm/radeon: use frac fb div on RS780/RS880 · 9ba24024
      Alex Deucher authored
      commit 41167828 upstream.
      
      Monitors seem to prefer it.  Fixes:
      https://bugs.freedesktop.org/show_bug.cgi?id=37696Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      [bwh: Backported to 3.2:
       - Adjust context
       - Add to pll->flags, not radeon_crtc->pll_flags]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      9ba24024
    • Alex Deucher's avatar
      drm/radeon: don't use get_engine_clock() on APUs · c54fd985
      Alex Deucher authored
      commit bf05d998 upstream.
      
      It doesn't work reliably.  Just report back the currently
      selected engine clock.
      
      Partially fixes:
      https://bugs.freedesktop.org/show_bug.cgi?id=62493Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      c54fd985
    • Yinghai Lu's avatar
      PCI / ACPI: Don't query OSC support with all possible controls · c08f24e2
      Yinghai Lu authored
      commit 545d6e18 upstream.
      
      Found problem on system that firmware that could handle pci aer.
      Firmware get error reporting after pci injecting error, before os boots.
      But after os boots, firmware can not get report anymore, even pci=noaer
      is passed.
      
      Root cause: BIOS _OSC has problem with query bit checking.
      It turns out that BIOS vendor is copying example code from ACPI Spec.
      In ACPI Spec 5.0, page 290:
      
      	If (Not(And(CDW1,1))) // Query flag clear?
      	{	// Disable GPEs for features granted native control.
      		If (And(CTRL,0x01)) // Hot plug control granted?
      		{
      			Store(0,HPCE) // clear the hot plug SCI enable bit
      			Store(1,HPCS) // clear the hot plug SCI status bit
      		}
      	...
      	}
      
      When Query flag is set, And(CDW1,1) will be 1, Not(1) will return 0xfffffffe.
      So it will get into code path that should be for control set only.
      BIOS acpi code should be changed to "If (LEqual(And(CDW1,1), 0)))"
      
      Current kernel code is using _OSC query to notify firmware about support
      from OS and then use _OSC to set control bits.
      During query support, current code is using all possible controls.
      So will execute code that should be only for control set stage.
      
      That will have problem when pci=noaer or aer firmware_first is used.
      As firmware have that control set for os aer already in query support stage,
      but later will not os aer handling.
      
      We should avoid passing all possible controls, just use osc_control_set
      instead.
      That should workaround BIOS bugs with affected systems on the field
      as more bios vendors are copying sample code from ACPI spec.
      Signed-off-by: default avatarYinghai Lu <yinghai@kernel.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      c08f24e2
    • Li Zefan's avatar
      cgroup: fix an off-by-one bug which may trigger BUG_ON() · f596ca6c
      Li Zefan authored
      commit 3ac1707a upstream.
      
      The 3rd parameter of flex_array_prealloc() is the number of elements,
      not the index of the last element.
      
      The effect of the bug is, when opening cgroup.procs, a flex array will
      be allocated and all elements of the array is allocated with
      GFP_KERNEL flag, but the last one is GFP_ATOMIC, and if we fail to
      allocate memory for it, it'll trigger a BUG_ON().
      Signed-off-by: default avatarLi Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      f596ca6c
    • Stephan Schreiber's avatar
      Wrong asm register contraints in the kvm implementation · 6d895982
      Stephan Schreiber authored
      commit de53e9ca upstream.
      
      The Linux Kernel contains some inline assembly source code which has
      wrong asm register constraints in arch/ia64/kvm/vtlb.c.
      
      I observed this on Kernel 3.2.35 but it is also true on the most
      recent Kernel 3.9-rc1.
      
      File arch/ia64/kvm/vtlb.c:
      
      u64 guest_vhpt_lookup(u64 iha, u64 *pte)
      {
      	u64 ret;
      	struct thash_data *data;
      
      	data = __vtr_lookup(current_vcpu, iha, D_TLB);
      	if (data != NULL)
      		thash_vhpt_insert(current_vcpu, data->page_flags,
      			data->itir, iha, D_TLB);
      
      	asm volatile (
      			"rsm psr.ic|psr.i;;"
      			"srlz.d;;"
      			"ld8.s r9=[%1];;"
      			"tnat.nz p6,p7=r9;;"
      			"(p6) mov %0=1;"
      			"(p6) mov r9=r0;"
      			"(p7) extr.u r9=r9,0,53;;"
      			"(p7) mov %0=r0;"
      			"(p7) st8 [%2]=r9;;"
      			"ssm psr.ic;;"
      			"srlz.d;;"
      			"ssm psr.i;;"
      			"srlz.d;;"
      			: "=r"(ret) : "r"(iha), "r"(pte):"memory");
      
      	return ret;
      }
      
      The list of output registers is
      			: "=r"(ret) : "r"(iha), "r"(pte):"memory");
      The constraint "=r" means that the GCC has to maintain that these vars
      are in registers and contain valid info when the program flow leaves
      the assembly block (output registers).
      But "=r" also means that GCC can put them in registers that are used
      as input registers. Input registers are iha, pte on the example.
      If the predicate p7 is true, the 8th assembly instruction
      			"(p7) mov %0=r0;"
      is the first one which writes to a register which is maintained by the
      register constraints; it sets %0. %0 means the first register operand;
      it is ret here.
      This instruction might overwrite the %2 register (pte) which is needed
      by the next instruction:
      			"(p7) st8 [%2]=r9;;"
      Whether it really happens depends on how GCC decides what registers it
      uses and how it optimizes the code.
      
      The attached patch  fixes the register operand constraints in
      arch/ia64/kvm/vtlb.c.
      The register constraints should be
      			: "=&r"(ret) : "r"(iha), "r"(pte):"memory");
      The & means that GCC must not use any of the input registers to place
      this output register in.
      
      This is Debian bug#702639
      (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=702639).
      
      The patch is applicable on Kernel 3.9-rc1, 3.2.35 and many other versions.
      Signed-off-by: default avatarStephan Schreiber <info@fs-driver.org>
      Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      6d895982
    • Stephan Schreiber's avatar
      Wrong asm register contraints in the futex implementation · b5789c66
      Stephan Schreiber authored
      commit 136f39dd upstream.
      
      The Linux Kernel contains some inline assembly source code which has
      wrong asm register constraints in arch/ia64/include/asm/futex.h.
      
      I observed this on Kernel 3.2.23 but it is also true on the most
      recent Kernel 3.9-rc1.
      
      File arch/ia64/include/asm/futex.h:
      
      static inline int
      futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
      			      u32 oldval, u32 newval)
      {
      	if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
      		return -EFAULT;
      
      	{
      		register unsigned long r8 __asm ("r8");
      		unsigned long prev;
      		__asm__ __volatile__(
      			"	mf;;					\n"
      			"	mov %0=r0				\n"
      			"	mov ar.ccv=%4;;				\n"
      			"[1:]	cmpxchg4.acq %1=[%2],%3,ar.ccv		\n"
      			"	.xdata4 \"__ex_table\", 1b-., 2f-.	\n"
      			"[2:]"
      			: "=r" (r8), "=r" (prev)
      			: "r" (uaddr), "r" (newval),
      			  "rO" ((long) (unsigned) oldval)
      			: "memory");
      		*uval = prev;
      		return r8;
      	}
      }
      
      The list of output registers is
      			: "=r" (r8), "=r" (prev)
      The constraint "=r" means that the GCC has to maintain that these vars
      are in registers and contain valid info when the program flow leaves
      the assembly block (output registers).
      But "=r" also means that GCC can put them in registers that are used
      as input registers. Input registers are uaddr, newval, oldval on the
      example.
      The second assembly instruction
      			"	mov %0=r0				\n"
      is the first one which writes to a register; it sets %0 to 0. %0 means
      the first register operand; it is r8 here. (The r0 is read-only and
      always 0 on the Itanium; it can be used if an immediate zero value is
      needed.)
      This instruction might overwrite one of the other registers which are
      still needed.
      Whether it really happens depends on how GCC decides what registers it
      uses and how it optimizes the code.
      
      The objdump utility can give us disassembly.
      The futex_atomic_cmpxchg_inatomic() function is inline, so we have to
      look for a module that uses the funtion. This is the
      cmpxchg_futex_value_locked() function in
      kernel/futex.c:
      
      static int cmpxchg_futex_value_locked(u32 *curval, u32 __user *uaddr,
      				      u32 uval, u32 newval)
      {
      	int ret;
      
      	pagefault_disable();
      	ret = futex_atomic_cmpxchg_inatomic(curval, uaddr, uval, newval);
      	pagefault_enable();
      
      	return ret;
      }
      
      Now the disassembly. At first from the Kernel package 3.2.23 which has
      been compiled with GCC 4.4, remeber this Kernel seemed to work:
      objdump -d linux-3.2.23/debian/build/build_ia64_none_mckinley/kernel/futex.o
      
      0000000000000230 <cmpxchg_futex_value_locked>:
            230:	0b 18 80 1b 18 21 	[MMI]       adds r3=3168,r13;;
            236:	80 40 0d 00 42 00 	            adds r8=40,r3
            23c:	00 00 04 00       	            nop.i 0x0;;
            240:	0b 50 00 10 10 10 	[MMI]       ld4 r10=[r8];;
            246:	90 08 28 00 42 00 	            adds r9=1,r10
            24c:	00 00 04 00       	            nop.i 0x0;;
            250:	09 00 00 00 01 00 	[MMI]       nop.m 0x0
            256:	00 48 20 20 23 00 	            st4 [r8]=r9
            25c:	00 00 04 00       	            nop.i 0x0;;
            260:	08 10 80 06 00 21 	[MMI]       adds r2=32,r3
            266:	00 00 00 02 00 00 	            nop.m 0x0
            26c:	02 08 f1 52       	            extr.u r16=r33,0,61
            270:	05 40 88 00 08 e0 	[MLX]       addp4 r8=r34,r0
            276:	ff ff 0f 00 00 e0 	            movl r15=0xfffffffbfff;;
            27c:	f1 f7 ff 65
            280:	09 70 00 04 18 10 	[MMI]       ld8 r14=[r2]
            286:	00 00 00 02 00 c0 	            nop.m 0x0
            28c:	f0 80 1c d0       	            cmp.ltu p6,p7=r15,r16;;
            290:	08 40 fc 1d 09 3b 	[MMI]       cmp.eq p8,p9=-1,r14
            296:	00 00 00 02 00 40 	            nop.m 0x0
            29c:	e1 08 2d d0       	            cmp.ltu p10,p11=r14,r33
            2a0:	56 01 10 00 40 10 	[BBB] (p10) br.cond.spnt.few 2e0
      <cmpxchg_futex_value_locked+0xb0>
            2a6:	02 08 00 80 21 03 	      (p08) br.cond.dpnt.few 2b0
      <cmpxchg_futex_value_locked+0x80>
            2ac:	40 00 00 41       	      (p06) br.cond.spnt.few 2e0
      <cmpxchg_futex_value_locked+0xb0>
            2b0:	0a 00 00 00 22 00 	[MMI]       mf;;
            2b6:	80 00 00 00 42 00 	            mov r8=r0
            2bc:	00 00 04 00       	            nop.i 0x0
            2c0:	0b 00 20 40 2a 04 	[MMI]       mov.m ar.ccv=r8;;
            2c6:	10 1a 85 22 20 00 	            cmpxchg4.acq r33=[r33],r35,ar.ccv
            2cc:	00 00 04 00       	            nop.i 0x0;;
            2d0:	10 00 84 40 90 11 	[MIB]       st4 [r32]=r33
            2d6:	00 00 00 02 00 00 	            nop.i 0x0
            2dc:	20 00 00 40       	            br.few 2f0
      <cmpxchg_futex_value_locked+0xc0>
            2e0:	09 40 c8 f9 ff 27 	[MMI]       mov r8=-14
            2e6:	00 00 00 02 00 00 	            nop.m 0x0
            2ec:	00 00 04 00       	            nop.i 0x0;;
            2f0:	0b 58 20 1a 19 21 	[MMI]       adds r11=3208,r13;;
            2f6:	20 01 2c 20 20 00 	            ld4 r18=[r11]
            2fc:	00 00 04 00       	            nop.i 0x0;;
            300:	0b 88 fc 25 3f 23 	[MMI]       adds r17=-1,r18;;
            306:	00 88 2c 20 23 00 	            st4 [r11]=r17
            30c:	00 00 04 00       	            nop.i 0x0;;
            310:	11 00 00 00 01 00 	[MIB]       nop.m 0x0
            316:	00 00 00 02 00 80 	            nop.i 0x0
            31c:	08 00 84 00       	            br.ret.sptk.many b0;;
      
      The lines
            2b0:	0a 00 00 00 22 00 	[MMI]       mf;;
            2b6:	80 00 00 00 42 00 	            mov r8=r0
            2bc:	00 00 04 00       	            nop.i 0x0
            2c0:	0b 00 20 40 2a 04 	[MMI]       mov.m ar.ccv=r8;;
            2c6:	10 1a 85 22 20 00 	            cmpxchg4.acq r33=[r33],r35,ar.ccv
            2cc:	00 00 04 00       	            nop.i 0x0;;
      are the instructions of the assembly block.
      The line
            2b6:	80 00 00 00 42 00 	            mov r8=r0
      sets the r8 register to 0 and after that
            2c0:	0b 00 20 40 2a 04 	[MMI]       mov.m ar.ccv=r8;;
      prepares the 'oldvalue' for the cmpxchg but it takes it from r8. This
      is wrong.
      What happened here is what I explained above: An input register is
      overwritten which is still needed.
      The register operand constraints in futex.h are wrong.
      
      (The problem doesn't occur when the Kernel is compiled with GCC 4.6.)
      
      The attached patch fixes the register operand constraints in futex.h.
      The code after patching of it:
      
      static inline int
      futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
      			      u32 oldval, u32 newval)
      {
      	if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
      		return -EFAULT;
      
      	{
      		register unsigned long r8 __asm ("r8") = 0;
      		unsigned long prev;
      		__asm__ __volatile__(
      			"	mf;;					\n"
      			"	mov ar.ccv=%4;;				\n"
      			"[1:]	cmpxchg4.acq %1=[%2],%3,ar.ccv		\n"
      			"	.xdata4 \"__ex_table\", 1b-., 2f-.	\n"
      			"[2:]"
      			: "+r" (r8), "=&r" (prev)
      			: "r" (uaddr), "r" (newval),
      			  "rO" ((long) (unsigned) oldval)
      			: "memory");
      		*uval = prev;
      		return r8;
      	}
      }
      
      I also initialized the 'r8' var with the C programming language.
      The _asm qualifier on the definition of the 'r8' var forces GCC to use
      the r8 processor register for it.
      I don't believe that we should use inline assembly for zeroing out a
      local variable.
      The constraint is
      "+r" (r8)
      what means that it is both an input register and an output register.
      Note that the page fault handler will modify the r8 register which
      will be the return value of the function.
      The real fix is
      "=&r" (prev)
      The & means that GCC must not use any of the input registers to place
      this output register in.
      
      Patched the Kernel 3.2.23 and compiled it with GCC4.4:
      
      0000000000000230 <cmpxchg_futex_value_locked>:
            230:	0b 18 80 1b 18 21 	[MMI]       adds r3=3168,r13;;
            236:	80 40 0d 00 42 00 	            adds r8=40,r3
            23c:	00 00 04 00       	            nop.i 0x0;;
            240:	0b 50 00 10 10 10 	[MMI]       ld4 r10=[r8];;
            246:	90 08 28 00 42 00 	            adds r9=1,r10
            24c:	00 00 04 00       	            nop.i 0x0;;
            250:	09 00 00 00 01 00 	[MMI]       nop.m 0x0
            256:	00 48 20 20 23 00 	            st4 [r8]=r9
            25c:	00 00 04 00       	            nop.i 0x0;;
            260:	08 10 80 06 00 21 	[MMI]       adds r2=32,r3
            266:	20 12 01 10 40 00 	            addp4 r34=r34,r0
            26c:	02 08 f1 52       	            extr.u r16=r33,0,61
            270:	05 40 00 00 00 e1 	[MLX]       mov r8=r0
            276:	ff ff 0f 00 00 e0 	            movl r15=0xfffffffbfff;;
            27c:	f1 f7 ff 65
            280:	09 70 00 04 18 10 	[MMI]       ld8 r14=[r2]
            286:	00 00 00 02 00 c0 	            nop.m 0x0
            28c:	f0 80 1c d0       	            cmp.ltu p6,p7=r15,r16;;
            290:	08 40 fc 1d 09 3b 	[MMI]       cmp.eq p8,p9=-1,r14
            296:	00 00 00 02 00 40 	            nop.m 0x0
            29c:	e1 08 2d d0       	            cmp.ltu p10,p11=r14,r33
            2a0:	56 01 10 00 40 10 	[BBB] (p10) br.cond.spnt.few 2e0
      <cmpxchg_futex_value_locked+0xb0>
            2a6:	02 08 00 80 21 03 	      (p08) br.cond.dpnt.few 2b0
      <cmpxchg_futex_value_locked+0x80>
            2ac:	40 00 00 41       	      (p06) br.cond.spnt.few 2e0
      <cmpxchg_futex_value_locked+0xb0>
            2b0:	0b 00 00 00 22 00 	[MMI]       mf;;
            2b6:	00 10 81 54 08 00 	            mov.m ar.ccv=r34
            2bc:	00 00 04 00       	            nop.i 0x0;;
            2c0:	09 58 8c 42 11 10 	[MMI]       cmpxchg4.acq r11=[r33],r35,ar.ccv
            2c6:	00 00 00 02 00 00 	            nop.m 0x0
            2cc:	00 00 04 00       	            nop.i 0x0;;
            2d0:	10 00 2c 40 90 11 	[MIB]       st4 [r32]=r11
            2d6:	00 00 00 02 00 00 	            nop.i 0x0
            2dc:	20 00 00 40       	            br.few 2f0
      <cmpxchg_futex_value_locked+0xc0>
            2e0:	09 40 c8 f9 ff 27 	[MMI]       mov r8=-14
            2e6:	00 00 00 02 00 00 	            nop.m 0x0
            2ec:	00 00 04 00       	            nop.i 0x0;;
            2f0:	0b 88 20 1a 19 21 	[MMI]       adds r17=3208,r13;;
            2f6:	30 01 44 20 20 00 	            ld4 r19=[r17]
            2fc:	00 00 04 00       	            nop.i 0x0;;
            300:	0b 90 fc 27 3f 23 	[MMI]       adds r18=-1,r19;;
            306:	00 90 44 20 23 00 	            st4 [r17]=r18
            30c:	00 00 04 00       	            nop.i 0x0;;
            310:	11 00 00 00 01 00 	[MIB]       nop.m 0x0
            316:	00 00 00 02 00 80 	            nop.i 0x0
            31c:	08 00 84 00       	            br.ret.sptk.many b0;;
      
      Much better.
      There is a
            270:	05 40 00 00 00 e1 	[MLX]       mov r8=r0
      which was generated by C code r8 = 0. Below
            2b6:	00 10 81 54 08 00 	            mov.m ar.ccv=r34
      what means that oldval is no longer overwritten.
      
      This is Debian bug#702641
      (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=702641).
      
      The patch is applicable on Kernel 3.9-rc1, 3.2.23 and many other versions.
      Signed-off-by: default avatarStephan Schreiber <info@fs-driver.org>
      Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      b5789c66
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Fix ftrace_dump() · 7462e0e2
      Steven Rostedt (Red Hat) authored
      commit 7fe70b57 upstream.
      
      ftrace_dump() had a lot of issues. What ftrace_dump() does, is when
      ftrace_dump_on_oops is set (via a kernel parameter or sysctl), it
      will dump out the ftrace buffers to the console when either a oops,
      panic, or a sysrq-z occurs.
      
      This was written a long time ago when ftrace was fragile to recursion.
      But it wasn't written well even for that.
      
      There's a possible deadlock that can occur if a ftrace_dump() is happening
      and an NMI triggers another dump. This is because it grabs a lock
      before checking if the dump ran.
      
      It also totally disables ftrace, and tracing for no good reasons.
      
      As the ring_buffer now checks if it is read via a oops or NMI, where
      there's a chance that the buffer gets corrupted, it will disable
      itself. No need to have ftrace_dump() do the same.
      
      ftrace_dump() is now cleaned up where it uses an atomic counter to
      make sure only one dump happens at a time. A simple atomic_inc_return()
      is enough that is needed for both other CPUs and NMIs. No need for
      a spinlock, as if one CPU is running the dump, no other CPU needs
      to do it too.
      
      The tracing_on variable is turned off and not turned on. The original
      code did this, but it wasn't pretty. By just disabling this variable
      we get the result of not seeing traces that happen between crashes.
      
      For sysrq-z, it doesn't get turned on, but the user can always write
      a '1' to the tracing_on file. If they are using sysrq-z, then they should
      know about tracing_on.
      
      The new code is much easier to read and less error prone. No more
      deadlock possibility when an NMI triggers here.
      Reported-by: default avatarzhangwei(Jovi) <jovi.zhangwei@huawei.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      7462e0e2
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Remove most or all of stack tracer stack size from stack_max_size · cde591b5
      Steven Rostedt (Red Hat) authored
      commit 4df29712 upstream.
      
      Currently, the depth reported in the stack tracer stack_trace file
      does not match the stack_max_size file. This is because the stack_max_size
      includes the overhead of stack tracer itself while the depth does not.
      
      The first time a max is triggered, a calculation is not performed that
      figures out the overhead of the stack tracer and subtracts it from
      the stack_max_size variable. The overhead is stored and is subtracted
      from the reported stack size for comparing for a new max.
      
      Now the stack_max_size corresponds to the reported depth:
      
       # cat stack_max_size
      4640
      
       # cat stack_trace
              Depth    Size   Location    (48 entries)
              -----    ----   --------
        0)     4640      32   _raw_spin_lock+0x18/0x24
        1)     4608     112   ____cache_alloc+0xb7/0x22d
        2)     4496      80   kmem_cache_alloc+0x63/0x12f
        3)     4416      16   mempool_alloc_slab+0x15/0x17
      [...]
      
      While testing against and older gcc on x86 that uses mcount instead
      of fentry, I found that pasing in ip + MCOUNT_INSN_SIZE let the
      stack trace show one more function deep which was missing before.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      cde591b5
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Fix stack tracer with fentry use · 1d2922fe
      Steven Rostedt (Red Hat) authored
      commit d4ecbfc4 upstream.
      
      When gcc 4.6 on x86 is used, the function tracer will use the new
      option -mfentry which does a call to "fentry" at every function
      instead of "mcount". The significance of this is that fentry is
      called as the first operation of the function instead of the mcount
      usage of being called after the stack.
      
      This causes the stack tracer to show some bogus results for the size
      of the last function traced, as well as showing "ftrace_call" instead
      of the function. This is due to the stack frame not being set up
      by the function that is about to be traced.
      
       # cat stack_trace
              Depth    Size   Location    (48 entries)
              -----    ----   --------
        0)     4824     216   ftrace_call+0x5/0x2f
        1)     4608     112   ____cache_alloc+0xb7/0x22d
        2)     4496      80   kmem_cache_alloc+0x63/0x12f
      
      The 216 size for ftrace_call includes both the ftrace_call stack
      (which includes the saving of registers it does), as well as the
      stack size of the parent.
      
      To fix this, if CC_USING_FENTRY is defined, then the stack_tracer
      will reserve the first item in stack_dump_trace[] array when
      calling save_stack_trace(), and it will fill it in with the parent ip.
      Then the code will look for the parent pointer on the stack and
      give the real size of the parent's stack pointer:
      
       # cat stack_trace
              Depth    Size   Location    (14 entries)
              -----    ----   --------
        0)     2640      48   update_group_power+0x26/0x187
        1)     2592     224   update_sd_lb_stats+0x2a5/0x4ac
        2)     2368     160   find_busiest_group+0x31/0x1f1
        3)     2208     256   load_balance+0xd9/0x662
      
      I'm Cc'ing stable, although it's not urgent, as it only shows bogus
      size for item #0, the rest of the trace is legit. It should still be
      corrected in previous stable releases.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      1d2922fe
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Use stack of calling function for stack tracer · f2547fe3
      Steven Rostedt (Red Hat) authored
      commit 87889501 upstream.
      
      Use the stack of stack_trace_call() instead of check_stack() as
      the test pointer for max stack size. It makes it a bit cleaner
      and a little more accurate.
      
      Adding stable, as a later fix depends on this patch.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      f2547fe3
    • Zhao Hongjiang's avatar
      aio: fix possible invalid memory access when DEBUG is enabled · bf0f91c0
      Zhao Hongjiang authored
      commit 91d80a84 upstream.
      
      dprintk() shouldn't access @ring after it's unmapped.
      Signed-off-by: default avatarZhao Hongjiang <zhaohongjiang@huawei.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [bwh: Backported to 3.2: keep the second argument to kunmap_atomic()]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      bf0f91c0
    • Mathias Krause's avatar
      crypto: algif - suppress sending source address information in recvmsg · 419f4ba0
      Mathias Krause authored
      commit 72a763d8 upstream.
      
      The current code does not set the msg_namelen member to 0 and therefore
      makes net/socket.c leak the local sockaddr_storage variable to userland
      -- 128 bytes of kernel stack memory. Fix that.
      Signed-off-by: default avatarMathias Krause <minipli@googlemail.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      419f4ba0
  2. 25 Apr, 2013 15 commits