1. 09 Jan, 2014 7 commits
    • tty: Fix hang at ldsem_down_read() · ab69be3e
      Peter Hurley authored
      commit cf872776 upstream.
      
      When a controlling tty is being hung up and the hang up is
      waiting for a just-signalled tty reader or writer to exit, and a new tty
      reader/writer tries to acquire an ldisc reference concurrently with the
      ldisc reference release from the signalled reader/writer, the hangup
      can hang. The new reader/writer is sleeping in ldsem_down_read() and the
      hangup is sleeping in ldsem_down_write() [1].
      
      The new reader/writer fails to wakeup the waiting hangup because the
      wrong lock count value is checked (the old lock count rather than the new
      lock count) to see if the lock is unowned.
      
      Change helper function to return the new lock count if the cmpxchg was
      successful; document this behavior.
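      
      A minimal sketch of what this implies, reconstructed from the description
      rather than copied from the patch (the count argument is simplified here):
      on a successful cmpxchg the helper must hand the *new* count back to the
      caller, so the subsequent "is the lock now unowned?" test operates on the
      post-transition value instead of the stale one.
      
      /* Sketch only: documents the intended contract of the helper. */
      static inline int ldsem_cmpxchg(long *old, long new, long *count)
      {
              long tmp = cmpxchg(count, *old, new);
      
              if (tmp == *old) {
                      *old = new;     /* success: report the new lock count */
                      return 1;
              }
              *old = tmp;             /* failure: report the latest count seen */
              return 0;
      }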
      
      [1] edited dmesg log from reporter
      
      SysRq : Show Blocked State
        task                        PC stack   pid father
      systemd         D ffff88040c4f0000     0     1      0 0x00000000
       ffff88040c49fbe0 0000000000000046 ffff88040c4a0000 ffff88040c49ffd8
       00000000001d3980 00000000001d3980 ffff88040c4a0000 ffff88040593d840
       ffff88040c49fb40 ffffffff810a4cc0 0000000000000006 0000000000000023
      Call Trace:
       [<ffffffff810a4cc0>] ? sched_clock_cpu+0x9f/0xe4
       [<ffffffff810a4cc0>] ? sched_clock_cpu+0x9f/0xe4
       [<ffffffff810a4cc0>] ? sched_clock_cpu+0x9f/0xe4
       [<ffffffff810a4cc0>] ? sched_clock_cpu+0x9f/0xe4
       [<ffffffff817a6649>] schedule+0x24/0x5e
       [<ffffffff817a588b>] schedule_timeout+0x15b/0x1ec
       [<ffffffff810a4cc0>] ? sched_clock_cpu+0x9f/0xe4
       [<ffffffff817aa691>] ? _raw_spin_unlock_irq+0x24/0x26
       [<ffffffff817aa10c>] down_read_failed+0xe3/0x1b9
       [<ffffffff817aa26d>] ldsem_down_read+0x8b/0xa5
       [<ffffffff8142b5ca>] ? tty_ldisc_ref_wait+0x1b/0x44
       [<ffffffff8142b5ca>] tty_ldisc_ref_wait+0x1b/0x44
       [<ffffffff81423f5b>] tty_write+0x7d/0x28a
       [<ffffffff814241f5>] redirected_tty_write+0x8d/0x98
       [<ffffffff81424168>] ? tty_write+0x28a/0x28a
       [<ffffffff8115d03f>] do_loop_readv_writev+0x56/0x79
       [<ffffffff8115e604>] do_readv_writev+0x1b0/0x1ff
       [<ffffffff8116ea0b>] ? do_vfs_ioctl+0x32a/0x489
       [<ffffffff81167d9d>] ? final_putname+0x1d/0x3a
       [<ffffffff8115e6c7>] vfs_writev+0x2e/0x49
       [<ffffffff8115e7d3>] SyS_writev+0x47/0xaa
       [<ffffffff817ab822>] system_call_fastpath+0x16/0x1b
      bash            D ffffffff81c104c0     0  5469   5302 0x00000082
       ffff8800cf817ac0 0000000000000046 ffff8804086b22a0 ffff8800cf817fd8
       00000000001d3980 00000000001d3980 ffff8804086b22a0 ffff8800cf817a48
       000000000000b9a0 ffff8800cf817a78 ffffffff81004675 ffff8800cf817a44
      Call Trace:
       [<ffffffff81004675>] ? dump_trace+0x165/0x29c
       [<ffffffff810a4cc0>] ? sched_clock_cpu+0x9f/0xe4
       [<ffffffff8100edda>] ? save_stack_trace+0x26/0x41
       [<ffffffff817a6649>] schedule+0x24/0x5e
       [<ffffffff817a588b>] schedule_timeout+0x15b/0x1ec
       [<ffffffff810a4cc0>] ? sched_clock_cpu+0x9f/0xe4
       [<ffffffff817a9f03>] ? down_write_failed+0xa3/0x1c9
       [<ffffffff817aa691>] ? _raw_spin_unlock_irq+0x24/0x26
       [<ffffffff817a9f0b>] down_write_failed+0xab/0x1c9
       [<ffffffff817aa300>] ldsem_down_write+0x79/0xb1
       [<ffffffff817aada3>] ? tty_ldisc_lock_pair_timeout+0xa5/0xd9
       [<ffffffff817aada3>] tty_ldisc_lock_pair_timeout+0xa5/0xd9
       [<ffffffff8142bf33>] tty_ldisc_hangup+0xc4/0x218
       [<ffffffff81423ab3>] __tty_hangup+0x2e2/0x3ed
       [<ffffffff81424a76>] disassociate_ctty+0x63/0x226
       [<ffffffff81078aa7>] do_exit+0x79f/0xa11
       [<ffffffff81086bdb>] ? get_signal_to_deliver+0x206/0x62f
       [<ffffffff810b4bfb>] ? lock_release_holdtime.part.8+0xf/0x16e
       [<ffffffff81079b05>] do_group_exit+0x47/0xb5
       [<ffffffff81086c16>] get_signal_to_deliver+0x241/0x62f
       [<ffffffff810020a7>] do_signal+0x43/0x59d
       [<ffffffff810f2af7>] ? __audit_syscall_exit+0x21a/0x2a8
       [<ffffffff810b4bfb>] ? lock_release_holdtime.part.8+0xf/0x16e
       [<ffffffff81002655>] do_notify_resume+0x54/0x6c
       [<ffffffff817abaf8>] int_signal+0x12/0x17
      Reported-by: Sami Farin <sami.farin@gmail.com>
      Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • powerpc: kvm: fix rare but potential deadlock scene · d18b5a0e
      pingfan liu authored
      commit 91648ec0 upstream.
      
      Since kvmppc_hv_find_lock_hpte() is called from both virtmode and
      realmode, it can trigger a deadlock.
      
      Suppose the following scene:
      
      Two physical CPUs, cpuM and cpuN; two VM instances, A and B; each VM has
      a group of vcpus.
      
      If vcpu_A_1 on cpuM holds bitlock X (HPTE_V_HVLOCK) and is then switched
      out, while vcpu_A_2 on cpuN tries to take X in realmode, cpuN will be
      stuck in realmode for a long time.
      
      What makes things even worse is if the following happens:
        On cpuM, bitlock X is held; on cpuN, Y is held.
        vcpu_B_2 tries to take Y on cpuM in realmode.
        vcpu_A_2 tries to take X on cpuN in realmode.
      
      Oops! Deadlock happens.
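      
      A hedged sketch of the mitigation this points at (the wrapper name is
      hypothetical and the argument list of kvmppc_hv_find_lock_hpte() is only
      assumed here, not taken from the patch): keep the vcpu task from being
      preempted for as long as it may hold the HPTE_V_HVLOCK bit-lock when
      calling from virtmode, so a realmode spinner on another CPU can never end
      up waiting on a descheduled lock holder.
      
      static long virtmode_find_lock_hpte(struct kvm *kvm, gva_t eaddr,
                                          unsigned long slb_v, unsigned long valid)
      {
              long index;
      
              preempt_disable();
              /* may take the HPTE_V_HVLOCK bit-lock internally */
              index = kvmppc_hv_find_lock_hpte(kvm, eaddr, slb_v, valid);
      
              /* ... inspect the locked HPTE and release HPTE_V_HVLOCK ... */
      
              preempt_enable();
              return index;
      }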
      Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
      Reviewed-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ceph: allocate non-zero page to fscache in readpage() · 17e38d92
      Li Wang authored
      commit ff638b7d upstream.
      
      ceph_osdc_readpages() returns the number of bytes read. Currently the
      code only puts fully zero-filled pages into fscache; this patch fixes
      that.
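      
      A hedged sketch of the corrected flow (the helper name is hypothetical and
      the surrounding readpage code is elided): treat the positive return value
      of ceph_osdc_readpages() as the number of bytes read, zero-fill any short
      tail, and hand the page to fscache in the data case as well, not only when
      the page ended up fully zero-filled.
      
      static int example_readpage_finish(struct inode *inode, struct page *page,
                                         int ret /* bytes read, or -errno */)
      {
              if (ret == -ENOENT)
                      ret = 0;                /* hole: page is all zeroes */
              if (ret < 0) {
                      SetPageError(page);
                      return ret;
              }
              if (ret < PAGE_CACHE_SIZE)      /* short read: zero the tail */
                      zero_user_segment(page, ret, PAGE_CACHE_SIZE);
      
              SetPageUptodate(page);
              ceph_readpage_to_fscache(inode, page);  /* cache data pages too */
              return 0;
      }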
      Signed-off-by: Li Wang <liwang@ubuntukylin.com>
      Reviewed-by: Milosz Tanski <milosz@adfin.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ceph: wake up 'safe' waiters when unregistering request · b4195883
      Yan, Zheng authored
      commit fc55d2c9 upstream.
      
      We also need to wake up 'safe' waiters if an error occurs or the request
      is aborted. Otherwise sync(2)/fsync(2) may hang forever.
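      
      A sketch of where such a wakeup plausibly lives (reconstructed from the
      description, not a verbatim copy of the patch): completing the request's
      'safe' completion whenever it is unregistered guarantees that callers
      blocked waiting for the 'safe' reply are woken even when that reply will
      never arrive because the request errored out or was aborted.
      
      static void __unregister_request(struct ceph_mds_client *mdsc,
                                       struct ceph_mds_request *req)
      {
              /* ... drop the request from the mdsc request tree/lists ... */
      
              complete_all(&req->r_safe_completion);  /* wake 'safe' waiters */
              ceph_mdsc_put_request(req);             /* drop registration ref */
      }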
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: Sage Weil <sage@inktank.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ceph: cleanup aborted requests when re-sending requests. · 5bb82225
      Yan, Zheng authored
      commit eb1b8af3 upstream.
      
      Aborted requests usually get cleared when the reply is received.
      If the MDS crashes, no reply will be received, so we need to clean up
      aborted requests when re-sending requests.
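      
      A hedged sketch of the idea (simplified, not the exact patch): while
      walking the request tree to re-send requests after an MDS restart, drop
      any request that was already aborted locally, since no reply will ever
      come to clear it.  The iterator is advanced before unregistering because
      unregistering removes the node from the rbtree.
      
      static void kick_requests(struct ceph_mds_client *mdsc, int mds)
      {
              struct rb_node *p = rb_first(&mdsc->request_tree);
      
              while (p) {
                      struct ceph_mds_request *req =
                              rb_entry(p, struct ceph_mds_request, r_node);
      
                      p = rb_next(p);
                      if (req->r_aborted) {
                              __unregister_request(mdsc, req);  /* clean it up */
                              continue;
                      }
                      /* ... otherwise queue the request for re-send ... */
              }
      }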
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Greg Farnum <greg@inktank.com>
      Signed-off-by: Sage Weil <sage@inktank.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ceph: hung on ceph fscache invalidate in some cases · c6c0d18b
      Milosz Tanski authored
      commit ffc79664 upstream.
      
      In some cases on my ceph client cluster I'm seeing hung kernel tasks in
      the invalidate page code path. This is because we don't check whether the
      page is actually marked as cached before calling
      fscache_wait_on_page_write().
      
      This is the log from the hang
      
      INFO: task XXXXXX:12034 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
       ...
      Call Trace:
      [<ffffffff81568d09>] schedule+0x29/0x70
      [<ffffffffa01d4cbd>] __fscache_wait_on_page_write+0x6d/0xb0 [fscache]
      [<ffffffff81083520>] ? add_wait_queue+0x60/0x60
      [<ffffffffa029a3e9>] ceph_invalidate_fscache_page+0x29/0x50 [ceph]
      [<ffffffffa027df00>] ceph_invalidatepage+0x70/0x190 [ceph]
      [<ffffffff8112656f>] ? delete_from_page_cache+0x5f/0x70
      [<ffffffff81133cab>] truncate_inode_page+0x8b/0x90
      [<ffffffff81133ded>] truncate_inode_pages_range.part.12+0x13d/0x620
      [<ffffffff8113431d>] truncate_inode_pages_range+0x4d/0x60
      [<ffffffff811343b5>] truncate_inode_pages+0x15/0x20
      [<ffffffff8119bbf6>] evict+0x1a6/0x1b0
      [<ffffffff8119c3f3>] iput+0x103/0x190
       ...
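      
      A sketch of the fix this describes (reconstructed from the description, so
      treat the details as assumptions rather than the exact patch): bail out
      early when the page is not marked as being in fscache, so we never block
      in fscache_wait_on_page_write() for a page fscache is not writing at all.
      
      void ceph_invalidate_fscache_page(struct inode *inode, struct page *page)
      {
              struct ceph_inode_info *ci = ceph_inode(inode);
      
              if (!PageFsCache(page))     /* nothing cached, nothing to wait on */
                      return;
      
              fscache_wait_on_page_write(ci->fscache, page);
              fscache_uncache_page(ci->fscache, page);
      }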
      Signed-off-by: Milosz Tanski <milosz@adfin.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • USB: serial: fix race in generic write · f6b9ef96
      Johan Hovold authored
      commit 6f648546 upstream.
      
      Fix race in generic write implementation, which could lead to
      temporarily degraded throughput.
      
      The current generic write implementation introduced by commit
      27c7acf2 ("USB: serial: reimplement generic fifo-based writes") has
      always had this bug, although it's fairly hard to trigger and the
      consequences are not likely to be noticed.
      
      Specifically, a write() on one CPU while the completion handler is
      running on another could result in only one of the two write urbs being
      utilised to empty the remainder of the write fifo (unless there is a
      second write() that doesn't race during that time).
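      
      A simplified, hypothetical illustration of the pattern involved (not the
      driver's actual patch; field names are those of the generic driver's
      usb_serial_port of that era): both the write() path and the URB completion
      handler call a "write start" routine to drain the fifo.  If claiming a
      free write URB and checking the fifo are not done atomically with respect
      to the other path, one of them can act on stale state and return early,
      leaving only one of the two URBs draining the remaining data.  Doing both
      under the same lock, in a loop, closes that window.
      
      static int example_write_start(struct usb_serial_port *port)
      {
              unsigned long flags;
              struct urb *urb;
              int count, i;
      
              spin_lock_irqsave(&port->lock, flags);
              for (;;) {
                      /* claim a free write URB and check the fifo under one lock */
                      i = find_first_bit(&port->write_urbs_free,
                                         ARRAY_SIZE(port->write_urbs));
                      if (i == ARRAY_SIZE(port->write_urbs) ||
                          !kfifo_len(&port->write_fifo))
                              break;
                      clear_bit(i, &port->write_urbs_free);
      
                      urb = port->write_urbs[i];
                      count = kfifo_out(&port->write_fifo, urb->transfer_buffer,
                                        port->bulk_out_size);
                      urb->transfer_buffer_length = count;
      
                      spin_unlock_irqrestore(&port->lock, flags);
                      /* error handling on submit failure elided in this sketch */
                      usb_submit_urb(urb, GFP_ATOMIC);
                      spin_lock_irqsave(&port->lock, flags);
              }
              spin_unlock_irqrestore(&port->lock, flags);
      
              return 0;
      }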
      Signed-off-by: Johan Hovold <jhovold@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 20 Dec, 2013 33 commits