Commit 1aff59be authored by Yunlong Song's avatar Yunlong Song Committed by Arnaldo Carvalho de Melo

perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task

Since there is sem_wait for each task in the wait_for_tasks(), e.g.
sem_wait(&task->work_done_sem).

The sem_wait can continue only when work_done_sem is greater than 0, or
it will be blocked.

For perf sched replay, one task may sem_post the work_done_sem of
another task, which causes the work_done_sem of that task processed in a
reasonable sequence, e.g. sem_post, sem_wait, sem_wait, sem_post...

This sequence simulates the sched process of the running tasks at the
time when perf sched record runs.

As a result, all the tasks are required and their threads must be
successfully created.

If any one (task A) of the tasks fails to create its thread, then
another task (task B), whose work_done_sem needs sem_post from that
failed task A, may likely block itself due to seg_wait.

And this is a dead halt, since task B's thread_func cannot continue at
all.

To solve this problem, perf sched replay should exit once any task fails
to create its thread.

Example:

Test environment: x86_64 with 160 cores

Before this patch:

 $ perf sched replay
 ...
 Error: sys_perf_event_open() syscall returned with -1 (Too many open
 files)
 ------------------------------------------------------------    <- dead halt

After this patch:

 $ perf sched replay
 ...
 task   1551 (           <unknown>:         0), nr_events: 10
 Error: sys_perf_event_open() syscall returned with -1 (Too many open
 files)
 $

As shown above, perf sched replay finishes the process after printing an
error message and does not block itself.
Signed-off-by: default avatarYunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-7-git-send-email-yunlong.song@huawei.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
parent 08097abc
......@@ -451,10 +451,12 @@ static int self_open_counters(void)
fd = sys_perf_event_open(&attr, 0, -1, -1,
perf_event_open_cloexec_flag());
if (fd < 0)
if (fd < 0) {
pr_err("Error: sys_perf_event_open() syscall returned "
"with %d (%s)\n", fd,
strerror_r(errno, sbuf, sizeof(sbuf)));
exit(EXIT_FAILURE);
}
return fd;
}
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment