From: Denys Vlasenko <dvlasenk@redhat.com>
Date: Tue, 21 Oct 2008 13:26:14 +0200
Subject: [ia64] fix ptrace hangs when following threads
Message-id: 1224588374.608.28.camel@localhost.localdomain
O-Subject: [RHEL5.3 PATCH] bz461456 ia64: The trace of some threads unexpectedly stops when being traced by 'strace -f'
Bugzilla: 461456
RH-Acked-by: Jerome Marchand <jmarchan@redhat.com>
RH-Acked-by: Prarit Bhargava <prarit@redhat.com>

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=461456

Description:
When a process is cloned with CLONE_PTRACE, it stops with SIGSTOP.
This involves some work, and this work is not fully finished by the
time the notification is delivered to the tracer. If the tracer issues
ptrace(PTRACE_SYSCALL) to continue the thread, the kernel first waits
until the tracee is fully stopped before setting the needed flags and
resuming it.

The bug is that we were not testing *for what reason* the tracee has
stopped. On most arches, there is no other reason to block on the
relevant code paths, but on ia64 there is - some ugly rotating register
window saving stuff which I thankfully know almost nothing about, apart
from that it can legitimately block (for a short time). When ia64 blocks
there, the tracer resumes it but the ptrace flags get messed up,
resulting in the tracee *not* stopping on the next syscall as was
intended by PTRACE_SYSCALL, but free-running.

This patch adds checks that the tracee has not just stopped, but stopped
for the right reasons. If it has not (yet), yield and check again.

Testing:
Patched kernel 2.6.18-118.el5 with this patch, installed it on
rx2620.gsslab.rdu.redhat.com and ran the utrace-tests testsuite.
No regressions, and the clone-multi-ptrace test passes.
Customer's original testcase also passes.

Upstream:
Not affected. utrace has changed significantly since then, and the
version Roland intends to eventually push to Linus is different.
Still, since we have a testcase for this bug, if it reoccurs in
new utrace, it will be caught.
--
vda

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 256ac34..056da58 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -923,9 +923,21 @@ ptrace_start(long pid, long request,
 	 * (void) utrace_regset(child, engine, utrace_native_view(child), 0);
 	 */
 	wait_task_inactive(child);
-
-	if (child->exit_state)
-		goto out_tsk;
+	while (child->state != TASK_TRACED && child->state != TASK_STOPPED) {
+		if (child->exit_state)
+			goto out_tsk;
+		/*
+		 * This is a dismal kludge, but it only comes up on ia64.
+		 * It might be blocked inside regset->writeback() called
+		 * from ptrace_report(), when it's on its way to quiescing
+		 * in TASK_TRACED real soon now. We actually need that
+		 * writeback call to have finished, before a PTRACE_PEEKDATA
+		 * here, for example. So keep waiting until it's really there.
+		 */
+		yield();
+		wait_task_inactive(child);
+	}
+	wait_task_inactive(child);
 
 	*childp = child;
 	*enginep = engine;
diff --git a/kernel/utrace.c b/kernel/utrace.c
index 7eaff6a..5bc6e58 100644
--- a/kernel/utrace.c
+++ b/kernel/utrace.c
@@ -1117,9 +1117,15 @@ wake:
 	 *
 	 * On the exit path, it's only truly quiescent if it has
 	 * already been through utrace_report_death, or never will.
+	 *
+	 * If it's live, it's only really quiescent enough if it has
+	 * actually got into TASK_TRACED. If it has UTRACE_ACTION_QUIESCE
+	 * set but is still on the way and hasn't entered utrace_quiescent
+	 * yet, let it get through its callbacks and bookkeeping.
+	 * Otherwise we could break an assumption about getting through
+	 * utrace_quiescent at least once before of setting QUIESCE.
 	 */
-	if (unlikely(target->exit_state)
-	    && unlikely(target->utrace_flags & DEATH_EVENTS))
+	if (!quiesce(target, 0))
 		spin_unlock(&utrace->lock);
 	else
 		wake_quiescent(old_flags, utrace, target);
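
For readers who want to see the shape of the kernel/ptrace.c loop outside the kernel, here is a minimal user-space toy model of "inactive is not enough; check the stop reason, yield, and re-check". It is only a sketch, not kernel code: struct toy_task, the TOY_* states, toy_wait_inactive() and toy_wait_for_trace_stop() are hypothetical names invented for this illustration, and the short ia64 block is modeled as a simple countdown.

```c
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel task states; not the real values. */
enum toy_state { TOY_RUNNING, TOY_BLOCKED, TOY_TRACED, TOY_STOPPED };

struct toy_task {
	enum toy_state state;
	bool exited;            /* models child->exit_state being non-zero */
	int blocked_polls_left; /* polls until the short block finishes */
};

/*
 * Toy stand-in for wait_task_inactive(): the task is "inactive" on
 * return, but it may be blocked for some other reason than a trace
 * stop. Each call lets the modeled block make one step of progress.
 */
static void toy_wait_inactive(struct toy_task *t)
{
	if (t->state == TOY_BLOCKED && t->blocked_polls_left > 0 &&
	    --t->blocked_polls_left == 0)
		t->state = TOY_TRACED;
}

/*
 * Mirrors the patched loop: keep yielding until the task has stopped
 * for the right reason. Returns the number of extra polls, or -1 if
 * the task exited. Assumption of this toy: a blocked, non-exited task
 * always has blocked_polls_left > 0, otherwise the loop never ends.
 */
static int toy_wait_for_trace_stop(struct toy_task *t)
{
	int polls = 0;

	assert(t != NULL);
	toy_wait_inactive(t);
	while (t->state != TOY_TRACED && t->state != TOY_STOPPED) {
		if (t->exited)
			return -1;
		sched_yield(); /* user-space analogue of yield() */
		toy_wait_inactive(t);
		polls++;
	}
	return polls;
}
```

Calling toy_wait_for_trace_stop() on a task that starts in TOY_BLOCKED only returns once the state has become TOY_TRACED, whereas the pre-patch logic corresponds to returning right after the first toy_wait_inactive() call.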