kernel-2.6.18-238.el5.src.rpm

From: Denys Vlasenko <dvlasenk@redhat.com>
Date: Tue, 21 Oct 2008 13:26:14 +0200
Subject: [ia64] fix ptrace hangs when following threads
Message-id: 1224588374.608.28.camel@localhost.localdomain
O-Subject: [RHEL5.3 PATCH] bz461456 ia64: The trace of some threads unexpectedly stops when being traced by 'strace -f'
Bugzilla: 461456
RH-Acked-by: Jerome Marchand <jmarchan@redhat.com>
RH-Acked-by: Prarit Bhargava <prarit@redhat.com>

Bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=461456

Description:
When a process is cloned with CLONE_PTRACE, it stops with SIGSTOP.
Stopping involves some work, and this work may not be fully
finished by the time the notification is delivered to the tracer.

If the tracer issues ptrace(PTRACE_SYSCALL) to continue
the thread, the kernel first waits until the tracee is fully
stopped before setting the needed flags and resuming it.

The bug is that we were not testing *for what reason* the tracee
has stopped. On most arches, there is no other reason to block
on the relevant code paths, but on ia64 there is: some ugly
rotating register window saving stuff which I thankfully
know almost nothing about, apart from that it can legitimately
block (for a short time).

When ia64 blocks there, the tracer resumes the tracee but the
ptrace flags get messed up, so the tracee does *not* stop on the
next syscall as PTRACE_SYSCALL intended, but runs free.
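
For reference, the contract that breaks here can be sketched from
the tracer's side. This is a minimal single-process illustration,
not the customer's testcase and not the clone-multi-ptrace test: it
assumes Linux and glibc, and count_syscall_stops() is a hypothetical
helper name. Each PTRACE_SYSCALL resume should be followed by a
SIGTRAP stop at the next syscall entry or exit; a free-running
tracee would instead run to exit without producing those stops.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child, trace it with PTRACE_SYSCALL,
 * and count how many syscall stops (reported as SIGTRAP) it makes.
 * Returns -1 if tracing could not be set up.
 */
static long count_syscall_stops(void)
{
	int status;
	long stops = 0;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Tracee: ask to be traced, then stop so the parent can attach. */
		if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
			_exit(1);
		raise(SIGSTOP);
		getppid();	/* at least one syscall for the tracer to see */
		_exit(0);
	}

	/* Tracer: wait for the initial SIGSTOP notification. */
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	if (!WIFSTOPPED(status))
		return -1;	/* child exited early and is already reaped */

	for (;;) {
		/* Resume until the next syscall entry or exit. */
		if (ptrace(PTRACE_SYSCALL, pid, NULL, NULL) < 0) {
			kill(pid, SIGKILL);
			waitpid(pid, &status, 0);
			return -1;
		}
		if (waitpid(pid, &status, 0) < 0)
			return -1;
		if (WIFEXITED(status) || WIFSIGNALED(status))
			return stops;
		if (WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP)
			stops++;
	}
}
```

On a working kernel the returned count is positive; the exact value
depends on which syscalls libc issues around raise() and _exit().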

This patch adds checks that the tracee has not just stopped,
but stopped for the right reasons. If it has not (yet),
yield and check again.
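
The difference between the old and new checks can be modelled in
user space. The sketch below is only a toy model under stated
assumptions, not kernel code: struct fake_task, observe() and the
ST_* states are hypothetical stand-ins for task_struct, the
scheduler and child->state, and sched_yield() stands in for the
kernel's yield(). The old check accepts any state in which the task
is off the CPU; the new check keeps polling until the task is
traced, stopped, or has exited.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical task states; ST_BLOCKED models the ia64 writeback sleep. */
enum task_state { ST_RUNNING, ST_BLOCKED, ST_TRACED, ST_STOPPED, ST_EXITED };

/* Toy task: replays a scripted sequence of states, one per observation. */
struct fake_task {
	const enum task_state *seq;
	size_t len;
	size_t pos;
};

/* Return the current scripted state and advance; the last state sticks. */
static enum task_state observe(struct fake_task *t)
{
	enum task_state s;

	assert(t->len > 0 && t->pos < t->len);
	s = t->seq[t->pos];
	if (t->pos + 1 < t->len)
		t->pos++;
	return s;
}

/* Old behaviour: wait only until the task is off the CPU. */
static enum task_state wait_inactive_only(struct fake_task *t)
{
	enum task_state s;

	while ((s = observe(t)) == ST_RUNNING)
		sched_yield();
	return s;
}

/* New behaviour: also require that it stopped for the right reason. */
static enum task_state wait_stopped_for_tracing(struct fake_task *t)
{
	for (;;) {
		enum task_state s = wait_inactive_only(t);

		if (s == ST_TRACED || s == ST_STOPPED || s == ST_EXITED)
			return s;
		/* Blocked for some other reason: yield and check again. */
		sched_yield();
	}
}
```

With a scripted sequence of running, blocked, running, traced, the
old check returns at the blocked state, while the new check returns
only once the traced state is reached.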

Testing:
Patched kernel 2.6.18-118.el5 with this patch, installed it on
rx2620.gsslab.rdu.redhat.com and ran the utrace-tests testsuite.
No regressions, and the clone-multi-ptrace test passes.
The customer's original testcase also passes.

Upstream:
Not affected. utrace has changed significantly since then,
and the version Roland intends to eventually push to Linus
is different. Still, since we have a testcase for this bug,
it will be caught if it reoccurs in the new utrace.

--
vda

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 256ac34..056da58 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -923,9 +923,21 @@ ptrace_start(long pid, long request,
 	 *  (void) utrace_regset(child, engine, utrace_native_view(child), 0);
 	 */
 	wait_task_inactive(child);
-
-	if (child->exit_state)
-		goto out_tsk;
+	while (child->state != TASK_TRACED && child->state != TASK_STOPPED) {
+		if (child->exit_state)
+			goto out_tsk;
+		/*
+		 * This is a dismal kludge, but it only comes up on ia64.
+		 * It might be blocked inside regset->writeback() called
+		 * from ptrace_report(), when it's on its way to quiescing
+		 * in TASK_TRACED real soon now.  We actually need that
+		 * writeback call to have finished, before a PTRACE_PEEKDATA
+		 * here, for example.  So keep waiting until it's really there.
+		 */
+		yield();
+		wait_task_inactive(child);
+	}
+	wait_task_inactive(child);
 
 	*childp = child;
 	*enginep = engine;
diff --git a/kernel/utrace.c b/kernel/utrace.c
index 7eaff6a..5bc6e58 100644
--- a/kernel/utrace.c
+++ b/kernel/utrace.c
@@ -1117,9 +1117,15 @@ wake:
 	 *
 	 * On the exit path, it's only truly quiescent if it has
 	 * already been through utrace_report_death, or never will.
+	 *
+	 * If it's live, it's only really quiescent enough if it has
+	 * actually got into TASK_TRACED.  If it has UTRACE_ACTION_QUIESCE
+	 * set but is still on the way and hasn't entered utrace_quiescent
+	 * yet, let it get through its callbacks and bookkeeping.
+	 * Otherwise we could break an assumption about getting through
+	 * utrace_quiescent at least once before setting QUIESCE.
 	 */
-	if (unlikely(target->exit_state)
-	    && unlikely(target->utrace_flags & DEATH_EVENTS))
+	if (!quiesce(target, 0))
 		spin_unlock(&utrace->lock);
 	else
 		wake_quiescent(old_flags, utrace, target);