kernel-2.6.18-128.1.10.el5.src.rpm

From: AMEET M. PARANJAPE <aparanja@redhat.com>
Date: Thu, 16 Oct 2008 16:53:51 -0500
Subject: [ppc64] SPUs hang when run with affinity-1
Message-id: 48F7B7EF.6070804@REDHAT.COM
O-Subject: Re: [PATCH RHEL5.3 BZ464686 1/2] Fix SPUs hangs when run with affinity
Bugzilla: 464686
RH-Acked-by: David Howells <dhowells@redhat.com>

RHBZ#
======
https://bugzilla.redhat.com/show_bug.cgi?id=464686

Description:
===========
The fix for this problem requires two patches; the first of them is described
here:

It is possible to lock aff_mutex and cbe_spu_info[n].list_mutex in different
orders, allowing a deadlock to occur. With this change, aff_mutex is not
taken within a list_mutex critical section anymore.

The other patch will be attached to a post to follow.

RHEL Version Found:
================
RHEL 5.2

kABI Status:
============
No symbols were harmed.

Brew:
=====
Built on all platforms.
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=1508838

Upstream Status:
================
The patches were accepted upstream in kernel 2.6.27-rc1 (see
http://kernel.org/pub/linux/kernel/v2.6/testing/ChangeLog-2.6.27-rc1)

Test Status:
============
A testcase is provided in the Red Hat Bugzilla. Without the patches, the
dmabench stress test hangs and a reboot is required before another
application can run on the Cell Synergistic Processing Elements (SPEs).

With these patches, dmabench runs successfully and other SPE programs can be
run afterwards.
===============================================================

Ameet Paranjape 978-392-3903 ext 23903
IBM on-site partner

Proposed Patch:
===============

diff --git a/arch/powerpc/platforms/cell/spufs/sched.c b/arch/powerpc/platforms/cell/spufs/sched.c
index 24f4c43..9df2068 100644
--- a/arch/powerpc/platforms/cell/spufs/sched.c
+++ b/arch/powerpc/platforms/cell/spufs/sched.c
@@ -382,6 +382,9 @@ static int has_affinity(struct spu_context *ctx)
 	if (list_empty(&ctx->aff_list))
 		return 0;
 
+	if (atomic_read(&gang->aff_sched_count) == 0)
+		gang->aff_ref_spu = NULL;
+
 	if (!gang->aff_ref_spu) {
 		if (!(gang->aff_flags & AFF_MERGED))
 			aff_merge_remaining_ctxs(gang);
@@ -407,14 +410,13 @@ static void spu_unbind_context(struct spu *spu, struct spu_context *ctx)
  	if (spu->ctx->flags & SPU_CREATE_NOSCHED)
 		atomic_dec(&cbe_spu_info[spu->node].reserved_spus);
 
-	if (ctx->gang){
-		mutex_lock(&ctx->gang->aff_mutex);
-		if (has_affinity(ctx)) {
-			if (atomic_dec_and_test(&ctx->gang->aff_sched_count))
-				ctx->gang->aff_ref_spu = NULL;
-		}
-		mutex_unlock(&ctx->gang->aff_mutex);
-	}
+	if (ctx->gang)
+		/*
+		 * If ctx->gang->aff_sched_count is positive, SPU affinity is
+		 * being considered in this gang. Using atomic_dec_if_positive
+		 * allow us to skip an explicit check for affinity in this gang
+		 */
+		atomic_dec_if_positive(&ctx->gang->aff_sched_count);
 
 	spu_switch_notify(spu, NULL);
 	spu_unmap_mappings(ctx);
@@ -543,11 +545,7 @@ static struct spu *spu_get_idle(struct spu_context *ctx)
 				goto found;
 			mutex_unlock(&cbe_spu_info[node].list_mutex);
 
-			mutex_lock(&ctx->gang->aff_mutex);
-			if (atomic_dec_and_test(&ctx->gang->aff_sched_count))
-				ctx->gang->aff_ref_spu = NULL;
-			mutex_unlock(&ctx->gang->aff_mutex);
-
+			atomic_dec(&ctx->gang->aff_sched_count);
 			return NULL;
 		}
 		mutex_unlock(&ctx->gang->aff_mutex);