From: AMEET M. PARANJAPE <aparanja@redhat.com>
Date: Wed, 8 Oct 2008 12:10:52 -0400
Subject: [ppc64] fix race for a free SPU
Message-id: 20081008161052.31420.16764.sendpatchset@squad5-lp1.lab.bos.redhat.com
O-Subject: [PATCH RHEL5.3 BZ465581] Fix race for a free SPU
Bugzilla: 465581
RH-Acked-by: David Howells <dhowells@redhat.com>

RHBZ#
======
https://bugzilla.redhat.com/show_bug.cgi?id=465581

Description:
===========
There is currently a race for a free Synergistic Processing Element (SPE),
where one thread is doing a spu_yield() while another is doing a
spu_activate().

This change introduces a 'free_spu' flag to spu_unschedule, indicating
whether or not the function should free the SPU after descheduling the
context. We only set this flag if we're not going to re-schedule another
context on this SPU.

RHEL Version Found:
================
RHEL 5.2

kABI Status:
============
No symbols were harmed.

Brew:
=====
Built on all platforms.
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=1507204

Upstream Status:
================
This patch was accepted upstream in kernel 2.6.27-rc6:
http://kernel.org/pub/linux/kernel/v2.6/testing/ChangeLog-2.6.27-rc6

Test Status:
============
Without this patch the system locks up and shows an oops message on
shutdown. A testcase provided in the Red Hat Bugzilla triggers this
behavior in less than one hour. With the patch applied, the testcase was
started and the system ran normally without any of the hangs. This test
was run for over 24 hours.
===============================================================

Ameet Paranjape 978-392-3903 ext 23903
IBM on-site partner

Proposed Patch:
===============

diff --git a/arch/powerpc/platforms/cell/spufs/sched.c b/arch/powerpc/platforms/cell/spufs/sched.c
index 53ca21b..864930a 100644
--- a/arch/powerpc/platforms/cell/spufs/sched.c
+++ b/arch/powerpc/platforms/cell/spufs/sched.c
@@ -713,13 +713,28 @@ static void spu_schedule(struct spu *spu, struct spu_context *ctx)
 	spu_release(ctx);
 }
 
-static void spu_unschedule(struct spu *spu, struct spu_context *ctx)
+/**
+ * spu_unschedule - remove a context from a spu, and possibly release it.
+ * @spu:	The SPU to unschedule from
+ * @ctx:	The context currently scheduled on the SPU
+ * @free_spu	Whether to free the SPU for other contexts
+ *
+ * Unbinds the context @ctx from the SPU @spu. If @free_spu is non-zero, the
+ * SPU is made available for other contexts (ie, may be returned by
+ * spu_get_idle). If this is zero, the caller is expected to schedule another
+ * context to this spu.
+ *
+ * Should be called with ctx->state_mutex held.
+ */
+static void spu_unschedule(struct spu *spu, struct spu_context *ctx,
+		int free_spu)
 {
 	int node = spu->node;
 
 	mutex_lock(&cbe_spu_info[node].list_mutex);
 	cbe_spu_info[node].nr_active--;
-	spu->alloc_state = SPU_FREE;
+	if (free_spu)
+		spu->alloc_state = SPU_FREE;
 	spu_unbind_context(spu, ctx);
 	ctx->stats.invol_ctx_switch++;
 	spu->stats.invol_ctx_switch++;
@@ -819,7 +834,7 @@ static int __spu_deactivate(struct spu_context *ctx, int force, int max_prio)
 	if (spu) {
 		new = grab_runnable_context(max_prio, spu->node);
 		if (new || force) {
-			spu_unschedule(spu, ctx);
+			spu_unschedule(spu, ctx, new == NULL);
 			if (new) {
 				if (new->flags & SPU_CREATE_NOSCHED)
 					wake_up(&new->stop_wq);
@@ -887,7 +902,7 @@ static noinline void spusched_tick(struct spu_context *ctx)
 	spu = ctx->spu;
 	new = grab_runnable_context(ctx->prio + 1, spu->node);
 	if (new) {
-		spu_unschedule(spu, ctx);
+		spu_unschedule(spu, ctx, 0);
 		if (test_bit(SPU_SCHED_SPU_RUN, &ctx->sched_flags))
 			spu_add_to_rq(ctx);
 	} else {