kernel-2.6.18-238.el5.src.rpm

Date: Thu, 02 Nov 2006 12:20:22 -0500
From: Kei Tokunaga <ktokunag@redhat.com>
Subject: [RHEL5 PATCH] ACPI based CPU hotplug doesn't work after trying to
 BSP offline

BZ213324
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=213324

Offlining any CPU hangs once an offline operation has been
attempted on the BSP.

The attempt to offline the BSP fails, which is the expected
behavior today.  However, the operation acquires workqueue_mutex
but never releases it, which causes the hangs.

From the point of view of workqueue_mutex, the correct sequence
in _cpu_down() is as follows:

  1) _cpu_down() calls workqueue_cpu_callback() via
     blocking_notifier_call_chain(CPU_DOWN_PREPARE) and
     acquires workqueue_mutex in the function.

  2) If the operation completes successfully, _cpu_down()
     calls workqueue_cpu_callback() via 
     blocking_notifier_call_chain(CPU_DEAD)
     and releases the workqueue_mutex.

     If the operation fails, _cpu_down() calls workqueue_cpu_callback()
     via blocking_notifier_call_chain(CPU_DOWN_FAILED) and
     releases the workqueue_mutex.
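
The pairing rule in the steps above can be sketched as a small
userspace model.  This is not kernel code: a pthread mutex stands in
for workqueue_mutex, and the enum values, workqueue_cpu_callback_model()
and workqueue_mutex_is_free() are hypothetical names chosen to mirror
the notifier actions in the text.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the notifier actions in the text. */
enum cpu_event { CPU_DOWN_PREPARE, CPU_DEAD, CPU_DOWN_FAILED };

/* Models the kernel's workqueue_mutex with a pthread mutex. */
static pthread_mutex_t workqueue_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Models the locking in workqueue_cpu_callback(): take the mutex on
 * CPU_DOWN_PREPARE, drop it on either terminal event. */
static void workqueue_cpu_callback_model(enum cpu_event ev)
{
    int rc = 0;

    switch (ev) {
    case CPU_DOWN_PREPARE:
        rc = pthread_mutex_lock(&workqueue_mutex);
        break;
    case CPU_DEAD:          /* offline succeeded */
    case CPU_DOWN_FAILED:   /* offline failed */
        rc = pthread_mutex_unlock(&workqueue_mutex);
        break;
    }
    assert(rc == 0);
    (void)rc;  /* avoid an unused-variable warning under NDEBUG */
}

/* Returns true if nobody holds workqueue_mutex right now. */
static bool workqueue_mutex_is_free(void)
{
    if (pthread_mutex_trylock(&workqueue_mutex) != 0)
        return false;
    pthread_mutex_unlock(&workqueue_mutex);
    return true;
}
```

In this model, CPU_DOWN_PREPARE followed by either CPU_DEAD or
CPU_DOWN_FAILED leaves the mutex free again.  A second
CPU_DOWN_PREPARE without one of those events in between would relock
a mutex that is already held; with glibc's default mutex type that
blocks forever, the userspace analogue of the hang described above.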

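The failure case, however, doesn't work that way.  When the CPU
is still online after __stop_machine_run() returns, as happens
when offlining the BSP fails, _cpu_down() jumps to out_thread
without calling blocking_notifier_call_chain(CPU_DOWN_FAILED),
so workqueue_mutex is never released.  This patch sends
CPU_DOWN_FAILED on that path as well.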
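
The difference between the old and the patched failure handling can
be sketched with a userspace model.  Here stop_failed stands for
IS_ERR(p) and still_online for cpu_online(cpu) after
__stop_machine_run() returns.  notify(), cpu_down_old(),
cpu_down_new() and workqueue_mutex_is_free() are hypothetical, -EBUSY
is a placeholder for whatever error the real code returns, and the
model keeps only the notifier calls that take and release
workqueue_mutex.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model: names mirror the kernel's, but this is userspace C. */
enum cpu_event { CPU_DOWN_PREPARE, CPU_DEAD, CPU_DOWN_FAILED };

static pthread_mutex_t workqueue_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the notifier chain: lock on PREPARE, unlock on DEAD/FAILED. */
static void notify(enum cpu_event ev)
{
    int rc = (ev == CPU_DOWN_PREPARE) ? pthread_mutex_lock(&workqueue_mutex)
                                      : pthread_mutex_unlock(&workqueue_mutex);
    assert(rc == 0);
    (void)rc;
}

/* Before the patch: only the IS_ERR(p) path sends CPU_DOWN_FAILED. */
static int cpu_down_old(bool stop_failed, bool still_online)
{
    notify(CPU_DOWN_PREPARE);
    if (stop_failed) {
        notify(CPU_DOWN_FAILED);
        return -EBUSY;
    }
    if (still_online)
        return -EBUSY;          /* workqueue_mutex is left locked */
    notify(CPU_DEAD);
    return 0;
}

/* After the patch: both failure paths send CPU_DOWN_FAILED. */
static int cpu_down_new(bool stop_failed, bool still_online)
{
    notify(CPU_DOWN_PREPARE);
    if (stop_failed || still_online) {
        notify(CPU_DOWN_FAILED);
        return -EBUSY;
    }
    notify(CPU_DEAD);
    return 0;
}

/* Probe: true if workqueue_mutex is currently unlocked. */
static bool workqueue_mutex_is_free(void)
{
    if (pthread_mutex_trylock(&workqueue_mutex) != 0)
        return false;
    pthread_mutex_unlock(&workqueue_mutex);
    return true;
}
```

After cpu_down_old(false, true) the model's mutex stays locked, so
the next CPU_DOWN_PREPARE would block.  cpu_down_new() mirrors the
`IS_ERR(p) || cpu_online(cpu)` condition in the diff; unlike the real
code, it does not distinguish the out_allowed and out_thread exits.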

I verified that the patch works on 2.6.18-1.2740.el5
on my machine.

This patch is upstream as of 2.6.19-rc4.

Thanks,
Kei

---

 linux-2.6.18-1.2740.el5-kei/kernel/cpu.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff -puN kernel/cpu.c~bz213324-fix-cpuhp kernel/cpu.c
--- linux-2.6.18-1.2740.el5/kernel/cpu.c~bz213324-fix-cpuhp	2006-11-02 11:40:10.000000000 -0500
+++ linux-2.6.18-1.2740.el5-kei/kernel/cpu.c	2006-11-02 11:40:10.000000000 -0500
@@ -144,18 +144,18 @@ static int _cpu_down(unsigned int cpu)
 	p = __stop_machine_run(take_cpu_down, NULL, cpu);
 	mutex_unlock(&cpu_bitmask_lock);
 
-	if (IS_ERR(p)) {
+	if (IS_ERR(p) || cpu_online(cpu)) {
 		/* CPU didn't die: tell everyone.  Can't complain. */
 		if (blocking_notifier_call_chain(&cpu_chain, CPU_DOWN_FAILED,
 				(void *)(long)cpu) == NOTIFY_BAD)
 			BUG();
 
-		err = PTR_ERR(p);
-		goto out_allowed;
-	}
-
-	if (cpu_online(cpu))
+		if (IS_ERR(p)) {
+			err = PTR_ERR(p);
+			goto out_allowed;
+		}
 		goto out_thread;
+	}
 
 	/* Wait for it to sleep (leaving idle task). */
 	while (!idle_cpu(cpu))
