From: Don Dutile <ddutile@redhat.com>
Date: Thu, 11 Dec 2008 13:37:58 -0500
Subject: [xen] pv_hvm: guest hang on FV save/restore
Message-id: 49415E06.2060003@redhat.com
O-Subject: [RHEL5.3 PATCH] Guest hang on FV save/restore
Bugzilla: 475778
RH-Acked-by: Bill Burns <bburns@redhat.com>
RH-Acked-by: Chris Lalancette <clalance@redhat.com>
RH-Acked-by: Rik van Riel <riel@redhat.com>

BZ 475778

While Chris Lalancette was testing his latest patches for
block-attach/detach as well as reboot, he found that
a FV save/restore would hang the restored guest when
the guest was configured w/vcpus=8 (on an 8 cpu system).

The failure is due to a livelock during secondary cpu shutdowns.

Fix is from upstream xen-unstable, cset 18669.

Chris tested an x86_64 kernel-2.6.126 patched FV guest with the fix,
and what once failed right away worked for over 442 save/restore
iterations (with background kernel make -j 10).

Please review & ack.

Note: This is a regression since a rhel5.2 FV guest on 8vcpu
didn't hang on restore.

- Don

diff --git a/drivers/xenpv_hvm/platform-pci/machine_reboot.c b/drivers/xenpv_hvm/platform-pci/machine_reboot.c
index 34cb488..0392c61 100644
--- a/drivers/xenpv_hvm/platform-pci/machine_reboot.c
+++ b/drivers/xenpv_hvm/platform-pci/machine_reboot.c
@@ -10,12 +10,6 @@ struct ap_suspend_info {
 	atomic_t nr_spinning;
 };
 
-/*
- * Use a rwlock to protect the hypercall page from being executed in AP context
- * while the BSP is re-initializing it after restore.
- */
-static DEFINE_RWLOCK(suspend_lock);
-
 #ifdef CONFIG_SMP
 /*
  * Spinning prevents, for example, APs touching grant table entries while
@@ -31,12 +25,8 @@ static void ap_suspend(void *_info)
 	atomic_inc(&info->nr_spinning);
 	mb();
 
-	while (info->do_spin) {
+	while (info->do_spin)
 		cpu_relax();
-		read_lock(&suspend_lock);
-		HYPERVISOR_yield();
-		read_unlock(&suspend_lock);
-	}
 	mb();
 
 	atomic_dec(&info->nr_spinning);
@@ -52,9 +42,7 @@ static int bp_suspend(void)
 	suspend_cancelled = HYPERVISOR_shutdown(SHUTDOWN_suspend);
 
 	if (!suspend_cancelled) {
-		write_lock(&suspend_lock);
 		platform_pci_resume();
-		write_unlock(&suspend_lock);
 		gnttab_resume();
 		irq_resume();
 	}