Sophie

Sophie

distrib > Scientific%20Linux > 5x > x86_64 > by-pkgid > 340e01248478ba8b78a6d4d1809b1eff > files > 407

kvm-83-270.el5_11.src.rpm

From 5b358cf5f711fde7eb9d82c8ce2ccabd456b5b2c Mon Sep 17 00:00:00 2001
From fc5511c6c060162b032b31b5724971b71efadf66
From: Sheng Yang <sheng@linux.intel.com>
Date: Tue, 6 Jan 2009 02:03:03 +0800
Subject: [PATCH 7/9] KVM: Fix racy in kvm_free_assigned_irq

In the past, kvm_get_kvm() and kvm_put_kvm() was called in assigned device irq
handler and interrupt_work, in order to prevent cancel_work_sync() in
kvm_free_assigned_irq got a illegal state when waiting for interrupt_work done.
But it's tricky and still got two problems:

1. A bug ignored two conditions that cancel_work_sync() would return true result
in a additional kvm_put_kvm().

2. If interrupt type is MSI, we would got a window between cancel_work_sync()
and free_irq(), which interrupt would be injected again...

This patch discard the reference count used for irq handler and interrupt_work,
and ensure the legal state by moving the free function at the very beginning of
kvm_destroy_vm(). And the patch fix the second bug by disable irq before
cancel_work_sync(), which may result in nested disable of irq but OK for we are
going to free it.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
RH-Type: bugfix
RH-Upstream-status: applied(kvm/master), pending(maint/2.6.29)
---
 arch/x86/kvm/x86.c  |    2 +-
 virt/kvm/kvm_main.c |   27 +++++++++++++++++++--------
 2 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 953d024..558922c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4152,11 +4152,11 @@ static void kvm_free_vcpus(struct kvm *kvm)
 
 void kvm_arch_sync_events(struct kvm *kvm)
 {
+	kvm_free_all_assigned_devices(kvm);
 }
 
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
-	kvm_free_all_assigned_devices(kvm);
 	kvm_iommu_unmap_guest(kvm);
 	kvm_free_pit(kvm);
 	kfree(kvm->arch.vpic);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3ab6900..6762eaf 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -173,7 +173,6 @@ static void kvm_assigned_dev_interrupt_work_handler(struct work_struct *work)
 		assigned_dev->host_irq_disabled = false;
 	}
 	mutex_unlock(&assigned_dev->kvm->lock);
-	kvm_put_kvm(assigned_dev->kvm);
 }
 
 static irqreturn_t kvm_assigned_dev_intr(int irq, void *dev_id)
@@ -181,8 +180,6 @@ static irqreturn_t kvm_assigned_dev_intr(int irq, void *dev_id)
 	struct kvm_assigned_dev_kernel *assigned_dev =
 		(struct kvm_assigned_dev_kernel *) dev_id;
 
-	kvm_get_kvm(assigned_dev->kvm);
-
 	schedule_work(&assigned_dev->interrupt_work);
 
 	disable_irq_nosync(irq);
@@ -213,6 +210,7 @@ static void kvm_assigned_dev_ack_irq(struct kvm_irq_ack_notifier *kian)
 	}
 }
 
+/* The function implicit hold kvm->lock mutex due to cancel_work_sync() */
 static void kvm_free_assigned_irq(struct kvm *kvm,
 				  struct kvm_assigned_dev_kernel *assigned_dev)
 {
@@ -228,11 +226,24 @@ static void kvm_free_assigned_irq(struct kvm *kvm,
 	if (!assigned_dev->irq_requested_type)
 		return;
 
-	if (cancel_work_sync(&assigned_dev->interrupt_work))
-		/* We had pending work. That means we will have to take
-		 * care of kvm_put_kvm.
-		 */
-		kvm_put_kvm(kvm);
+	/*
+	 * In kvm_free_device_irq, cancel_work_sync return true if:
+	 * 1. work is scheduled, and then cancelled.
+	 * 2. work callback is executed.
+	 *
+	 * The first one ensured that the irq is disabled and no more events
+	 * would happen. But for the second one, the irq may be enabled (e.g.
+	 * for MSI). So we disable irq here to prevent further events.
+	 *
+	 * Notice this maybe result in nested disable if the interrupt type is
+	 * INTx, but it's OK for we are going to free it.
+	 *
+	 * If this function is a part of VM destroy, please ensure that till
+	 * now, the kvm state is still legal for probably we also have to wait
+	 * interrupt_work done.
+	 */
+	disable_irq_nosync(assigned_dev->host_irq);
+	cancel_work_sync(&assigned_dev->interrupt_work);
 
 	free_irq(assigned_dev->host_irq, (void *)assigned_dev);
 
-- 
1.6.1