Sophie: kernel-2.6.18-238.el5 src

kernel-2.6.18-238.el5.src.rpm

From: Mikulas Patocka <mpatocka@redhat.com>
Date: Tue, 19 Aug 2008 16:47:46 -0400
Subject: [md] dm snapshot: fix race during exception creation
Message-id: Pine.LNX.4.64.0808191646160.3053@hs20-bc2-1.build.redhat.com
O-Subject: [PATCH 2/2 RHEL 5.3] bz459337 Race condition and data corruption in dm-snapshots
Bugzilla: 459337
RH-Acked-by: Alasdair G Kergon <agk@redhat.com>

The second patch that depends on the first one and actually fixes the bug.

testing: I compiled it on my workstation, run testcases from
http://people.redhat.com/mpatocka/testcases/
Before both patches were applied, tescases
testcase-read-snapshot-vs-write-origin-1.c and
testcase-read-snapshot-vs-write-origin-2.c reported data corruption. After
the patches were applied, the testcases pass ok.

upstream commit: a8d41b59f3f5a7ac19452ef442a7fc1b5fa17366 (2.6.27-rc1)

dm snapshot: fix race during exception creation

Fix a race condition that returns incorrect data when a write causes an
exception to be allocated whilst a read is still in flight.

The race condition happens as follows:
* A read to non-reallocated sector in the snapshot is submitted so that the
  read is routed to the original device.
* A write to the original device is submitted. The write causes an exception
  that reallocates the block.  The write proceeds.
* The original read is dequeued and reads the wrong data.

This race can be triggered with CFQ scheduler and one thread writing and
multiple threads reading simultaneously.

diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index 2c360a1..959c9e5 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -133,6 +133,27 @@ static void stop_tracking_chunk(struct dm_snapshot *s,
 	mempool_free(c, s->tracked_chunk_pool);
 }
 
+static int __chunk_is_tracked(struct dm_snapshot *s, chunk_t chunk)
+{
+	struct dm_snap_tracked_chunk *c;
+	struct hlist_node *hn;
+	int found = 0;
+
+	spin_lock_irq(&s->tracked_chunk_lock);
+
+	hlist_for_each_entry(c, hn,
+	    &s->tracked_chunk_hash[DM_TRACKED_CHUNK_HASH(chunk)], node) {
+		if (c->chunk == chunk) {
+			found = 1;
+			break;
+		}
+	}
+
+	spin_unlock_irq(&s->tracked_chunk_lock);
+
+	return found;
+}
+
 /*
  * One of these per registered origin, held in the snapshot_origins hash
  */
@@ -843,6 +864,13 @@ static void pending_complete(struct pending_exception *pe, int success)
 	}
 
 	/*
+	 * Check for conflicting reads. This is extremely improbable,
+	 * so yield() is sufficient and there is no need for a wait queue.
+	 */
+	while (__chunk_is_tracked(s, pe->e.old_chunk))
+		yield();
+
+	/*
 	 * Add a proper exception, and remove the
 	 * in-flight exception from the list.
 	 */