From: Mikulas Patocka <mpatocka@redhat.com> Date: Tue, 19 Aug 2008 16:47:46 -0400 Subject: [md] dm snapshot: fix race during exception creation Message-id: Pine.LNX.4.64.0808191646160.3053@hs20-bc2-1.build.redhat.com O-Subject: [PATCH 2/2 RHEL 5.3] bz459337 Race condition and data corruption in dm-snapshots Bugzilla: 459337 RH-Acked-by: Alasdair G Kergon <agk@redhat.com> The second patch that depends on the first one and actually fixes the bug. testing: I compiled it on my workstation, run testcases from http://people.redhat.com/mpatocka/testcases/ Before both patches were applied, tescases testcase-read-snapshot-vs-write-origin-1.c and testcase-read-snapshot-vs-write-origin-2.c reported data corruption. After the patches were applied, the testcases pass ok. upstream commit: a8d41b59f3f5a7ac19452ef442a7fc1b5fa17366 (2.6.27-rc1) dm snapshot: fix race during exception creation Fix a race condition that returns incorrect data when a write causes an exception to be allocated whilst a read is still in flight. The race condition happens as follows: * A read to non-reallocated sector in the snapshot is submitted so that the read is routed to the original device. * A write to the original device is submitted. The write causes an exception that reallocates the block. The write proceeds. * The original read is dequeued and reads the wrong data. This race can be triggered with CFQ scheduler and one thread writing and multiple threads reading simultaneously. diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c index 2c360a1..959c9e5 100644 --- a/drivers/md/dm-snap.c +++ b/drivers/md/dm-snap.c @@ -133,6 +133,27 @@ static void stop_tracking_chunk(struct dm_snapshot *s, mempool_free(c, s->tracked_chunk_pool); } +static int __chunk_is_tracked(struct dm_snapshot *s, chunk_t chunk) +{ + struct dm_snap_tracked_chunk *c; + struct hlist_node *hn; + int found = 0; + + spin_lock_irq(&s->tracked_chunk_lock); + + hlist_for_each_entry(c, hn, + &s->tracked_chunk_hash[DM_TRACKED_CHUNK_HASH(chunk)], node) { + if (c->chunk == chunk) { + found = 1; + break; + } + } + + spin_unlock_irq(&s->tracked_chunk_lock); + + return found; +} + /* * One of these per registered origin, held in the snapshot_origins hash */ @@ -843,6 +864,13 @@ static void pending_complete(struct pending_exception *pe, int success) } /* + * Check for conflicting reads. This is extremely improbable, + * so yield() is sufficient and there is no need for a wait queue. + */ + while (__chunk_is_tracked(s, pe->e.old_chunk)) + yield(); + + /* * Add a proper exception, and remove the * in-flight exception from the list. */