Sophie

Sophie

distrib > Scientific%20Linux > 5x > x86_64 > by-pkgid > 27922b4260f65d317aabda37e42bbbff > files > 1807

kernel-2.6.18-238.el5.src.rpm

From: Mikulas Patocka <mpatocka@redhat.com>
Date: Mon, 20 Oct 2008 16:41:50 -0400
Subject: [md] random memory corruption in snapshots
Message-id: Pine.LNX.4.64.0810201628140.24315@hs20-bc2-1.build.redhat.com
O-Subject: [RHEL 5.3] bz465825 random memory corruption in snapshots
Bugzilla: 465825
RH-Acked-by: Alasdair G Kergon <agk@redhat.com>

Hi

Here is the patch that fixes random double-free in snapshots.

The impact of the bug is serious --- when you have SMP or preemptive
kernel and when the user is simultaneously writing to both the origin and
snapshot device, double-free and memory corruption can occur.

Upstream: in Alasdair's patchset at
ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/series.html
Will be submitted to Linus in the next batch.

Testing: I compiled current 5.3 kernel on my workstation and run snapshot
tests to make sure that snapshots work. Because all the reported bugs are
unreproducible, it is not possible to test that the patch fixes them.

Please apply the patch soon, we have already two unreproducible crashes
discovered in Red Hat regression testing and we don't know if they are
caused by this bug or not.

Mikulas

Fix a race condition with primary_pe ref_count handling.

put_pending_exception runs under dm_snapshot->lock, it does atomic_dec_and_test
on primary_pe->ref_count, and later does atomic_read primary_pe->ref_count.

__origin_write does atomic_dec_and_test on primary_pe->ref_count without holding
dm_snapshot->lock.

This opens the following race condition:
Assume two CPUs, CPU1 is executing put_pending_exception (and holding
dm_snapshot->lock). CPU2 is executing __origin_write in parallel.
primary_pe->ref_count == 2.

CPU1:
if (primary_pe && atomic_dec_and_test(&primary_pe->ref_count))
	origin_bios = bio_list_get(&primary_pe->origin_bios);
... decrements primary_pe->ref_count to 1. Doesn't load origin_bios

CPU2:
if (first && atomic_dec_and_test(&primary_pe->ref_count)) {
	flush_bios(bio_list_get(&primary_pe->origin_bios));
	free_pending_exception(primary_pe);
	/* If we got here, pe_queue is necessarily empty. */
	return r;
}
... decrements primary_pe->ref_count to 0, submits pending bios, frees
primary_pe.

CPU1:
if (!primary_pe || primary_pe != pe)
	free_pending_exception(pe);
... this has no effect.
if (primary_pe && !atomic_read(&primary_pe->ref_count))
	free_pending_exception(primary_pe);
... sees ref_count == 0 (written by CPU 2), does double free !!

This bug can happen only if someone is simultaneously writing to both the
origin and the snapshot.

If someone is writing only to the origin, __origin_write will submit kcopyd
request after it decrements primary_pe->ref_count (so it can't happen that the
finished copy races with primary_pe->ref_count decrementation).

If someone is writing only to the snapshot, __origin_write isn't invoked at all
and the race can't happen.

The race happens when someone writes to the snapshot --- this creates
pending_exception with primary_pe == NULL and starts copying. Then, someone
writes to the same chunk in the snapshot, and __origin_write races with
termination of already submitted request in pending_complete (that calls
put_pending_exception).

This race may be reason for bugs:
http://bugzilla.kernel.org/show_bug.cgi?id=11636
https://bugzilla.redhat.com/show_bug.cgi?id=465825
https://bugzilla.redhat.com/show_bug.cgi?id=466931

The patch fixes the code to make sure that:
1. If atomic_dec_and_test(&primary_pe->ref_count) returns false, the process
must no longer dereference primary_pe (because someone else may free it under
us).
2. If atomic_dec_and_test(&primary_pe->ref_count) returns true, the process
is responsible for freeing primary_pe.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index 52ea785..19b2c6c 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -826,8 +826,10 @@ static struct bio *put_pending_exception(struct pending_exception *pe)
 	 * the bios for the original write to the origin.
 	 */
 	if (primary_pe &&
-	    atomic_dec_and_test(&primary_pe->ref_count))
+	    atomic_dec_and_test(&primary_pe->ref_count)) {
 		origin_bios = bio_list_get(&primary_pe->origin_bios);
+		free_pending_exception(primary_pe);
+	}
 
 	/*
 	 * Free the pe if it's not linked to an origin write or if
@@ -836,12 +838,6 @@ static struct bio *put_pending_exception(struct pending_exception *pe)
 	if (!primary_pe || primary_pe != pe)
 		free_pending_exception(pe);
 
-	/*
-	 * Free the primary pe if nothing references it.
-	 */
-	if (primary_pe && !atomic_read(&primary_pe->ref_count))
-		free_pending_exception(primary_pe);
-
 	return origin_bios;
 }