From: Mikulas Patocka <mpatocka@redhat.com>
Date: Thu, 23 Apr 2009 01:09:51 -0400
Subject: [md] race conditions in snapshots
Message-id: Pine.LNX.4.64.0904230106120.12945@hs20-bc2-1.build.redhat.com
O-Subject: [PATCH 1/2 RHEL5.4] bz496100 race conditions in snapshots
Bugzilla: 496100
RH-Acked-by: Jonathan Brassow <jbrassow@redhat.com>

Hi

The following two patches fix two race conditions in snapshots that were
found during source code review.

The first one fixes a bug in kcopyd that can result in two processes
accessing the exception store simultaneously, leading to crashes.

Testing: created two snapshots, stored data on them and on the origin,
and verified data integrity. Also ran a synthetic test case that I had
created for an earlier bug.

Mikulas

--

A combined patch for commits (2.6.30-rc2):
73830857bca6f6c9dbd48e906daea50bea42d676
340cd44451fb0bfa542365e6b4b565bbd44836e2

dm kcopyd: fix callback race

If the thread calling dm_kcopyd_copy is delayed due to scheduling inside
split_job/segment_complete and the subjobs complete before the loop in
split_job completes, the kcopyd callback could be invoked from the
thread that called dm_kcopyd_copy instead of the kcopyd workqueue.

dm_kcopyd_copy -> split_job -> segment_complete -> job->fn()
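
For illustration, a simplified model of the pre-patch tail of segment_complete
(a sketch only: the names and the removed lines follow the diff below, while
locking, sub-job dispatch and progress tracking are elided, and the exact
write_err type varies between kernel versions):

	static void segment_complete(int read_err, unsigned long write_err,
				     void *context)
	{
		struct kcopyd_job *job = (struct kcopyd_job *) context;

		/* ... dispatch the next sub-job if there is more work ... */

		if (atomic_dec_and_test(&job->sub_jobs)) {
			/*
			 * Pre-patch: job->fn runs in whichever thread makes
			 * the final decrement.  If the sub-jobs finish while
			 * split_job is still looping, that thread is the
			 * caller of dm_kcopyd_copy, not the kcopyd workqueue.
			 */
			job->fn(read_err, write_err, job->context);
			mempool_free(job, job->kc->job_pool);
		}
	}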

Snapshots depend on the fact that callbacks are called from the single-threaded
kcopyd workqueue and expect that there is no racing between individual
callbacks. Racing between callbacks can corrupt the exception store and can
also mean that exception store callbacks are called twice for the same
exception - a likely reason for the crashes reported inside
pending_complete() / remove_exception().
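
For comparison, a rough model of the path callers rely on: the kcopyd worker
drains the completion list from a single thread and invokes the callbacks one
at a time (the helper signatures here are schematic and differ between kernel
versions; run_complete_job is the function patched in the first hunk below):

	static void do_work(void *data)
	{
		struct kcopyd_client *kc = data;
		struct kcopyd_job *job;

		/* one job at a time, always from the same worker thread */
		while ((job = pop(&kc->complete_jobs)) != NULL)
			run_complete_job(job);	/* ends with fn(read_err, write_err, context) */
	}

Because only this thread ever calls the notify functions, two completions can
never run concurrently - exactly the guarantee the snapshot exception store
assumes.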

This patch fixes two problems:

1. job->fn being called from the thread that submitted the job (see above).

- Fix: hand over the completion callback to the kcopyd thread.

2. The call job->fn(read_err, write_err, job->context); in segment_complete
reports only the error of the last sub-job, not the union of all errors.

- Fix: pass job->write_err to the callback so that all error bits are reported
  (run_complete_job already does this; see the sketch after this list)
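
A small sketch of why routing the final completion through run_complete_job
also fixes the error reporting (fragments follow the upstream code; the
write_err type is unsigned long in the upstream commits):

	/* segment_complete(), run once per finished sub-job: */
	if (write_err)
		job->write_err |= write_err;	/* accumulate error bits */

	/* run_complete_job(), run once for the whole job: */
	int read_err = job->read_err;
	unsigned long write_err = job->write_err;	/* union of all sub-jobs */
	/* ... release pages and free the job, as in the first hunk below ... */
	fn(read_err, write_err, context);

The removed direct call in segment_complete passed only the read_err and
write_err arguments of the last sub-job to complete.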

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

diff --git a/drivers/md/kcopyd.c b/drivers/md/kcopyd.c
index cdcc34b..30af8a0 100644
--- a/drivers/md/kcopyd.c
+++ b/drivers/md/kcopyd.c
@@ -291,7 +291,8 @@ static int run_complete_job(struct kcopyd_job *job)
 	kcopyd_notify_fn fn = job->fn;
 	struct kcopyd_client *kc = job->kc;
 
-	kcopyd_put_pages(kc, job->pages);
+	if (job->pages)
+		kcopyd_put_pages(kc, job->pages);
 	mempool_free(job, kc->job_pool);
 	fn(read_err, write_err, context);
 
@@ -454,6 +455,7 @@ static void segment_complete(int read_err,
 	sector_t progress = 0;
 	sector_t count = 0;
 	struct kcopyd_job *job = (struct kcopyd_job *) context;
+	struct kcopyd_client *kc = job->kc;
 
 	down(&job->lock);
 
@@ -483,7 +485,7 @@ static void segment_complete(int read_err,
 
 	if (count) {
 		int i;
-		struct kcopyd_job *sub_job = mempool_alloc(job->kc->job_pool,
+		struct kcopyd_job *sub_job = mempool_alloc(kc->job_pool,
 							   GFP_NOIO);
 
 		*sub_job = *job;
@@ -502,13 +504,16 @@ static void segment_complete(int read_err,
 	} else if (atomic_dec_and_test(&job->sub_jobs)) {
 
 		/*
-		 * To avoid a race we must keep the job around
-		 * until after the notify function has completed.
-		 * Otherwise the client may try and stop the job
-		 * after we've completed.
+		 * Queue the completion callback to the kcopyd thread.
+		 *
+		 * Some callers assume that all the completions are called
+		 * from a single thread and don't race with each other.
+		 *
+		 * We must not call the callback directly here because this
+		 * code may not be executing in the thread.
 		 */
-		job->fn(read_err, write_err, job->context);
-		mempool_free(job, job->kc->job_pool);
+		push(&kc->complete_jobs, job);
+		wake(kc);
 	}
 }
 
@@ -521,6 +526,8 @@ static void split_job(struct kcopyd_job *job)
 {
 	int i;
 
+	atomic_inc(&job->kc->nr_jobs);
+
 	atomic_set(&job->sub_jobs, SPLIT_COUNT);
 	for (i = 0; i < SPLIT_COUNT; i++)
 		segment_complete(0, 0u, job);