kernel-2.6.18-194.11.1.el5.src.rpm

From: Steve Dickson <SteveD@redhat.com>
Date: Tue, 4 Nov 2008 14:04:53 -0500
Subject: [nfs] oops in direct I/O error handling
Message-id: 49109CD5.1020504@RedHat.com
O-Subject: Re: [PATCH][RHEL5.3] Oops in direct I/O error handling
Bugzilla: 466164
RH-Acked-by: Jeff Moyer <jmoyer@redhat.com>
RH-Acked-by: Jeff Layton <jlayton@redhat.com>

Here is a late-breaking patch from IBM that fixes an oops
triggered by the nfs_stress.sh script from the LTP
test suite.

It seems that in nfs_direct_write_result() the error status is checked
before it is set, which causes the error to be missed. The
following resolves this problem by checking the error status
after it has been set (what a novel concept! ;-) )
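
To make the ordering issue concrete, here is a rough userspace sketch.
It is not the kernel code: struct dreq_model, result_old(), result_new()
and demo() are hypothetical names made up for illustration, locking is
left out, and the error values are arbitrary negative numbers. Each
function returns 1 if it would carry on to the byte accounting and 0 if
it bails out.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the direct I/O request state. */
struct dreq_model {
	int error;	/* first error recorded, 0 if none */
	int flags;	/* non-zero while a commit is still planned */
};

/* Old ordering: the stored error is tested before this reply's status
 * is recorded, so the reply that carries the first error falls through. */
int result_old(struct dreq_model *d, int status)
{
	if (d->error != 0)
		return 0;
	if (status < 0) {
		d->flags = 0;
		d->error = status;
	}
	return 1;
}

/* New ordering: record the error first (either source), then test it. */
int result_new(struct dreq_model *d, int status, int tk_status)
{
	if (status < 0 || tk_status < 0) {
		d->flags = 0;
		d->error = (status < 0) ? status : tk_status;
	}
	if (d->error != 0)
		return 0;
	return 1;
}

/* Small usage example; -5 is just an illustrative negative error value. */
void demo(void)
{
	struct dreq_model a = { 0, 1 };
	struct dreq_model b = { 0, 1 };

	assert(result_old(&a, -5) == 1);	/* error stored, but missed */
	assert(result_new(&b, -5, -5) == 0);	/* error stored and honoured */
}
```

In this model the old ordering only bails out on the reply after the
failing one, while the new ordering stops on the failing reply itself.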

The IBM guys were able to reproduce this by running the nfs_stress.sh
script and then having the server become unresponsive. When the
script timed out and started killing processes, the oops occurred.

Also, this does match how upstream handles errors.

The bz is: https://bugzilla.redhat.com/show_bug.cgi?id=466164

steved.

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 3ac2e7f..8dfe2b0 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -545,13 +545,14 @@ static void nfs_direct_write_result(struct rpc_task *task, void *calldata)
 
 	spin_lock(&dreq->lock);
 
-	if (unlikely(dreq->error != 0))
-		goto out_unlock;
-	if (unlikely(status < 0)) {
+	if (unlikely(status < 0) || unlikely(task->tk_status < 0)) {
 		/* An error has occured, so we should not commit */
 		dreq->flags = 0;
-		dreq->error = status;
+		/* Use the first error */
+		dreq->error = ((status < 0) ? status : task->tk_status);
 	}
+	if (unlikely(dreq->error != 0))
+		goto out_unlock;
 
 	dreq->count += data->res.count;