From: Steve Dickson <SteveD@redhat.com>
Date: Tue, 4 Nov 2008 14:04:53 -0500
Subject: [nfs] oops in direct I/O error handling
Message-id: 49109CD5.1020504@RedHat.com
O-Subject: Re: [PATCH][RHEL5.3] Oops in direct I/O error handling
Bugzilla: 466164
RH-Acked-by: Jeff Moyer <jmoyer@redhat.com>
RH-Acked-by: Jeff Layton <jlayton@redhat.com>

Here is a late-breaking patch from IBM that fixes an oops caused by the
nfs_stress.sh script from the LTP test suite.

It seems that in nfs_direct_write_result() the error status is checked
before it is set, which causes the error to be missed. The following
resolves this problem by checking the error status after it has been
set (what a novel concept! ;-) )

The IBM guys were able to reproduce this by running the nfs_stress.sh
script and then having the server become unresponsive. When the script
timed out and started killing processes, the oops occurred.

Also, this does match how upstream handles errors.

The bz is:
https://bugzilla.redhat.com/show_bug.cgi?id=466164

steved.

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 3ac2e7f..8dfe2b0 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -545,13 +545,14 @@ static void nfs_direct_write_result(struct rpc_task *task, void *calldata)
 
 	spin_lock(&dreq->lock);
 
-	if (unlikely(dreq->error != 0))
-		goto out_unlock;
-	if (unlikely(status < 0)) {
+	if (unlikely(status < 0) || unlikely(task->tk_status < 0)) {
 		/* An error has occured, so we should not commit */
 		dreq->flags = 0;
-		dreq->error = status;
+		/* Use the first error */
+		dreq->error = ((status < 0) ? status : task->tk_status);
 	}
+	if (unlikely(dreq->error != 0))
+		goto out_unlock;
 
 	dreq->count += data->res.count;
 
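To make the ordering change in nfs_direct_write_result() concrete, here is a small user-space sketch. It is not kernel code: struct fake_dreq, result_old() and result_new() are hypothetical names, and the locking, nfs_writeback_done() and the commit path are all left out. It only models the three dreq fields the patch touches, and shows that with the old ordering a failing call still adds its byte count, while with the new ordering the error is recorded first and the count is skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct nfs_direct_req:
 * only the three fields the patch touches. Not the kernel definition. */
struct fake_dreq {
	int error;          /* error recorded for the request, 0 if none */
	unsigned int flags; /* nonzero stands in for "commit still needed" */
	long count;         /* bytes accounted as successfully written */
};

/* Pre-patch ordering: the request's error is tested before this call's
 * status is recorded, so a failing call still adds its byte count, and a
 * failure visible only in the task status (tk_status) is never seen. */
void result_old(struct fake_dreq *dreq, int status, long res_count)
{
	assert(dreq != NULL);
	if (dreq->error != 0)
		return;                 /* stands in for "goto out_unlock" */
	if (status < 0) {
		dreq->flags = 0;        /* an error means: do not commit */
		dreq->error = status;
	}
	dreq->count += res_count;   /* reached even when status < 0 */
}

/* Post-patch ordering: record an error from either the saved status or
 * the task status (preferring the saved status), then test it, so no
 * bytes are counted for a failing call. */
void result_new(struct fake_dreq *dreq, int status, int tk_status,
		long res_count)
{
	assert(dreq != NULL);
	if (status < 0 || tk_status < 0) {
		dreq->flags = 0;        /* an error means: do not commit */
		dreq->error = (status < 0) ? status : tk_status;
	}
	if (dreq->error != 0)
		return;                 /* stands in for "goto out_unlock" */
	dreq->count += res_count;
}
```

For a fresh request (error = 0, flags = 1, count = 0) and a call that fails with status -5 and res_count 4096, result_old() records error = -5 but still leaves count at 4096. result_new(dreq, -5, -5, 4096) records the same error and leaves count at 0 and flags at 0. result_new() also catches the case where the saved status is 0 but the task status is negative, which the old ordering ignores.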