Sophie: kernel-2.6.18-194.11.1.el5 src

kernel-2.6.18-194.11.1.el5.src.rpm

From: Jeff Layton <jlayton@redhat.com>
Date: Wed, 9 Jul 2008 09:23:30 -0400
Subject: [nfs] sunrpc: sleeping rpc_malloc might deadlock
Message-id: 1215609810-24372-1-git-send-email-jlayton@redhat.com
O-Subject: [RHEL5.3 PATCH] BZ#451317: sunrpc: prevent deadlock by not allowing rpc_malloc to sleep
Bugzilla: 451317
RH-Acked-by: Peter Staubach <staubach@redhat.com>

This problem was originally noticed by a fairly large customer of ours
with RHEL4, but it appears that RHEL5 has a similar problem.

When the RPC engine allocates a buffers, it uses an allocation that
can sleep to try and free memory. The problem is that if it ends
up trying to free dirty NFS pages to allocate a page, then it can
deadlock. rpciod may be stuck waiting to free up a page that cannot
be freed until rpciod does some work.

We don't want RPC's sleeping outside of the task scheduler while
allocating their buffer. Change rpc_malloc from using GFP_NOFS
to GFP_NOWAIT when allocating memory. In addition to making this
a non-blocking allocation, it allows the kernel to dip a little
more deeply into the reserves before giving up and returning
NULL to a page allocation.

Also, change the printk in call_allocate() to a dprintk since this
makes it somewhat more likely that it could fire.

Obviously, it's tough to reproduce this situation, but I did test
this with a fault injection patch that made rpc_malloc return NULL
on every 10th call. That seemed to have no ill effects other than
a slight slowdown.

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index da81c76..e36cc5d 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -729,7 +729,7 @@ call_allocate(struct rpc_task *task)
 
 	if (xprt->ops->buf_alloc(task, bufsiz << 1) != NULL)
 		return;
-	printk(KERN_INFO "RPC: buffer allocation failed for task %p\n", task); 
+	dprintk("RPC: buffer allocation failed for task %p\n", task); 
 
 	if (RPC_IS_ASYNC(task) || !signalled()) {
 		xprt_release(task);
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 7e870ed..68f7e36 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -758,10 +758,16 @@ static void rpc_async_schedule(void *arg)
  * @task: RPC task that will use this buffer
  * @size: requested byte size
  *
- * We try to ensure that some NFS reads and writes can always proceed
- * by using a mempool when allocating 'small' buffers.
+ * To prevent rpciod from hanging, this allocator never sleeps,
+ * returning NULL if the request cannot be serviced immediately.
+ * The caller can arrange to sleep in a way that is safe for rpciod.
+ *
+ * Most requests are 'small' (under 2KiB) and can be serviced from a
+ * mempool, ensuring that NFS reads and writes can always proceed,
+ * and that there is good locality of reference for these buffers.
+ *
  * In order to avoid memory starvation triggering more writebacks of
- * NFS requests, we use GFP_NOFS rather than GFP_KERNEL.
+ * NFS requests, we avoid using GFP_KERNEL.
  */
 void * rpc_malloc(struct rpc_task *task, size_t size)
 {
@@ -771,7 +777,7 @@ void * rpc_malloc(struct rpc_task *task, size_t size)
 	if (task->tk_flags & RPC_TASK_SWAPPER)
 		gfp = GFP_ATOMIC;
 	else
-		gfp = GFP_NOFS;
+		gfp = GFP_NOWAIT;
 
 	if (size > RPC_BUFFER_MAXSIZE) {
 		req->rq_buffer = kmalloc(size, gfp);