kernel-2.6.18-238.el5.src.rpm

From: David Teigland <teigland@redhat.com>
Date: Wed, 25 Nov 2009 21:58:59 -0500
Subject: [fs] dlm: fix connection close handling
Message-id: <20091125215859.GB27274@redhat.com>
Patchwork-id: 21504
O-Subject: [RHEL5.5 PATCH] dlm: fix connection close handling
Bugzilla: 521093
RH-Acked-by: Robert S Peterson <rpeterso@redhat.com>
RH-Acked-by: Christine Caulfield <ccaulfie@redhat.com>

bz 521093

build: http://brewweb.devel.redhat.com/brew/taskinfo?taskID=2100377

upstream: in 2.6.32

Closing a connection to a node can create problems if there are
outstanding messages for that node.  The problems include dlm_send
spinning attempting to reconnect, or BUG from tcp_connect_to_sock()
attempting to use a partially closed connection.

To cleanly close a connection, we now first attempt to send any pending
messages, cancel any remaining workqueue work, and flag the connection
as closed to avoid reconnect attempts.
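
As a rough illustration of the flag handling described above, here is a
single-threaded userspace sketch.  It is not the kernel code: struct
model_con, the model_* helpers and the counters are hypothetical names
invented for this example, the bit operations are plain non-atomic
stand-ins for test_bit()/set_bit()/clear_bit(), and the value used for
CF_WRITE_PENDING is an assumption (only the flag names and the other
values appear in the diff).

```c
#include <assert.h>
#include <stdbool.h>

/* Flag bit numbers. CF_CONNECT_PENDING and CF_CLOSE match the diff;
 * the value of CF_WRITE_PENDING is assumed for this sketch. */
enum { CF_WRITE_PENDING = 2, CF_CONNECT_PENDING = 3, CF_CLOSE = 6 };

/* Hypothetical stand-in for struct connection, with counters in place
 * of the workqueue, connect_action() and send_to_sock(). */
struct model_con {
	unsigned long flags;
	int queued;    /* times send work was queued */
	int connects;  /* times the connect action ran */
	int sends;     /* times the send path ran */
};

/* Non-atomic stand-ins for the kernel bit helpers. */
static bool m_test(int bit, const unsigned long *f) { return (*f >> bit) & 1UL; }
static void m_set(int bit, unsigned long *f) { *f |= 1UL << bit; }
static void m_clear(int bit, unsigned long *f) { *f &= ~(1UL << bit); }
static bool m_test_and_set(int bit, unsigned long *f)
{
	bool old = m_test(bit, f);
	m_set(bit, f);
	return old;
}
static bool m_test_and_clear(int bit, unsigned long *f)
{
	bool old = m_test(bit, f);
	m_clear(bit, f);
	return old;
}

/* Models lowcomms_connect_sock(): a closed connection is never requeued. */
static void model_connect_sock(struct model_con *con)
{
	if (m_test(CF_CLOSE, &con->flags))
		return;
	if (!m_test_and_set(CF_CONNECT_PENDING, &con->flags))
		con->queued++;
}

/* Models the flag changes made in dlm_lowcomms_close(). */
static void model_close(struct model_con *con)
{
	m_clear(CF_CONNECT_PENDING, &con->flags);
	m_clear(CF_WRITE_PENDING, &con->flags);
	m_set(CF_CLOSE, &con->flags);
}

/* Models process_send_sockets(): send only when a write is pending. */
static void model_process_send(struct model_con *con)
{
	if (m_test_and_clear(CF_CONNECT_PENDING, &con->flags)) {
		con->connects++;
		m_set(CF_WRITE_PENDING, &con->flags);
	}
	if (m_test_and_clear(CF_WRITE_PENDING, &con->flags))
		con->sends++;
}

/* Walks one connect/send cycle, then a close; returns 0 on success. */
static int model_demo(void)
{
	struct model_con con = { 0 };

	model_connect_sock(&con);
	model_process_send(&con);
	assert(con.queued == 1 && con.connects == 1 && con.sends == 1);

	model_close(&con);
	model_connect_sock(&con);   /* ignored: CF_CLOSE is set */
	model_process_send(&con);   /* nothing pending, nothing sent */
	assert(con.queued == 1 && con.connects == 1 && con.sends == 1);
	return 0;
}
```

Calling model_demo() from a test harness exercises the sequence: after
model_close(), neither a reconnect request nor a stray run of the send
work does anything, which is the behaviour the CF_CLOSE flag is meant
to give.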

RHEL5 change: remove cancel_work_sync(&con->swork) and
cancel_work_sync(&con->rwork), since cancel_work_sync doesn't
exist in RHEL5.  The patch should work fine without them;
they weren't the main point of the patch.

Signed-off-by: David Teigland <teigland@redhat.com>

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 051709d..b802a0f 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -104,6 +104,7 @@ struct connection {
 #define CF_CONNECT_PENDING 3
 #define CF_INIT_PENDING 4
 #define CF_IS_OTHERCON 5
+#define CF_CLOSE 6
 	struct list_head writequeue;  /* List of outgoing writequeue_entries */
 	spinlock_t writequeue_lock;
 	int (*rx_action) (struct connection *);	/* What to do when active */
@@ -274,6 +275,8 @@ static void lowcomms_write_space(struct sock *sk)
 
 static inline void lowcomms_connect_sock(struct connection *con)
 {
+	if (test_bit(CF_CLOSE, &con->flags))
+		return;
 	if (!test_and_set_bit(CF_CONNECT_PENDING, &con->flags))
 		queue_work(send_workqueue, &con->swork);
 }
@@ -1355,6 +1358,9 @@ int dlm_lowcomms_close(int nodeid)
 	log_print("closing connection to node %d", nodeid);
 	con = nodeid2con(nodeid, 0);
 	if (con) {
+		clear_bit(CF_CONNECT_PENDING, &con->flags);
+		clear_bit(CF_WRITE_PENDING, &con->flags);
+		set_bit(CF_CLOSE, &con->flags);
 		clean_one_writequeue(con);
 		close_connection(con, 1);
 	}
@@ -1379,9 +1385,10 @@ static void process_send_sockets(void *data)
 	struct connection *con = data;
 	if (test_and_clear_bit(CF_CONNECT_PENDING, &con->flags)) {
 		con->connect_action(con);
+		set_bit(CF_WRITE_PENDING, &con->flags);
 	}
-	clear_bit(CF_WRITE_PENDING, &con->flags);
-	send_to_sock(con);
+	if (test_and_clear_bit(CF_WRITE_PENDING, &con->flags))
+		send_to_sock(con);
 }