From: David Teigland <teigland@redhat.com>
Date: Wed, 25 Nov 2009 21:58:59 -0500
Subject: [fs] dlm: fix connection close handling
Message-id: <20091125215859.GB27274@redhat.com>
Patchwork-id: 21504
O-Subject: [RHEL5.5 PATCH] dlm: fix connection close handling
Bugzilla: 521093
RH-Acked-by: Robert S Peterson <rpeterso@redhat.com>
RH-Acked-by: Christine Caulfield <ccaulfie@redhat.com>

bz 521093
build: http://brewweb.devel.redhat.com/brew/taskinfo?taskID=2100377
upstream: in 2.6.32

Closing a connection to a node can create problems if there are
outstanding messages for that node. The problems include dlm_send
spinning while it attempts to reconnect, or a BUG from
tcp_connect_to_sock() when it attempts to use a partially closed
connection.

To cleanly close a connection, we now first attempt to send any pending
messages, cancel any remaining workqueue work, and flag the connection
as closed to avoid reconnect attempts.

RHEL5 change: remove cancel_work_sync(&con->swork) and
cancel_work_sync(&con->rwork), since cancel_work_sync doesn't exist in
RHEL5. The patch should work fine without them; they weren't the main
point of the patch.
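The "flag the connection as closed to avoid reconnect attempts" step can be illustrated with a small userspace model. This is a minimal sketch, not the kernel code: it assumes C11 atomics as a stand-in for the kernel's test_bit()/set_bit()/clear_bit()/test_and_set_bit(), and a counter as a stand-in for queue_work(). struct model_conn and every model_* name are hypothetical, and the value 2 for CF_WRITE_PENDING is an assumption because the diff does not show it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: CF_CONNECT_PENDING and CF_CLOSE match the patch;
 * CF_WRITE_PENDING's value is an assumption (not shown in the diff). */
#define CF_WRITE_PENDING   2
#define CF_CONNECT_PENDING 3
#define CF_CLOSE           6

struct model_conn {
	atomic_ulong flags;
	int queued;	/* stands in for queue_work(send_workqueue, ...) */
};

static bool model_test_bit(int nr, atomic_ulong *f)
{
	return (atomic_load(f) >> nr) & 1UL;
}

static void model_set_bit(int nr, atomic_ulong *f)
{
	atomic_fetch_or(f, 1UL << nr);
}

static void model_clear_bit(int nr, atomic_ulong *f)
{
	atomic_fetch_and(f, ~(1UL << nr));
}

/* Returns the previous value of the bit, like test_and_set_bit(). */
static bool model_test_and_set_bit(int nr, atomic_ulong *f)
{
	return (atomic_fetch_or(f, 1UL << nr) >> nr) & 1UL;
}

/* Mirrors the patched lowcomms_connect_sock(): no reconnect once closed. */
static void model_connect_sock(struct model_conn *con)
{
	if (model_test_bit(CF_CLOSE, &con->flags))
		return;
	if (!model_test_and_set_bit(CF_CONNECT_PENDING, &con->flags))
		con->queued++;
}

/* Mirrors only the flag handling added to dlm_lowcomms_close();
 * clean_one_writequeue() and close_connection() are not modelled. */
static void model_close(struct model_conn *con)
{
	model_clear_bit(CF_CONNECT_PENDING, &con->flags);
	model_clear_bit(CF_WRITE_PENDING, &con->flags);
	model_set_bit(CF_CLOSE, &con->flags);
}

/* Returns how many times work was queued across the scenario. */
int model_close_scenario(void)
{
	struct model_conn con;

	atomic_init(&con.flags, 0UL);
	con.queued = 0;

	model_connect_sock(&con);	/* queues work, sets CONNECT_PENDING */
	model_connect_sock(&con);	/* already pending: not queued again */
	model_close(&con);		/* clears pending bits, sets CF_CLOSE */
	assert(model_test_bit(CF_CLOSE, &con.flags));
	model_connect_sock(&con);	/* ignored: connection is flagged closed */
	return con.queued;
}
```

In this model, model_close_scenario() returns 1: only the first connect request queues work. Without the CF_CLOSE test in model_connect_sock(), the last call would find CF_CONNECT_PENDING cleared by the close and queue work a second time.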
Signed-off-by: David Teigland <teigland@redhat.com>

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 051709d..b802a0f 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -104,6 +104,7 @@ struct connection {
 #define CF_CONNECT_PENDING 3
 #define CF_INIT_PENDING 4
 #define CF_IS_OTHERCON 5
+#define CF_CLOSE 6
 	struct list_head writequeue; /* List of outgoing writequeue_entries */
 	spinlock_t writequeue_lock;
 	int (*rx_action) (struct connection *); /* What to do when active */
@@ -274,6 +275,8 @@ static void lowcomms_write_space(struct sock *sk)
 
 static inline void lowcomms_connect_sock(struct connection *con)
 {
+	if (test_bit(CF_CLOSE, &con->flags))
+		return;
 	if (!test_and_set_bit(CF_CONNECT_PENDING, &con->flags))
 		queue_work(send_workqueue, &con->swork);
 }
@@ -1355,6 +1358,9 @@ int dlm_lowcomms_close(int nodeid)
 	log_print("closing connection to node %d", nodeid);
 	con = nodeid2con(nodeid, 0);
 	if (con) {
+		clear_bit(CF_CONNECT_PENDING, &con->flags);
+		clear_bit(CF_WRITE_PENDING, &con->flags);
+		set_bit(CF_CLOSE, &con->flags);
 		clean_one_writequeue(con);
 		close_connection(con, 1);
 	}
@@ -1379,9 +1385,10 @@ static void process_send_sockets(void *data)
 	struct connection *con = data;
 	if (test_and_clear_bit(CF_CONNECT_PENDING, &con->flags)) {
 		con->connect_action(con);
+		set_bit(CF_WRITE_PENDING, &con->flags);
 	}
-	clear_bit(CF_WRITE_PENDING, &con->flags);
-	send_to_sock(con);
+	if (test_and_clear_bit(CF_WRITE_PENDING, &con->flags))
+		send_to_sock(con);
 }
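The process_send_sockets() hunk changes when send_to_sock() runs. The old body cleared CF_WRITE_PENDING and then called send_to_sock() unconditionally. The new body calls it only if CF_WRITE_PENDING was set, and sets that bit itself after a connect so queued messages still go out. The sketch below is a hypothetical userspace model of both versions, assuming C11 atomics in place of the kernel bitops and counters in place of con->connect_action() and send_to_sock(). All model_*, old_*, new_* and scenario names are illustrative, and CF_WRITE_PENDING's value is an assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* CF_CONNECT_PENDING matches the patch; CF_WRITE_PENDING's value is an
 * assumption, since the diff does not show its definition. */
#define CF_WRITE_PENDING   2
#define CF_CONNECT_PENDING 3

struct model_conn {
	atomic_ulong flags;
	int connects;	/* stand-in for calls to con->connect_action() */
	int sends;	/* stand-in for calls to send_to_sock() */
};

static void model_set_bit(int nr, atomic_ulong *f)
{
	atomic_fetch_or(f, 1UL << nr);
}

static void model_clear_bit(int nr, atomic_ulong *f)
{
	atomic_fetch_and(f, ~(1UL << nr));
}

/* Returns the previous value of the bit, like test_and_clear_bit(). */
static bool model_test_and_clear_bit(int nr, atomic_ulong *f)
{
	return (atomic_fetch_and(f, ~(1UL << nr)) >> nr) & 1UL;
}

/* Pre-patch ordering: always "sends", whatever the flags say. */
static void old_process_send(struct model_conn *con)
{
	if (model_test_and_clear_bit(CF_CONNECT_PENDING, &con->flags))
		con->connects++;
	model_clear_bit(CF_WRITE_PENDING, &con->flags);
	con->sends++;
}

/* Post-patch ordering: connect, mark data for flushing, send only if marked. */
static void new_process_send(struct model_conn *con)
{
	if (model_test_and_clear_bit(CF_CONNECT_PENDING, &con->flags)) {
		con->connects++;
		model_set_bit(CF_WRITE_PENDING, &con->flags);
	}
	if (model_test_and_clear_bit(CF_WRITE_PENDING, &con->flags))
		con->sends++;
}

/* One work item on a connection whose pending bits are already clear,
 * as after the patched dlm_lowcomms_close(); returns the send count. */
int stale_work_sends(bool use_new)
{
	struct model_conn con;

	atomic_init(&con.flags, 0UL);
	con.connects = 0;
	con.sends = 0;
	if (use_new)
		new_process_send(&con);
	else
		old_process_send(&con);
	return con.sends;
}

/* One work item after a connect request; returns the send count. */
int connect_then_sends(void)
{
	struct model_conn con;

	atomic_init(&con.flags, 1UL << CF_CONNECT_PENDING);
	con.connects = 0;
	con.sends = 0;
	new_process_send(&con);
	assert(con.connects == 1);
	return con.sends;
}
```

In the model, stale_work_sends(false) returns 1 while stale_work_sends(true) returns 0: once the close path has cleared both pending bits, a late work item no longer reaches send_to_sock(). connect_then_sends() returns 1, showing that a successful connect still flushes whatever was queued in the meantime.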