From: Peter Staubach <staubach@redhat.com>
Date: Thu, 4 Dec 2008 11:13:55 -0500
Subject: [nfs] lockd: handle long grace periods correctly
Message-id: 493801C3.20101@redhat.com
O-Subject: Re: [RHEL-5.4 PATCH] lockd: return NLM_LCK_DENIED_GRACE_PERIOD after long periods
Bugzilla: 474590
RH-Acked-by: Jeff Layton <jlayton@redhat.com>
RH-Acked-by: Steve Dickson <SteveD@redhat.com>

Peter Staubach wrote:
> Hi.
>
> Attached is a patch to address bz474590, "lockd: return
> NLM_LCK_DENIED_GRACE_PERIOD after long periods".
>
> The problem is that the NFS server lock manager uses a grace
> period after it comes up to allow clients to reacquire locks
> that they were holding when the server went down. The server
> handles the grace period by calculating when it should end by
> adding a computed number of jiffies to the current value of
> jiffies. When the value of jiffies exceeds the computed
> value, then the grace period is considered to be completed.
>
> The problem is that jiffies can wrap, and in a fairly short
> period of time, namely a few weeks. This can lead to the lock
> manager assuming that it is once again back in the grace
> period, thus denying new lock requests.
>
> The solution is to set a flag indicating that the server is
> in the grace period and to use a timer to clear the flag
> when the grace period should be terminated.
>
> The upstream solution is essentially this, but using a
> bunch of things that RHEL-5 does not have. This solution
> matches the RHEL-4 solution previously implemented.
>
> Thanx...
>
> ps
>

Still moving too fast... This time, with the patch too...

ps

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 928721f..be186f4 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -78,6 +78,8 @@ static const int nlm_port_min = 0, nlm_port_max = 65535;
 
 static struct ctl_table_header * nlm_sysctl_table;
 
+static struct timer_list	nlm_grace_period_timer;
+
 static unsigned long set_grace_period(void)
 {
 	unsigned long grace_period;
@@ -92,7 +94,7 @@ static unsigned long set_grace_period(void)
 	return grace_period + jiffies;
 }
 
-static inline void clear_grace_period(void)
+static inline void clear_grace_period(unsigned long not_used)
 {
 	nlmsvc_grace_period = 0;
 }
@@ -138,6 +140,12 @@ lockd(struct svc_rqst *rqstp)
 
 	grace_period_expire = set_grace_period();
 
+	init_timer(&nlm_grace_period_timer);
+	nlm_grace_period_timer.function = clear_grace_period;
+	nlm_grace_period_timer.expires = grace_period_expire;
+
+	add_timer(&nlm_grace_period_timer);
+
 	/*
 	 * The main request loop. We don't terminate until the last
 	 * NFS mount or NFS daemon has gone away, and we've been sent a
@@ -151,6 +159,8 @@ lockd(struct svc_rqst *rqstp)
 			if (nlmsvc_ops) {
 				nlmsvc_invalidate_all();
 				grace_period_expire = set_grace_period();
+				mod_timer(&nlm_grace_period_timer,
+					  grace_period_expire);
 			}
 		}
 
@@ -160,10 +170,8 @@ lockd(struct svc_rqst *rqstp)
 		 * (Theoretically, there shouldn't even be blocked locks
 		 * during grace period).
 		 */
-		if (!nlmsvc_grace_period) {
+		if (!nlmsvc_grace_period)
 			timeout = nlmsvc_retry_blocked();
-		} else if (time_before(grace_period_expire, jiffies))
-			clear_grace_period();
 
 		/*
 		 * Find a socket with data available and call its
@@ -188,6 +196,8 @@ lockd(struct svc_rqst *rqstp)
 
 	flush_signals(current);
 
+	del_timer(&nlm_grace_period_timer);
+
 	/*
 	 * Check whether there's a new lockd process before
 	 * shutting down the hosts and clearing the slot.
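
For illustration only, here is a minimal userspace sketch of why a
jiffies comparison stops being reliable once the counter has moved far
enough, and why a flag cleared by a one-shot timer does not have that
problem. It assumes a 32-bit tick counter; time_before32(),
grace_over_by_compare(), grace_timer_fired() and grace_over_by_flag()
are hypothetical names invented for this sketch, not kernel functions,
and the timer is modelled as a plain function call rather than a
struct timer_list.

```c
#include <stdint.h>

/*
 * Hypothetical 32-bit stand-in for the kernel's time_before(a, b):
 * true when a is earlier than b, judged by the signed difference.
 * The conversion to int32_t wraps on common compilers.
 */
static int time_before32(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Comparison scheme: the grace period is over once expire is before now. */
int grace_over_by_compare(uint32_t expire, uint32_t now)
{
	return time_before32(expire, now);
}

/*
 * Flag scheme, modelled without a real timer: a one-shot callback
 * clears the flag, and later checks never consult the tick counter.
 */
static int in_grace = 1;

void grace_timer_fired(void)
{
	in_grace = 0;
}

int grace_over_by_flag(void)
{
	return !in_grace;
}
```

With expire = 50000, the comparison reports the grace period as over at
tick 50001, but reports it as still running if the check is first made
more than 2^31 ticks after expire. Assuming HZ=1000, 2^31 ticks is
roughly 24.8 days, which is in line with the "few weeks" mentioned
above. The flag, once cleared, stays cleared no matter how far the
counter advances.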