kernel-2.6.18-238.el5.src.rpm

From: Alexander Viro <aviro@redhat.com>
Date: Mon, 21 Apr 2008 04:10:46 -0400
Subject: [fs] race condition in dnotify
Message-id: 20080421081046.GA15488@devserv.devel.redhat.com
O-Subject: Re: [kernel team] [patch][rhel-5.2] bz439759
Bugzilla: 443440 439759

	We have a problem...  Two, actually:

a) fcheck() needs to be called under rcu_read_lock() or
current->files->file_lock.  That applies in both dnotify.c and locks.c.

b) locks.c (fcntl_setlk()) has a subtler one - we need to make sure
that if we *do* have an fcntl/close race on SMP box, the access to
descriptor table and inode->i_flock won't get reordered.

As it is, we get STORE inode->i_flock, LOAD descriptor table entry vs.
STORE descriptor table entry, LOAD inode->i_flock with not a single
lock in common on both sides.  We do have the BKL around the first STORE,
but the check in locks_remove_posix() is outside the BKL, and for a good
reason - we don't want the BKL on the common path of close(2).
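
To make the interleaving concrete, here is a condensed sketch of the two
paths as they stand before this patch (simplified and annotated, not quoted
verbatim from the 2.6.18 sources):

	/* CPU0: fcntl_setlk(fd, F_SETLK, ...) */
	error = posix_lock_file(filp, file_lock);  /* STORE: lock added to inode->i_flock
						      (BKL taken inside) */
	...
	if (!error && fcheck(fd) != filp && ...)   /* LOAD: descriptor table entry,
						      no lock, no rcu_read_lock() */
		/* undo the lock we just acquired */

	/* CPU1: sys_close(fd) -> filp_close() -> locks_remove_posix() */
	spin_lock(&files->file_lock);
	rcu_assign_pointer(fdt->fd[fd], NULL);     /* STORE: descriptor table entry cleared */
	spin_unlock(&files->file_lock);
	...
	if (!filp->f_dentry->d_inode->i_flock)     /* LOAD: ->i_flock, outside the BKL */
		return;                            /* nothing to clean up, it thinks */

Neither side's STORE/LOAD pair shares a lock with the other side's, so the
two CPUs can observe each other's stores out of order and the freshly
acquired lock can be left sitting on ->i_flock with nobody left to remove it.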

The dnotify side doesn't have that problem, since inode->i_lock orders
things nicely.  So the solution is:
	* rcu_read_lock() around that fcheck() in dnotify
	* ->files->file_lock around fcheck() in fcntl_setlk() and
fcntl_setlk64(), which solves (b) - STORE to ->i_flock has to
happen before unlock in fcntl_setlk(), LOAD from ->i_flock has
to happen after lock in sys_close(), so either we have lock in
sys_close() after unlock in fcntl_setlk() and LOAD will happen
after STORE (i.e. close() will evict) or we have fcheck() in
fcntl_setlk() seeing the effect of removal from descriptor
table and fcntl_setlk() evicting the sucker.
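
Spelled out, the (b) case analysis with fcheck() done under
->files->file_lock in fcntl_setlk() looks like this (a condensed
restatement, not text from the patch itself):

	/*
	 * Case 1: fcntl_setlk() drops ->file_lock before sys_close() takes it.
	 *	The STORE to ->i_flock precedes that unlock, and the LOAD of
	 *	->i_flock in locks_remove_posix() follows close()'s own
	 *	lock/unlock of the same spinlock, so close() is guaranteed to
	 *	see the new lock and evicts it.
	 *
	 * Case 2: sys_close() takes ->file_lock first.
	 *	The descriptor-table entry is already cleared by the time
	 *	fcntl_setlk() does its fcheck(), so fcheck(fd) != filp and
	 *	fcntl_setlk() evicts the lock itself.
	 *
	 * Either way exactly one side cleans the lock up.
	 */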

	I don't see how to separate these two, unfortunately ;-/
They are too damn similar, and releasing the dnotify fix alone, with
fcheck() properly protected, will point straight at setlk().

	setlk() is not as easy to hit as unpatched dnotify -
Peter had closed the bulk of that race back in 2005.  OTOH, it
_is_ easier to hit than dnotify-with-original-patch - for one
thing, this place is preemptible (the dnotify analog is under a
spinlock).  For another, reordering is a problem that has no analog
in the dnotify case.  Hell knows... :-/
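
For reference, the userspace shape of the race is just an F_SETLK
overlapping a close() of the same descriptor.  A toy illustration (not part
of the original report, and not a reliable reproducer - the window is tiny):

	/* build: cc -pthread setlk-race.c */
	#include <fcntl.h>
	#include <pthread.h>
	#include <stdlib.h>
	#include <unistd.h>

	#define FD 100	/* fixed descriptor number both threads agree on */

	static void *locker(void *unused)
	{
		struct flock fl;
		for (;;) {
			fl.l_type = F_WRLCK;
			fl.l_whence = SEEK_SET;
			fl.l_start = 0;
			fl.l_len = 0;
			fcntl(FD, F_SETLK, &fl);	/* may race with close(FD) in main() */
		}
		return NULL;
	}

	int main(void)
	{
		pthread_t t;
		int fd = open("/tmp/setlk-race", O_RDWR | O_CREAT, 0644);
		if (fd < 0)
			exit(1);
		pthread_create(&t, NULL, locker, NULL);
		for (;;) {
			dup2(fd, FD);	/* (re)install the descriptor... */
			close(FD);	/* ...and yank it away under the other thread */
		}
	}

Nothing here should misbehave on a fixed kernel; it only shows the
close()/fcntl() overlap the patch has to cope with.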

	Anyway, the updated patch follows.  The upstream analog will need
the same change.

Acked-by: "David S. Miller" <davem@redhat.com>

diff --git a/fs/dnotify.c b/fs/dnotify.c
index f932591..c2122c9 100644
--- a/fs/dnotify.c
+++ b/fs/dnotify.c
@@ -20,6 +20,9 @@
 #include <linux/init.h>
 #include <linux/spinlock.h>
 #include <linux/slab.h>
+#ifndef __GENKSYMS__
+#include <linux/file.h>
+#endif
 
 int dir_notify_enable __read_mostly = 1;
 
@@ -66,6 +69,7 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg)
 	struct dnotify_struct **prev;
 	struct inode *inode;
 	fl_owner_t id = current->files;
+	struct file *f;
 	int error = 0;
 
 	if ((arg & ~DN_MULTISHOT) == 0) {
@@ -92,6 +96,15 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg)
 		prev = &odn->dn_next;
 	}
 
+	rcu_read_lock();
+	f = fcheck(fd);
+	rcu_read_unlock();
+	/* we'd lost the race with close(), sod off silently */
+	/* note that inode->i_lock prevents reordering problems
+	 * between accesses to descriptor table and ->i_dnotify */
+	if (f != filp)
+		goto out_free;
+
 	error = f_setown(filp, current->pid, 0);
 	if (error)
 		goto out_free;
diff --git a/fs/locks.c b/fs/locks.c
index e5ab8fb..cc00d13 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1757,6 +1757,7 @@ int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd,
 	struct file_lock *file_lock = locks_alloc_lock();
 	struct flock flock;
 	struct inode *inode;
+	struct file *f;
 	int error;
 
 	if (file_lock == NULL)
@@ -1827,7 +1828,15 @@ again:
 	 * Attempt to detect a close/fcntl race and recover by
 	 * releasing the lock that was just acquired.
 	 */
-	if (!error && fcheck(fd) != filp && flock.l_type != F_UNLCK) {
+	/*
+	 * we need that spin_lock here - it prevents reordering between
+	 * update of inode->i_flock and check for it done in close().
+	 * rcu_read_lock() wouldn't do.
+	 */
+	spin_lock(&current->files->file_lock);
+	f = fcheck(fd);
+	spin_unlock(&current->files->file_lock);
+	if (!error && f != filp && flock.l_type != F_UNLCK) {
 		flock.l_type = F_UNLCK;
 		goto again;
 	}
@@ -1883,6 +1892,7 @@ int fcntl_setlk64(unsigned int fd, struct file *filp, unsigned int cmd,
 	struct file_lock *file_lock = locks_alloc_lock();
 	struct flock64 flock;
 	struct inode *inode;
+	struct file *f;
 	int error;
 
 	if (file_lock == NULL)
@@ -1953,7 +1963,10 @@ again:
 	 * Attempt to detect a close/fcntl race and recover by
 	 * releasing the lock that was just acquired.
 	 */
-	if (!error && fcheck(fd) != filp && flock.l_type != F_UNLCK) {
+	spin_lock(&current->files->file_lock);
+	f = fcheck(fd);
+	spin_unlock(&current->files->file_lock);
+	if (!error && f != filp && flock.l_type != F_UNLCK) {
 		flock.l_type = F_UNLCK;
 		goto again;
 	}