From: Alexander Viro <aviro@redhat.com> Date: Mon, 21 Apr 2008 04:10:46 -0400 Subject: [fs] race condition in dnotify Message-id: 20080421081046.GA15488@devserv.devel.redhat.com O-Subject: Re: [kernel team] [patch][rhel-5.2] bz439759 Bugzilla: 443440 439759 We have a problem... Two, actually: a) fcheck() needs rcu_read_lock() or current->files->file_lock. Both in dnotify.c and locks.c. b) locks.c (fcntl_setlk()) has subtler one - we need to make sure that if we *do* have an fcntl/close race on SMP box, the access to descriptor table and inode->i_flock won't get reordered. As it is, we get STORE inode->i_flock, LOAD descriptor table entry vs. STORE descriptor table entry, LOAD inode->i_flock with not a single lock in common on both sides. We do have BKL around the first STORE, but check in locks_remove_posix() is outside of BKL and for a good reason - we don't want BKL on common path of close(2). dnotify side doesn't have that problem, since inode->i_lock orders things nicely. So the solution is * rcu_read_lock() around that fcheck() in dnotify * ->files->file_lock around fcheck() in fcntl_setlk() and fcntl_setlk64(), which solves (b) - STORE to ->i_flock has to happen before unlock in fcntl_setlk(), LOAD from ->i_flock has to happen after lock in sys_close(), so either we have lock in sys_close() after unlock in fcntl_setlk() and LOAD will happen after STORE (i.e. close() will evict) or we have fcheck() in fcntl_setlk() seeing the effect of removal from descriptor table and fcntl_setlk() evicting the sucker. I don't see how to separate these two, unfortunately ;-/ They are too damn similar and releasing dnotify fix alone with fcheck() properly protected will point to setlk(). setlk() is not as easy to hit as unpatched dnotify - Peter had closed the bulk of that race back in 2005. OTOH, it _is_ easier to hit than dnotify-with-original-patch - for one thing, this place is preemptible (dnotify analog is under spinlock). For another, reordering is a problem that doesn't have analog in case of dnotify. Hell knows... :-/ Anyway, updated patch follows. Upstream analog will need the same change. Acked-by: "David S. Miller" <davem@redhat.com> diff --git a/fs/dnotify.c b/fs/dnotify.c index f932591..c2122c9 100644 --- a/fs/dnotify.c +++ b/fs/dnotify.c @@ -20,6 +20,9 @@ #include <linux/init.h> #include <linux/spinlock.h> #include <linux/slab.h> +#ifndef __GENKSYMS__ +#include <linux/file.h> +#endif int dir_notify_enable __read_mostly = 1; @@ -66,6 +69,7 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg) struct dnotify_struct **prev; struct inode *inode; fl_owner_t id = current->files; + struct file *f; int error = 0; if ((arg & ~DN_MULTISHOT) == 0) { @@ -92,6 +96,15 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg) prev = &odn->dn_next; } + rcu_read_lock(); + f = fcheck(fd); + rcu_read_unlock(); + /* we'd lost the race with close(), sod off silently */ + /* note that inode->i_lock prevents reordering problems + * between accesses to descriptor table and ->i_dnotify */ + if (f != filp) + goto out_free; + error = f_setown(filp, current->pid, 0); if (error) goto out_free; diff --git a/fs/locks.c b/fs/locks.c index e5ab8fb..cc00d13 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -1757,6 +1757,7 @@ int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd, struct file_lock *file_lock = locks_alloc_lock(); struct flock flock; struct inode *inode; + struct file *f; int error; if (file_lock == NULL) @@ -1827,7 +1828,15 @@ again: * Attempt to detect a close/fcntl race and recover by * releasing the lock that was just acquired. */ - if (!error && fcheck(fd) != filp && flock.l_type != F_UNLCK) { + /* + * we need that spin_lock here - it prevents reordering between + * update of inode->i_flock and check for it done in close(). + * rcu_read_lock() wouldn't do. + */ + spin_lock(¤t->files->file_lock); + f = fcheck(fd); + spin_unlock(¤t->files->file_lock); + if (!error && f != filp && flock.l_type != F_UNLCK) { flock.l_type = F_UNLCK; goto again; } @@ -1883,6 +1892,7 @@ int fcntl_setlk64(unsigned int fd, struct file *filp, unsigned int cmd, struct file_lock *file_lock = locks_alloc_lock(); struct flock64 flock; struct inode *inode; + struct file *f; int error; if (file_lock == NULL) @@ -1953,7 +1963,10 @@ again: * Attempt to detect a close/fcntl race and recover by * releasing the lock that was just acquired. */ - if (!error && fcheck(fd) != filp && flock.l_type != F_UNLCK) { + spin_lock(¤t->files->file_lock); + f = fcheck(fd); + spin_unlock(¤t->files->file_lock); + if (!error && f != filp && flock.l_type != F_UNLCK) { flock.l_type = F_UNLCK; goto again; }