kernel-2.6.18-128.1.10.el5.src.rpm

From: Jeff Moyer <jmoyer@redhat.com>
Date: Thu, 3 Jul 2008 11:56:28 -0400
Subject: [mm] dio: fix cache invalidation after sync writes
Message-id: x49r6abm05v.fsf@segfault.boston.devel.redhat.com
O-Subject: [rhel5 patch] dio: fix cache invalidation after sync writes
Bugzilla: 445674
RH-Acked-by: Jeff Layton <jlayton@redhat.com>

This patch addresses bugzilla 445674.  It is a direct backport of the
following changeset:

commit bdb76ef5a4bc8676a81034a443f1eda450b4babb
Author: Zach Brown <zach.brown@oracle.com>
Date:   Tue Oct 30 11:45:46 2007 -0700

    dio: fix cache invalidation after sync writes

    Commit 65b8291c4000e5f38fc94fb2ca0cb7e8683c8a1b ("dio: invalidate
    clean pages before dio write") introduced a bug which stopped dio from
    ever invalidating the page cache after writes.  It still invalidated it
    before writes so most users were fine.

    Karl Schendel reported ( http://lkml.org/lkml/2007/10/26/481 ) hitting
    this bug when he had a buffered reader immediately reading file data
    after an O_DIRECT writer had written the data.  The kernel issued
    read-ahead beyond the position of the reader which overlapped with the
    O_DIRECT writer.  The failure to invalidate after writes caused the
    reader to see stale data from the read-ahead.

    The following patch is originally from Karl.  The following commentary
    is his:

    	The below 3rd try takes on your suggestion of just invalidating
    	no matter what the retval from the direct_IO call.  I ran it
    	thru the test-case several times and it has worked every time.
    	The post-invalidate is probably still too early for async-directio,
    	but I don't have a testcase for that;  just sync.  And, this
    	won't be any worse in the async case.

    I added a test to the aio-dio-regress repository which mimics Karl's IO
    pattern.  It verified the bad behaviour and that the patch fixed it.  I
    agree with Karl, this still doesn't help the case where a buffered
    reader follows an AIO O_DIRECT writer.  That will require a bit more
    work.

    This gives up on the idea of returning EIO to indicate to userspace that
    stale data remains if the invalidation failed.

    Signed-off-by: Zach Brown <zach.brown@oracle.com>
    Cc: Karl Schendel <kschendel@datallegro.com>
    Cc: Benjamin LaHaise <bcrl@kvack.org>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Nick Piggin <nickpiggin@yahoo.com.au>
    Cc: Leonid Ananiev <leonid.i.ananiev@linux.intel.com>
    Cc: Chris Mason <chris.mason@oracle.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

I am currently working on a fix for the AIO case, but that is outside
the scope of this bugzilla.  Since the sync case is hitting customers
today, I need to address it now; I'll open a separate bugzilla for the
async case.

As mentioned in the changelog, there is a test case that reproduces this
problem.  I've built a RHEL 5 kernel with this patch applied, and the
test case shows that the problem has been resolved.
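
For readers who want to see the basic IO pattern, here is a minimal
user-space sketch: an O_DIRECT write followed by a buffered read of the
same range.  It is not the aio-dio-regress test.  The function name, the
4096-byte alignment and the single-threaded structure are assumptions
made for illustration.  Because the pre-write invalidation still worked,
a sequential check like this is unlikely to fail even on an affected
kernel; the reported failure needed a concurrent buffered reader whose
read-ahead repopulated the cache while the write was in flight.

```c
#define _GNU_SOURCE		/* exposes O_DIRECT in <fcntl.h> on Linux */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define DIO_BLOCK 4096		/* assumed alignment; some devices need another value */

/*
 * Hypothetical helper.  Returns 0 if the buffered read saw the data
 * written with O_DIRECT, 1 if it saw stale data, and -1 if the check
 * could not be run (for example, the filesystem rejects O_DIRECT).
 */
int dio_write_then_buffered_read(const char *path)
{
	char cached[DIO_BLOCK];
	void *dio_buf = NULL;
	int result = -1;
	int bfd, dfd = -1;

	bfd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (bfd < 0)
		return -1;

	/* Seed the file with 'A' through the page cache and flush it. */
	memset(cached, 'A', sizeof(cached));
	if (write(bfd, cached, sizeof(cached)) != (ssize_t)sizeof(cached))
		goto out;
	if (fsync(bfd) != 0)
		goto out;

	/* Buffered read, so the page is resident before the direct write. */
	if (pread(bfd, cached, sizeof(cached), 0) != (ssize_t)sizeof(cached))
		goto out;

	/* Overwrite the same block with 'B', bypassing the page cache. */
	dfd = open(path, O_WRONLY | O_DIRECT);
	if (dfd < 0)
		goto out;
	if (posix_memalign(&dio_buf, DIO_BLOCK, DIO_BLOCK) != 0) {
		dio_buf = NULL;
		goto out;
	}
	memset(dio_buf, 'B', DIO_BLOCK);
	if (pwrite(dfd, dio_buf, DIO_BLOCK, 0) != (ssize_t)DIO_BLOCK)
		goto out;

	/* The buffered reader should now see 'B', not the cached 'A'. */
	if (pread(bfd, cached, sizeof(cached), 0) != (ssize_t)sizeof(cached))
		goto out;
	result = (cached[0] == 'B' && cached[DIO_BLOCK - 1] == 'B') ? 0 : 1;
out:
	free(dio_buf);
	if (dfd >= 0)
		close(dfd);
	close(bfd);
	unlink(path);
	return result;
}
```

Calling dio_write_then_buffered_read("/tmp/dio_sketch.dat") on a
filesystem that supports O_DIRECT should return 0; a return of 1 would
mean the buffered reader saw stale data.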

Comments welcome.

Cheers,

Jeff

diff --git a/mm/filemap.c b/mm/filemap.c
index 6605ba7..6557fbe 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2649,21 +2649,17 @@ generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 	}
 
 	retval = mapping->a_ops->direct_IO(rw, iocb, iov, offset, nr_segs);
-	if (retval)
-		goto out;
 
 	/* 
 	 * Finally, try again to invalidate clean pages which might have been
-	 * faulted in by get_user_pages() if the source of the write was an
-	 * mmap()ed region of the file we're writing.  That's a pretty crazy
-	 * thing to do, so we don't support it 100%.  If this invalidation
-	 * fails and we have -EIOCBQUEUED we ignore the failure.
+	 * cached by non-direct readahead, or faulted in by get_user_pages()
+	 * if the source of the write was an mmap'ed region of the file
+	 * we're writing.  Either one is a pretty crazy thing to do,
+	 * so we don't support it 100%.  If this invalidation
+	 * fails, tough, the write still worked...
 	 */
 	if (rw == WRITE && mapping->nrpages) {
-		int err = invalidate_inode_pages2_range(mapping,
-					      offset >> PAGE_CACHE_SHIFT, end);
-		if (err && retval >= 0)
-			retval = err;
+		invalidate_inode_pages2_range(mapping, offset >> PAGE_CACHE_SHIFT, end);
 	}
 out:
 	return retval;
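
The control-flow change in the hunk above can be modelled in a few lines
of user-space C.  This is only a sketch with hypothetical function names,
not kernel code.  It assumes that ->direct_IO() returns the number of
bytes transferred on success, so a successful write has retval > 0 and
the removed "if (retval) goto out;" skipped the invalidation.

```c
#include <stdbool.h>
#include <sys/types.h>

/*
 * Pre-patch flow: any nonzero return from direct_IO jumps to "out",
 * so the invalidation is reached only when retval == 0.
 */
bool old_flow_invalidates(ssize_t retval, bool is_write, unsigned long nrpages)
{
	if (retval)
		return false;	/* models the removed "goto out" */
	return is_write && nrpages != 0;
}

/*
 * Post-patch flow: the return value is no longer consulted; writes to a
 * mapping with cached pages always attempt the invalidation.
 */
bool new_flow_invalidates(ssize_t retval, bool is_write, unsigned long nrpages)
{
	(void)retval;
	return is_write && nrpages != 0;
}
```

With retval = 4096 (a successful one-page write under the assumption
above), the old flow skips the invalidation and the new flow performs it.
The new flow also attempts it when direct_IO returned an error, which
matches Karl's note about invalidating no matter what the return value
is.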