kernel-2.6.18-238.el5.src.rpm

From: Larry Woodman <lwoodman@redhat.com>
Date: Mon, 5 Apr 2010 15:53:30 -0400
Subject: [mm] fix hugepage corruption using vm.drop_caches
Message-id: <1270482810.3551.304.camel@dhcp-100-19-198.bos.redhat.com>
Patchwork-id: 23887
O-Subject: [RHEL5 Patch] vm.drop_caches corrupts hugepages and causes Oracle
	Database ORA-600 crashes
Bugzilla: 579469
RH-Acked-by: Bob Picco <bpicco@redhat.com>
RH-Acked-by: Dean Nelson <dnelson@redhat.com>

While running an Oracle Database (single-instance or RAC) with the SGA
backed by hugepages, running "echo 3 > /proc/sys/vm/drop_caches" silently
corrupts the database hugepages.  This causes various ORA-600 errors in
the Oracle database.

This problem has been fixed upstream with
commit 6649a3863232eb2e2f15ea6c622bd8ceacf96d76
---------------------------------------------------------------------------
Author: Ken Chen <kenchen@google.com>
Date:   Thu Feb 8 14:20:27 2007 -0800

    [PATCH] hugetlb: preserve hugetlb pte dirty state

__unmap_hugepage_range() is buggy in that it does not preserve the dirty
state of the huge_pte when unmapping a hugepage range.  This causes data
corruption when a sysadmin uses drop_caches.  For example, an application
creates a hugetlb file, modifies pages, then unmaps it.  While the hugetlb
file is still alive, a sysadmin comes along and runs "echo 3 >
/proc/sys/vm/drop_caches".

drop_pagecache_sb() will happily free all pages that aren't marked dirty
if there are no active mappings.  Later, when the application maps the
hugetlb file again, all the data is gone, triggering a catastrophic
failure in the application.

Not only that, the internal resv_huge_pages count will also get all
messed up.  Fix it up by marking page dirty appropriately.

    Signed-off-by: Ken Chen <kenchen@google.com>
    Cc: "Nish Aravamudan" <nish.aravamudan@gmail.com>
    Cc: Adam Litke <agl@us.ibm.com>
    Cc: David Gibson <david@gibson.dropbear.id.au>
    Cc: William Lee Irwin III <wli@holomorphy.com>
    Cc: <stable@kernel.org>
    Cc: Hugh Dickins <hugh@veritas.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-------------------------------------------------------------------------
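
As a rough illustration of the failure described in the upstream commit,
the following user-space C model (not kernel code) walks through the
write / unmap / drop_caches / remap sequence.  Every name in it
(model_page, model_pte, model_unmap, model_drop_caches, model_scenario)
is hypothetical and only stands in for the kernel's page, huge_pte,
__unmap_hugepage_range() and drop_pagecache_sb(); the "transfer_dirty"
flag is an assumption used to switch between unpatched and patched
behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page-cache page; not the kernel's struct page. */
struct model_page {
    bool dirty;     /* models the page's dirty flag */
    bool present;   /* page is still in the page cache */
    int  mapcount;  /* number of active mappings */
    int  data;      /* stand-in for the page contents */
};

/* Hypothetical stand-in for a huge PTE pointing at a page. */
struct model_pte {
    struct model_page *page;
    bool dirty;     /* models the dirty bit kept in the PTE */
};

/* A store through the mapping changes the data and dirties only the PTE. */
void model_write(struct model_pte *pte, int value)
{
    assert(pte->page != NULL);
    pte->page->data = value;
    pte->dirty = true;
}

/*
 * Tear down the mapping.  With transfer_dirty == true the PTE dirty bit is
 * copied to the page first, which is what the mm/hugetlb.c hunk adds.
 */
void model_unmap(struct model_pte *pte, bool transfer_dirty)
{
    assert(pte->page != NULL);
    if (transfer_dirty && pte->dirty)
        pte->page->dirty = true;
    pte->page->mapcount--;
    pte->page = NULL;
    pte->dirty = false;
}

/* Models drop_caches: free the page only if it is clean and unmapped. */
bool model_drop_caches(struct model_page *page)
{
    if (page->dirty || page->mapcount > 0)
        return false;
    page->present = false;
    page->data = 0;         /* contents are gone */
    return true;
}

/* Map the file again; a dropped page comes back as a fresh zeroed page. */
int model_remap_and_read(struct model_pte *pte, struct model_page *page)
{
    if (!page->present) {
        page->present = true;
        page->dirty = false;
        page->data = 0;
    }
    page->mapcount++;
    pte->page = page;
    pte->dirty = false;
    return page->data;
}

/* Run the whole sequence; returns what the application reads after remap. */
int model_scenario(bool patched)
{
    struct model_page page = { .dirty = false, .present = true,
                               .mapcount = 1, .data = 0 };
    struct model_pte pte = { .page = &page, .dirty = false };

    model_write(&pte, 42);
    model_unmap(&pte, patched);
    model_drop_caches(&page);
    return model_remap_and_read(&pte, &page);
}
```

In this model, model_scenario(false) returns 0 because the clean-looking
page is freed, while model_scenario(true) returns 42 because the dirty
state carried over at unmap time keeps the page from being dropped.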

The attached backport fixes this problem in RHEL5-U5 (BZ579469).

Signed-off-by: Jarod Wilson <jarod@redhat.com>

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index f724806..c6d6ff3 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -450,9 +450,13 @@ static int hugetlbfs_symlink(struct inode *dir,
 
 /*
  * For direct-IO reads into hugetlb pages
+ * mark the head page dirty
  */
 static int hugetlbfs_set_page_dirty(struct page *page)
 {
+	struct page *head = compound_head(page);
+
+	SetPageDirty(head);
 	return 0;
 }
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index fa2fd01..a542f79 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -426,6 +426,8 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
 			continue;
 
 		page = pte_page(pte);
+		if (pte_dirty(pte))
+			set_page_dirty(page);
 		put_page(page);
 	}
 	spin_unlock(&mm->page_table_lock);