kernel-2.6.18-238.el5.src.rpm

From: Larry Woodman <lwoodman@redhat.com>
Date: Wed, 9 Jun 2010 18:24:33 -0400
Subject: [mm] properly release all hugepages on database shutdown
Message-id: <1276107873.8736.63.camel@dhcp-100-19-198.bos.redhat.com>
Patchwork-id: 26065
O-Subject: [RHEL5 Patch] Not all hugepages are released on database shutdown
Bugzilla: 593131
RH-Acked-by: Jarod Wilson <jarod@redhat.com>
RH-Acked-by: Rik van Riel <riel@redhat.com>
RH-Acked-by: Danny Feng <dfeng@redhat.com>

RHEL5 leaks a PUD entry's worth of huge pages (512 2MB pages) when
shutting down Oracle if hugepages are used and pagetable sharing is in
use.  The problem is that RHEL5 is missing upstream commit
32b154c0b0bae2879bf4e549d861caf1759a3546:

---------------------------------------------------------------------
On x86 and x86-64, it is possible that page tables are shared between
shared mappings backed by hugetlbfs. As part of this,
page_table_shareable() checks a pair of vma->vm_flags and they must
match if they are to be shared. All VMA flags are taken into account,
including VM_LOCKED.

The problem is that VM_LOCKED is cleared on fork(). When a process with
a shared memory segment forks() to exec() a helper, there will be shared
VMAs with different flags. The impact is that the shared segment is
sometimes considered shareable and other times not, depending on what
process is checking.

What happens is that the segment page tables are being shared but the
count is inaccurate depending on the ordering of events. As the page
tables are freed with put_page(), bad pmd's are found when some of the
children exit. The hugepage counters also get corrupted and the Total
and Free count will no longer match even when all the hugepage-backed
regions are freed. This requires a reboot of the machine to "fix".

This patch addresses the problem by comparing all flags except
VM_LOCKED when deciding if pagetables should be shared or not for
hugetlbfs-backed mapping.

Signed-off-by: Mel Gorman <mel [at] csn>
Acked-by: Hugh Dickins <hugh.dickins [at] tiscali>
Cc: Ingo Molnar <mingo [at] elte>
Cc: Lee Schermerhorn <Lee.Schermerhorn [at] hp>
Cc: KOSAKI Motohiro <kosaki.motohiro [at] jp>
Cc: <starlight [at] binnacle>
Cc: Eric B Munson <ebmunson [at] us>
Cc: Adam Litke <agl [at] us>
Cc: Andy Whitcroft <apw [at] canonical>
Signed-off-by: Andrew Morton <akpm [at] linux-foundation>
Signed-off-by: Linus Torvalds <torvalds [at] linux-foundation>
Signed-off-by: Greg Kroah-Hartman <gregkh [at] suse>
--------------------------------------------------------------------
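
As an illustration of the comparison described in the commit message
above, here is a minimal user-space sketch. It is not kernel code:
flags_shareable() is a hypothetical helper, and the VM_SHARED and
VM_LOCKED values are stand-ins chosen for the example, so the real
definitions in include/linux/mm.h should be treated as authoritative.

```c
/*
 * Stand-in flag bits for illustration only (assumption); the kernel's
 * real VM_* definitions live in include/linux/mm.h.
 */
#define VM_SHARED 0x00000008UL
#define VM_LOCKED 0x00002000UL

/*
 * Hypothetical helper: returns 1 when the two flag words are equal once
 * VM_LOCKED is masked out of both, 0 otherwise. This mirrors the idea of
 * the fix, not the exact kernel function.
 */
static int flags_shareable(unsigned long vm_flags, unsigned long svm_flags)
{
	return (vm_flags & ~VM_LOCKED) == (svm_flags & ~VM_LOCKED);
}

/*
 * The old behaviour, for contrast: every flag bit must match, so a
 * parent with VM_LOCKED set and a forked child with it cleared are
 * treated as not shareable.
 */
static int flags_shareable_strict(unsigned long vm_flags,
				  unsigned long svm_flags)
{
	return vm_flags == svm_flags;
}
```

With a parent mapping flagged VM_SHARED | VM_LOCKED and a forked child
flagged VM_SHARED only, the strict comparison reports a mismatch while
the masked comparison reports a match. Any difference in a flag other
than VM_LOCKED still prevents sharing.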

The attached RHEL5 patch fixes this problem and BZ593131.

Signed-off-by: Jarod Wilson <jarod@redhat.com>

diff --git a/arch/i386/mm/hugetlbpage.c b/arch/i386/mm/hugetlbpage.c
index 34728e4..aa0df08 100644
--- a/arch/i386/mm/hugetlbpage.c
+++ b/arch/i386/mm/hugetlbpage.c
@@ -26,12 +26,15 @@ static unsigned long page_table_shareable(struct vm_area_struct *svma,
 	unsigned long sbase = saddr & PUD_MASK;
 	unsigned long s_end = sbase + PUD_SIZE;
 
+	/* allow segments to share if only one is marked locked */
+	unsigned long vm_flags = vma->vm_flags & ~VM_LOCKED;
+	unsigned long svm_flags = svma->vm_flags & ~VM_LOCKED;
 	/*
 	 * match the virtual addresses, permission and the alignment of the
 	 * page table page.
 	 */
 	if (pmd_index(addr) != pmd_index(saddr) ||
-	    vma->vm_flags != svma->vm_flags ||
+	    vm_flags != svm_flags ||
 	    sbase < svma->vm_start || svma->vm_end < s_end)
 		return 0;