Sophie

kernel-2.6.18-238.el5.src.rpm

From: Takao Indoh <tindoh@redhat.com>
Date: Thu, 28 Aug 2008 12:37:50 -0400
Subject: [ia64] procfs: reduce the size of page table cache
Message-id: 20080828163112.10505.31459.sendpatchset@dhcp-100-3-208.bos.redhat.com
O-Subject: [RHEL5.3 PATCH 0/2] show the size of page table cache in procfs
Bugzilla: 458410

This patch fixes bz458410.
https://bugzilla.redhat.com/show_bug.cgi?id=458410

[Background]
The IA64 kernel consumes so much memory for its per-CPU page table
cache that free system memory is reduced considerably. In the worst
case, the page table cache can use about 50% of free memory on a
system with 16 CPUs per node, which can trigger the OOM killer.
Customers may wrongly suspect a kernel memory leak on their RHEL5
ia64 systems because
  (1) the kernel consumes a large amount of free memory for the page
      table cache, as it does for other caches, but unlike those caches
      there is no interface that shows its size. This page table cache
      is allocated by quicklist, and the amount of memory allocated by
      quicklist is not shown in /proc/meminfo,
  and
  (2) the size of this page table cache can grow slowly but steadily
      up to a certain limit even when no busy applications are running.

[How to fix]
This is fixed by two patches.
PATCH(1)
  This patch reduces the size of the cache to avoid the OOM killer.
PATCH(2)
  This patch adds the size of the quicklist to /proc/meminfo, so that
  end users do not mistake the cache for a memory leak. Although
  PATCH(1) reduces the page table cache, it can still become large,
  so a way to show its size is necessary.

As for PATCH(2), I have two alternative patches:
Alt 1) Add new entry "/proc/ptcache" instead of adding information into
       /proc/meminfo
Alt 2) Add new entry "/proc/sys/vm/page_table_cache" instead of adding
       information into /proc/meminfo

If you think it is not a good idea to add new information into
/proc/meminfo, I can post these alternative patches.

[Upstream]
This problem was discussed in these threads:
http://lkml.org/lkml/2008/8/20/127
http://lkml.org/lkml/2008/8/25/146

Andrew Morton recognized this problem as serious and adopted these
patches into the -mm tree, where they are named:

mm-size-of-quicklists-shouldnt-be-proportional-to-the-number-of-cpus.patch
mm-show-quicklist-usage-in-proc-meminfo.patch

[Test status]
The patches have been tested on the latest RHEL5.3 kernel
(kernel-2.6.18-105.el5) on x86, x86_64, and ia64 machines, with no
problems found.

Successful brew scratch build against each arch:
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=1441734

Please review and ACK.

Thanks,
Takao Indoh

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 73de1ed..7922778 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -63,13 +63,19 @@ static inline long
 max_pgt_pages(void)
 {
 	u64 node_free_pages, max_pgt_pages;
+	int node = numa_node_id();
+	int num_cpus_on_node;
+	cpumask_t cpumask_on_node = node_to_cpumask(node);
 
 #ifndef	CONFIG_NUMA
 	node_free_pages = nr_free_pages();
 #else
-	node_free_pages = nr_free_pages_pgdat(NODE_DATA(numa_node_id()));
+	node_free_pages = nr_free_pages_pgdat(NODE_DATA(node));
 #endif
 	max_pgt_pages = node_free_pages / PGT_FRACTION_OF_NODE_MEM;
+
+	num_cpus_on_node = cpus_weight(cpumask_on_node);
+	max_pgt_pages /= num_cpus_on_node;
 	max_pgt_pages = max(max_pgt_pages, MIN_PGT_PAGES);
 	return max_pgt_pages;
 }