From: Takao Indoh <tindoh@redhat.com>
Date: Thu, 28 Aug 2008 12:37:50 -0400
Subject: [ia64] procfs: reduce the size of page table cache
Message-id: 20080828163112.10505.31459.sendpatchset@dhcp-100-3-208.bos.redhat.com
O-Subject: [RHEL5.3 PATCH 0/2] show the size of page table cache in procfs
Bugzilla: 458410

This patch fixes bz458410.
https://bugzilla.redhat.com/show_bug.cgi?id=458410

[Background]
The IA64 kernel consumes so much memory for the per-CPU page table cache
that free system memory is reduced considerably. In the worst case,
about 50% of free memory can be used by the page table cache on a 16-CPU
node system. This can trigger the OOM killer.

Customers may wrongly conclude that a kernel memory leak is happening on
their RHEL5 ia64 systems because:

(1) The kernel consumes a large amount of free memory for the page table
    cache, as it does for other caches, but unlike the others there is
    no interface to show its size. The page table cache is allocated by
    quicklist, and the amount of memory allocated by quicklist is not
    shown in /proc/meminfo.

(2) The size of the page table cache can grow slowly but steadily up to
    a certain limit, even when no busy applications are running.

[How to fix]
This is fixed by two patches.

PATCH(1)
  This patch reduces the size of the cache to avoid triggering the OOM
  killer.

PATCH(2)
  This patch adds information to /proc/meminfo so that it displays the
  size of quicklist. This prevents end users from mistaking the cache
  for a memory leak.

Even with PATCH(1) applied, the page table cache can still become huge.
Therefore a way to show the size of the cache is necessary.

For PATCH(2), I have two alternative patches.

Alt 1) Add a new entry "/proc/ptcache" instead of adding information to
       /proc/meminfo.
Alt 2) Add a new entry "/proc/sys/vm/page_table_cache" instead of adding
       information to /proc/meminfo.

If you think it is not a good idea to add new information to
/proc/meminfo, I can post these alternative patches.

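[Worked example]
The 50% figure in [Background] and the effect of PATCH(1) can be checked
with a little arithmetic. The following is only a rough userspace
sketch, not part of either patch. It assumes PGT_FRACTION_OF_NODE_MEM is
16 and MIN_PGT_PAGES is 25 (their definitions are not shown in the
diff), and it assumes a simple worst-case model in which every CPU on
the node fills its cache to the cap while the cap itself shrinks as free
memory is consumed. The function names cap_before(), cap_after() and
worst_case_total() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; the real definitions are not part of this diff. */
#define PGT_FRACTION_OF_NODE_MEM 16UL
#define MIN_PGT_PAGES            25UL

/* Per-CPU cap before PATCH(1): a fixed fraction of the node's free pages. */
static uint64_t cap_before(uint64_t node_free_pages)
{
    uint64_t cap = node_free_pages / PGT_FRACTION_OF_NODE_MEM;

    return cap > MIN_PGT_PAGES ? cap : MIN_PGT_PAGES;
}

/* Per-CPU cap after PATCH(1): the same fraction shared by the node's CPUs. */
static uint64_t cap_after(uint64_t node_free_pages, unsigned int cpus_on_node)
{
    assert(cpus_on_node > 0);

    uint64_t cap = node_free_pages / PGT_FRACTION_OF_NODE_MEM;

    cap /= cpus_on_node;
    return cap > MIN_PGT_PAGES ? cap : MIN_PGT_PAGES;
}

/*
 * Hypothetical steady-state model: if every CPU holds free / divisor
 * pages and free = node_pages - total, then
 *     total = cpus * (node_pages - total) / divisor
 * which solves to
 *     total = cpus * node_pages / (divisor + cpus).
 * MIN_PGT_PAGES is ignored here because it only matters on tiny nodes.
 */
static uint64_t worst_case_total(uint64_t node_pages, unsigned int cpus,
                                 uint64_t per_cpu_divisor)
{
    assert(cpus > 0);

    return cpus * node_pages / (per_cpu_divisor + cpus);
}
```

Under these assumptions, worst_case_total(node_pages, 16, 16) is half of
node_pages, which matches the 50% worst case described above, while
worst_case_total(node_pages, 16, 16 * 16) is node_pages / 17, or about
6%.
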
[Upstream]
This problem was discussed in these threads:
http://lkml.org/lkml/2008/8/20/127
http://lkml.org/lkml/2008/8/25/146

Andrew Morton recognized this problem as serious and adopted these
patches in the -mm tree. The patches have the following names in the -mm
tree:

mm-size-of-quicklists-shouldnt-be-proportional-to-the-number-of-cpus.patch
mm-show-quicklist-usage-in-proc-meminfo.patch

[Test status]
The patch has been tested on the latest RHEL5.3 kernel
(kernel-2.6.18-105.el5) on x86, x86_64, and ia64 machines, and there
were no problems.

Successful brew scratch build against each arch:
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=1441734

Please review and ACK.

Thanks,
Takao Indoh

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 73de1ed..7922778 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -63,13 +63,19 @@ static inline long
 max_pgt_pages(void)
 {
 	u64 node_free_pages, max_pgt_pages;
+	int node = numa_node_id();
+	int num_cpus_on_node;
+	cpumask_t cpumask_on_node = node_to_cpumask(node);
 
 #ifndef	CONFIG_NUMA
 	node_free_pages = nr_free_pages();
 #else
-	node_free_pages = nr_free_pages_pgdat(NODE_DATA(numa_node_id()));
+	node_free_pages = nr_free_pages_pgdat(NODE_DATA(node));
 #endif
 	max_pgt_pages = node_free_pages / PGT_FRACTION_OF_NODE_MEM;
+
+	num_cpus_on_node = cpus_weight(cpumask_on_node);
+	max_pgt_pages /= num_cpus_on_node;
 	max_pgt_pages = max(max_pgt_pages, MIN_PGT_PAGES);
 	return max_pgt_pages;
 }