kernel-2.6.18-238.el5.src.rpm

From: Larry Woodman <lwoodman@redhat.com>
Date: Tue, 21 Sep 2010 16:55:00 -0400
Subject: [mm] add dirty_background_bytes and dirty_bytes sysctls
Message-id: <1285088100.31554.26.camel@dhcp-100-19-198.bos.redhat.com>
Patchwork-id: 28329
O-Subject: [RHEL5 Patch] Backport dirty_background_bytes and dirty_bytes
	sysctls to RHEL 5
Bugzilla: 635782
RH-Acked-by: Rik van Riel <riel@redhat.com>

On really large systems with limited IO bandwidth, we need these patches to
control background writeout in terms of bytes rather than percentages of
memory.

BZ 635782 says it all:

----------------------------------------------------------------------------------------
Description of problem:

With the increasing number of systems with 50 or more GiBs of RAM, dirty_ratio
with a lower bound of 5% is not that helpful. For example, the effects of Bug
469848 (nfs_getattr() hangs during heavy write workloads) could be limited if
we were able to limit the number of dirty pages to a much lower level on
systems with a lot of RAM.
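
To put a number on the 5% lower bound, here is a minimal userspace sketch in C. It assumes, as a simplification, that all of the machine's RAM counts as dirtyable memory; `ratio_floor_bytes` and `MIN_DIRTY_RATIO` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Floor applied to dirty_ratio, in percent (per the description above). */
#define MIN_DIRTY_RATIO 5

/*
 * Smallest dirty threshold, in bytes, that dirty_ratio alone can express.
 * Simplifying assumption: all of dirtyable_bytes counts as dirtyable memory.
 */
uint64_t ratio_floor_bytes(uint64_t dirtyable_bytes)
{
	/* Guard the multiplication against 64-bit overflow. */
	assert(dirtyable_bytes <= UINT64_MAX / MIN_DIRTY_RATIO);
	return dirtyable_bytes * MIN_DIRTY_RATIO / 100;
}
```

Under that assumption, `ratio_floor_bytes(50ULL << 30)` evaluates to 2684354560 bytes, so a 50 GiB system cannot be asked to start direct writeback below 2.5 GiB of dirty data through dirty_ratio.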

Patches:

The commits that need to be backported from mainline:
-----------------------------------------------------------
commit 2da02997e08d3efe8174c7a47696e6f7cbe69ba9
Author: David Rientjes <rientjes@google.com>
Date:   Tue Jan 6 14:39:31 2009 -0800

    mm: add dirty_background_bytes and dirty_bytes sysctls

    This change introduces two new sysctls to /proc/sys/vm:
    dirty_background_bytes and dirty_bytes.

    dirty_background_bytes is the counterpart to dirty_background_ratio and
    dirty_bytes is the counterpart to dirty_ratio.

    With growing memory capacities of individual machines, it's no longer
    sufficient to specify dirty thresholds as a percentage of the amount of
    dirtyable memory over the entire system.

    dirty_background_bytes and dirty_bytes specify quantities of memory, in
    bytes, that represent the dirty limits for the entire system.  If either
    of these values is set, its value represents the amount of dirty memory
    that is needed to commence either background or direct writeback.

    When a `bytes' or `ratio' file is written, its counterpart becomes a
    function of the written value.  For example, if dirty_bytes is written to
    be 8192, 8K of memory is required to commence direct writeback.
    dirty_ratio is then functionally equivalent to 8K / the amount of
    dirtyable memory:

     dirtyable_memory = free pages + mapped pages + file cache

     dirty_background_bytes = dirty_background_ratio * dirtyable_memory
      -or-
     dirty_background_ratio = dirty_background_bytes / dirtyable_memory

      AND

     dirty_bytes = dirty_ratio * dirtyable_memory
      -or-
     dirty_ratio = dirty_bytes / dirtyable_memory
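
The relations above can be sketched in userspace C as follows. This is an illustration, not kernel code: the helper names are hypothetical, and the ratio is taken as a percentage (the form the sysctl uses), so a factor of 100 appears that the formulas above leave implicit.

```c
#include <assert.h>
#include <stdint.h>

/* bytes = ratio * dirtyable_memory, with the ratio given in percent. */
uint64_t bytes_from_ratio(unsigned int ratio_percent, uint64_t dirtyable_bytes)
{
	assert(ratio_percent <= 100);
	/* Split into quotient and remainder so the product cannot overflow. */
	return dirtyable_bytes / 100 * ratio_percent
	       + dirtyable_bytes % 100 * ratio_percent / 100;
}

/* ratio = bytes / dirtyable_memory, returned in percent. */
double ratio_from_bytes(uint64_t bytes, uint64_t dirtyable_bytes)
{
	assert(dirtyable_bytes > 0);
	return 100.0 * (double)bytes / (double)dirtyable_bytes;
}
```

For instance, with 1 GiB of dirtyable memory a ratio of 10 corresponds to 107374182 bytes, and a byte limit of 512 MiB corresponds to a ratio of 50.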

    Only one of dirty_background_bytes and dirty_background_ratio may be
    specified at a time, and only one of dirty_bytes and dirty_ratio may be
    specified.  When one sysctl is written, the other appears as 0 when read.
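
A minimal sketch of the "one at a time" rule described above, modelled in plain C. The struct and setter names are hypothetical and only illustrate the described behaviour; they are not the kernel's sysctl handlers.

```c
#include <assert.h>

/* Hypothetical model of one bytes/ratio pair of knobs. */
struct dirty_knobs {
	unsigned long dirty_bytes; /* 0 means "not in use" */
	unsigned int dirty_ratio;  /* percent; 0 means "not in use" */
};

/* Writing the bytes knob makes its ratio counterpart read back as 0. */
void set_dirty_bytes(struct dirty_knobs *k, unsigned long bytes)
{
	assert(k);
	k->dirty_bytes = bytes;
	k->dirty_ratio = 0;
}

/* Writing the ratio knob makes its bytes counterpart read back as 0. */
void set_dirty_ratio(struct dirty_knobs *k, unsigned int ratio_percent)
{
	assert(k && ratio_percent <= 100);
	k->dirty_ratio = ratio_percent;
	k->dirty_bytes = 0;
}
```

The same pattern would apply to the dirty_background_bytes and dirty_background_ratio pair.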

    The `bytes' files operate on a page size granularity since dirty limits
    are compared with ZVC values, which are in page units.
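
The page granularity can be illustrated with the same round-up division the patch below uses. This is a userspace sketch: `bytes_to_pages` is a hypothetical helper, the macro is redefined locally, and the page size is passed in rather than taken from the kernel's PAGE_SIZE.

```c
#include <assert.h>

/* Integer division rounding up, as in the kernel macro of the same name. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Convert a byte limit into whole pages; a partial page counts as a full one. */
unsigned long bytes_to_pages(unsigned long bytes, unsigned long page_size)
{
	assert(page_size > 0);
	return DIV_ROUND_UP(bytes, page_size);
}
```

Assuming 4096-byte pages, a limit of 8192 bytes becomes 2 pages, 8193 bytes becomes 3 pages, and any non-zero limit below one page still becomes 1 page.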

    Prior to this change, the minimum dirty_ratio was 5 as implemented by
    get_dirty_limits() although /proc/sys/vm/dirty_ratio would show any user
    written value between 0 and 100.  This restriction is maintained, but
    dirty_bytes has a lower limit of only one page.

    Also prior to this change, the dirty_background_ratio could not equal or
    exceed dirty_ratio.  This restriction is maintained in addition to
    restricting dirty_background_bytes.  If either background threshold equals
    or exceeds that of the dirty threshold, it is implicitly set to half the
    dirty threshold.
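
The threshold rules described above (bytes take precedence when set, the ratio floor of 5, and the background limit falling back to half the dirty limit) can be sketched in userspace C. All type and function names here are hypothetical. The sketch follows the commit message only; it leaves out the unmapped_ratio cap and the per-task adjustments that get_dirty_limits() also applies.

```c
#include <assert.h>
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

struct dirty_settings {
	uint64_t dirty_bytes;          /* 0 means "use dirty_ratio" */
	unsigned int dirty_ratio;      /* percent */
	uint64_t background_bytes;     /* 0 means "use background_ratio" */
	unsigned int background_ratio; /* percent */
};

struct dirty_limits {
	uint64_t background; /* in pages */
	uint64_t dirty;      /* in pages */
};

struct dirty_limits compute_limits(const struct dirty_settings *s,
				   uint64_t available_pages, uint64_t page_size)
{
	struct dirty_limits lim;

	assert(s && page_size > 0);

	if (s->dirty_bytes) {
		/* Byte limit set: round up to whole pages, minimum one page. */
		lim.dirty = DIV_ROUND_UP(s->dirty_bytes, page_size);
	} else {
		/* Ratio path keeps the historical floor of 5 percent. */
		unsigned int ratio = s->dirty_ratio < 5 ? 5 : s->dirty_ratio;
		lim.dirty = ratio * available_pages / 100;
	}

	if (s->background_bytes)
		lim.background = DIV_ROUND_UP(s->background_bytes, page_size);
	else
		lim.background = s->background_ratio * available_pages / 100;

	/* A background limit at or above the dirty limit is halved. */
	if (lim.background >= lim.dirty)
		lim.background = lim.dirty / 2;

	return lim;
}
```

With 1000000 available pages, a dirty_ratio of 2 is raised to 5 (50000 pages), and a dirty_background_ratio of 10 would exceed that, so the background limit becomes 25000 pages.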

    Acked-by: Peter Zijlstra <peterz@infradead.org>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Christoph Lameter <cl@linux-foundation.org>
    Signed-off-by: David Rientjes <rientjes@google.com>
    Cc: Andrea Righi <righi.andrea@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit fc3501d411d34823fb9be248a95a0c44f945866f
Author: Sven Wegener <sven.wegener@stealer.net>
Date:   Wed Feb 11 13:04:23 2009 -0800

    mm: fix dirty_bytes/dirty_background_bytes sysctls on 64bit arches

commit 9e4a5bda89034502fb144331e71a0efdfd5fae97
Author: Andrea Righi <righi.andrea@gmail.com>
Date:   Thu Apr 30 15:08:57 2009 -0700

    mm: prevent divide error for small values of vm_dirty_bytes
-----------------------------------------------------------------------------

The attached patch fixes the problem described in BZ 635782.

Signed-off-by: Jarod Wilson <jarod@redhat.com>

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 5b302a0..bb53eb8 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -216,6 +216,8 @@ enum
 	VM_TOPDOWN_ALLOCATE_FAST=42, /* optimize speed over fragmentation in topdown alloc */
 	VM_MAX_RECLAIMS=43,     /* max reclaims allowed */
 	VM_DEVZERO_OPTIMIZED=44, /* pagetables initialized with ZERO_PAGE at mmmap time */
+	VM_DIRTY_BYTES=45, 	/* specific number of dirty bytes allowed */
+	VM_DIRTY_BACKGND_BYTES=46, /* specific number of dirty background bytes allowed */
 };
 
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index aa705cf..8579372 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -84,6 +84,8 @@ extern int flush_mmap_pages;
 extern int max_writeback_pages;
 extern int blk_iopoll_enabled;
 extern int vm_devzero_optimized;
+extern int vm_dirty_bytes;
+extern int dirty_background_bytes;
 
 #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86)
 extern int proc_unknown_nmi_panic(ctl_table *, int, struct file *,
@@ -1253,6 +1255,26 @@ static ctl_table vm_table[] = {
 		.strategy	= &sysctl_intvec,
 		.extra1		= &zero,
 	},
+	{
+		.ctl_name	= VM_DIRTY_BYTES,
+		.procname	= "vm_dirty_bytes",
+		.data		= &vm_dirty_bytes,
+		.maxlen		= sizeof(vm_dirty_bytes),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+		.strategy	= &sysctl_intvec,
+		.extra1		= &zero,
+	},
+	{
+		.ctl_name	= VM_DIRTY_BACKGND_BYTES,
+		.procname	= "dirty_background_bytes",
+		.data		= &dirty_background_bytes,
+		.maxlen		= sizeof(dirty_background_bytes),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+		.strategy	= &sysctl_intvec,
+		.extra1		= &zero,
+	},
 	{ .ctl_name = 0 }
 };
 
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index d337e45..1c1c2dd 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -98,6 +98,9 @@ int laptop_mode;
 
 EXPORT_SYMBOL(laptop_mode);
 
+int vm_dirty_bytes = 0;
+int dirty_background_bytes = 0;
+
 /* End of sysctl-exported parameters */
 
 
@@ -146,21 +149,30 @@ get_dirty_limits(long *pbackground, long *pdirty,
 				global_page_state(NR_ANON_PAGES)) * 100) /
 					total_pages;
 
-	dirty_ratio = vm_dirty_ratio;
+	if (vm_dirty_bytes)
+		dirty = DIV_ROUND_UP(vm_dirty_bytes, PAGE_SIZE);
+	else {
+		dirty_ratio = vm_dirty_ratio;
+
+		/* if vm_dirty_ratio is 100 dont limit to 1/2 unmapped_ratio */
+		if ((dirty_ratio > unmapped_ratio / 2) && (dirty_ratio != 100))
+			dirty_ratio = unmapped_ratio / 2;
 
-	/* if vm_dirty_ratio is 100 dont limit to 1/2 unmapped_ratio */
-	if ((dirty_ratio > unmapped_ratio / 2) && (dirty_ratio != 100))
-		dirty_ratio = unmapped_ratio / 2;
+		if (dirty_ratio < 5)
+			dirty_ratio = 5;
 
-	if (dirty_ratio < 5)
-		dirty_ratio = 5;
+		dirty = (dirty_ratio * available_memory) / 100;
+	}
 
-	background_ratio = dirty_background_ratio;
-	if (background_ratio >= dirty_ratio)
-		background_ratio = dirty_ratio / 2;
+	if (dirty_background_bytes)
+		background = DIV_ROUND_UP(dirty_background_bytes, PAGE_SIZE);
+	else {
+		background_ratio = dirty_background_ratio;
+		if (background_ratio >= dirty_ratio)
+			background_ratio = dirty_ratio / 2;
 
-	background = (background_ratio * available_memory) / 100;
-	dirty = (dirty_ratio * available_memory) / 100;
+		background = (background_ratio * available_memory) / 100;
+	}
 	tsk = current;
 	if (tsk->flags & PF_LESS_THROTTLE || rt_task(tsk)) {
 		background += background / 4;