From: Larry Woodman <lwoodman@redhat.com>
Date: Tue, 21 Apr 2009 17:05:44 -0400
Subject: [mm] tweak vm diry_ratio to prevent stalls on some DBs
Message-id: 1240347944.11613.20.camel@dhcp-100-19-198.bos.redhat.com
O-Subject: [RHEL5-U4 patch] repost - Provide reasonable work-around for DB2 server becomes unresponsive when performing backups
Bugzilla: 295291
RH-Acked-by: Rik van Riel <riel@redhat.com>
RH-Acked-by: Jeff Moyer <jmoyer@redhat.com>

We have a relatively old BZ (295291) that complains about systems running
DB2 servers becoming unresponsive when backups are performed. After
investigation we realized the problem was deeper: the backups were being
performed over NFS and the NFS server would periodically stall. When this
happens, the amount of dirty memory in the pagecache climbs over
/proc/sys/vm/dirty_ratio, which by default is 40% of memory. Practically all
of the dirty memory is from NFS-mounted files, and the clean-up is stalled
because the NFS server is stalled.

When a local file system write occurs we end up calling
balance_dirty_pages() because more than dirty_ratio of the memory is dirty.
balance_dirty_pages() calls get_dirty_limits() to get the dirty_ratio, then
forces the process to call writeback_inodes() and block because we are over
the dirty_ratio, even though it was a local file system write and the cause
of the dirty memory is NFS.

The real fix for this problem is upstream: per-device dirty limits. That is
a big change, though, and it is way too complicated and kABI-breaking for
RHEL5. We could instead work around the problem by increasing the
dirty_ratio so that the dirty memory does not go above the dirty_ratio when
the NFS stalls occur.

The problem is that get_dirty_limits() will not obey
/proc/sys/vm/dirty_ratio if it exceeds 1/2 of the unmapped memory. In other
words, /proc/sys/vm/dirty_ratio does not work as tuned; it is capped at 1/2
of the unmapped memory (pagecache memory that is not mapped).
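
To make the cap concrete, here is a minimal stand-alone sketch of the clamping step, before and after the change in this patch. It is an illustration only, not kernel code: effective_dirty_ratio() and its "patched" flag are hypothetical names, and the final clamp to a floor of 5 is assumed from the "if (dirty_ratio < 5)" line on which the diff context ends.

```c
/*
 * Hypothetical helper (not a kernel function): mirrors the ratio clamping
 * in get_dirty_limits() so the effect of the cap can be checked by hand.
 *
 * vm_dirty_ratio: value tuned through /proc/sys/vm/dirty_ratio (percent)
 * unmapped_ratio: percentage of memory that is not mapped
 * patched:        nonzero selects the behaviour proposed by this patch
 */
int effective_dirty_ratio(int vm_dirty_ratio, int unmapped_ratio, int patched)
{
	int dirty_ratio = vm_dirty_ratio;
	int cap = unmapped_ratio / 2;

	/* Unpatched: always cap. Patched: skip the cap only for exactly 100. */
	if (dirty_ratio > cap && !(patched && dirty_ratio == 100))
		dirty_ratio = cap;

	/* Assumed floor, taken from the trailing context line of the diff. */
	if (dirty_ratio < 5)
		dirty_ratio = 5;

	return dirty_ratio;
}
```

For example, on a system where 70% of memory is mapped (unmapped_ratio of 30), the default of 40 and a tuned value of 100 both come out as 15 without the patch. With the patch, a tuned value of 100 stays at 100, while any other value such as 60 is still capped at 15.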
This cap has undesirable side effects on systems that run applications that
map lots of memory, like DB2. Since changing that behavior for all systems
would be far too risky, we can prevent the system from capping
/proc/sys/vm/dirty_ratio only if it has been tuned to 100.

The attached patch allows one to tune dirty_ratio and have it work as
desired without affecting systems that have not been tuned.

Fixes BZ295291

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 4fa0f54..4edf8e9 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -145,7 +145,9 @@ get_dirty_limits(long *pbackground, long *pdirty,
 					total_pages;
 
 	dirty_ratio = vm_dirty_ratio;
-	if (dirty_ratio > unmapped_ratio / 2)
+
+	/* if vm_dirty_ratio is 100 dont limit to 1/2 unmapped_ratio */
+	if ((dirty_ratio > unmapped_ratio / 2) && (dirty_ratio != 100))
 		dirty_ratio = unmapped_ratio / 2;
 
 	if (dirty_ratio < 5)