kernel-2.6.18-194.11.1.el5.src.rpm

From: Prarit Bhargava <prarit@redhat.com>
Date: Tue, 8 Dec 2009 18:41:56 -0500
Subject: [x86] fix stale data in shared_cpu_map cpumasks
Message-id: <20091208183854.6756.83274.sendpatchset@prarit.bos.redhat.com>
Patchwork-id: 21748
O-Subject: [RHEL5 PATCH]: Fix stale data in shared_cpu_map cpumasks
Bugzilla: 541953
RH-Acked-by: Don Dutile <ddutile@redhat.com>
RH-Acked-by: Matthew Garrett <mjg@redhat.com>
RH-Acked-by: Larry Woodman <lwoodman@redhat.com>

While executing random cpu hotplug events, QA noticed a panic.

The panic is caused by an inconsistency in the shared_cpu_map cpumask.  The
patch below resolves the problem by

a) verifying a cpu is actually online before adding it to another cpu's
shared_cpu_map,
b) only examining cpus that share the same lower-level cache,
c) updating the other siblings' lower-level cache maps when a cpu is added, and
d) an unrelated cleanup of a stray trailing whitespace that is no longer
present upstream.

There is some minor contention about this patch upstream.  The author of the
upstream code would rather leave the shared_cpu_map inconsistent and avoid the
panic with a NULL pointer check in cache_remove_shared_cpu_map.
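
For reference, a minimal sketch of what that alternative might look like,
assuming the same cpuid4_info[] per-cpu pointer array used in
cache_shared_cpu_map_setup below (the exact upstream proposal may differ):

	this_leaf = CPUID4_INFO_IDX(cpu, index);
	for_each_cpu_mask(sibling, this_leaf->shared_cpu_map) {
		/* Skip siblings whose cacheinfo has already been freed;
		 * this tolerates the stale bit rather than preventing it. */
		if (cpuid4_info[sibling] == NULL)
			continue;
		sibling_leaf = CPUID4_INFO_IDX(sibling, index);
		cpu_clear(cpu, sibling_leaf->shared_cpu_map);
	}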

I have sent him a counter-proposal based on the patch below, so this patch is
not upstream.

This patch is required for the latest 5.4.z kernel and is blocking its release.

Resolves BZ 541953.

Tested with several hour-long test runs, limited by another panic that can be
reproduced on RHEL5.4 (-164.el5) -- see BZ 544895.  Without this patch the
test usually fails within a few minutes.

Signed-off-by: Don Zickus <dzickus@redhat.com>

diff --git a/arch/i386/kernel/cpu/intel_cacheinfo.c b/arch/i386/kernel/cpu/intel_cacheinfo.c
index f2fe534..2a00fa4 100644
--- a/arch/i386/kernel/cpu/intel_cacheinfo.c
+++ b/arch/i386/kernel/cpu/intel_cacheinfo.c
@@ -468,15 +468,19 @@ static void __cpuinit cache_shared_cpu_map_setup(unsigned int cpu, int index)
 {
 	struct _cpuid4_info	*this_leaf, *sibling_leaf;
 	unsigned long num_threads_sharing;
-	int index_msb, i;
+	int index_msb, i, sibling;
 	struct cpuinfo_x86 *c = cpu_data;
 
 	if ((index == 3) && (c->x86_vendor == X86_VENDOR_AMD)) {
-		for_each_online_cpu(i) {
+		for_each_cpu_mask(i, c[cpu].llc_shared_map) {
 			if (cpuid4_info[i] == NULL)
 				continue;
 			this_leaf = CPUID4_INFO_IDX(i, index);
-			this_leaf->shared_cpu_map = c[i].llc_shared_map;
+			for_each_cpu_mask(sibling, c[cpu].llc_shared_map) {
+				if (!cpu_online(sibling))
+					continue;
+				cpu_set(sibling, this_leaf->shared_cpu_map);
+			}
 		}
 		return;
 	}
@@ -508,7 +512,7 @@ static void __cpuinit cache_remove_shared_cpu_map(unsigned int cpu, int index)
 
 	this_leaf = CPUID4_INFO_IDX(cpu, index);
 	for_each_cpu_mask(sibling, this_leaf->shared_cpu_map) {
-		sibling_leaf = CPUID4_INFO_IDX(sibling, index);	
+		sibling_leaf = CPUID4_INFO_IDX(sibling, index);
 		cpu_clear(cpu, sibling_leaf->shared_cpu_map);
 	}
 }