kernel-2.6.18-128.1.10.el5.src.rpm

From: Aristeu Rozanski <aris@redhat.com>
Date: Tue, 23 Sep 2008 16:18:57 -0400
Subject: [x86_64] NMI wd: clear perf counter registers on P4
Message-id: 20080923201856.GU16840@redhat.com
O-Subject: [RHEL5.3 PATCH] NMI watchdog: clear performance counter registers on P4
Bugzilla: 461671
RH-Acked-by: Prarit Bhargava <prarit@redhat.com>
RH-Acked-by: Dave Anderson <anderson@redhat.com>

https://bugzilla.redhat.com/show_bug.cgi?id=461671

P4 processors have a quirk in their performance counter registers: once the
overflow bit (P4_CCCR_OVF) is set in a CCCR, the counter keeps interrupting
(with NMIs, in the NMI watchdog and other cases) until that bit is cleared.
When a kdump kernel boots, right after it enables NMI delivery for
performance monitoring interrupts, any other performance counters that were
in use by the previous kernel may still have that bit set. They will keep
generating NMIs forever, since no code handles those other registers. This
didn't happen before the NMI watchdog work in 5.2 because both logical CPUs
used the same CCCR.

This problem causes a crash on Dave Anderson's box: the regular kernel has
the NMI watchdog working for both logical CPUs (using two different sets of
performance counters of the same core). When the kdump kernel boots, the
second CCCR has the interrupt bit set, and since by default the kdump kernel
boots using only one CPU, the second logical CPU isn't used and thus the
second CCCR isn't initialized/reset. As soon as the first logical CPU is
initialized and sets up the NMI watchdog, it enables the delivery of PMIs
as NMIs on the local APIC. Since the second CCCR is in the same CPU, an NMI
is triggered immediately, causing the machine to crash due to an unlikely
race in the NMI watchdog code. To fix the problem, two patches were
submitted: one to clear the other performance counter registers on P4 when
booting with reset_devices, and another to fix the race itself.
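
For illustration only, the clearing step can be modelled in user space. This
is a rough sketch, not the kernel code: the sim_* names and the in-memory
array standing in for the 18 CCCR MSRs are hypothetical, and the bit
positions of ENABLE and OVF are assumed to match the kernel's P4_CCCR_ENABLE
and P4_CCCR_OVF definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions, meant to mirror P4_CCCR_ENABLE / P4_CCCR_OVF. */
#define SIM_CCCR_ENABLE (1u << 12)
#define SIM_CCCR_OVF    (1u << 31)
#define SIM_CONTROLS    18

/* Hypothetical stand-in for the low 32 bits of the 18 CCCR MSRs. */
static uint32_t sim_cccr[SIM_CONTROLS];

/* Simulated rdmsr: read the low word of CCCR number idx. */
uint32_t sim_rdmsr(int idx)
{
	assert(idx >= 0 && idx < SIM_CONTROLS);
	return sim_cccr[idx];
}

/* Simulated wrmsr: write the low word of CCCR number idx. */
void sim_wrmsr(int idx, uint32_t low)
{
	assert(idx >= 0 && idx < SIM_CONTROLS);
	sim_cccr[idx] = low;
}

/* Clear ENABLE and OVF in every CCCR, leaving all other bits untouched. */
void sim_clear_p4_controls(void)
{
	int i;

	for (i = 0; i < SIM_CONTROLS; i++) {
		uint32_t low = sim_rdmsr(i);

		low &= ~(SIM_CCCR_ENABLE | SIM_CCCR_OVF);
		sim_wrmsr(i, low);
	}
}

/* Count the CCCRs that still have OVF set and so could keep interrupting. */
int sim_pending(void)
{
	int i, n = 0;

	for (i = 0; i < SIM_CONTROLS; i++)
		if (sim_rdmsr(i) & SIM_CCCR_OVF)
			n++;
	return n;
}
```

Setting OVF in one simulated register and then calling
sim_clear_p4_controls() leaves sim_pending() at zero while preserving the
unrelated bits. That is the property the real loop relies on before NMI
delivery is re-enabled.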

Both patches (this one and the one fixing the NMI watchdog) were submitted
upstream and accepted by Ingo for inclusion in 2.6.27 and 2.6.28, respectively.
http://git.kernel.org/?p=linux/kernel/git/x86/linux-2.6-tip.git;a=commit;h=28b166a700899a0f88b1cc283c449fb5bf72a635
http://git.kernel.org/?p=linux/kernel/git/x86/linux-2.6-tip.git;a=commit;h=b3e15bdef689641e7f1bb03efbe56112c3ee82e2

The second patch is not as critical as this one and can wait for 5.4.

Tested successfully on Dave's box and other P4 boxes.

diff --git a/arch/x86_64/kernel/perfctr-watchdog.c b/arch/x86_64/kernel/perfctr-watchdog.c
index 96eead0..d5a65ea 100644
--- a/arch/x86_64/kernel/perfctr-watchdog.c
+++ b/arch/x86_64/kernel/perfctr-watchdog.c
@@ -219,6 +219,27 @@ void enable_lapic_nmi_watchdog(void)
 	touch_nmi_watchdog();
 }
 
+#define P4_CONTROLS 18
+static unsigned int p4_controls[18] = {
+	MSR_P4_BPU_CCCR0,
+	MSR_P4_BPU_CCCR1,
+	MSR_P4_BPU_CCCR2,
+	MSR_P4_BPU_CCCR3,
+	MSR_P4_MS_CCCR0,
+	MSR_P4_MS_CCCR1,
+	MSR_P4_MS_CCCR2,
+	MSR_P4_MS_CCCR3,
+	MSR_P4_FLAME_CCCR0,
+	MSR_P4_FLAME_CCCR1,
+	MSR_P4_FLAME_CCCR2,
+	MSR_P4_FLAME_CCCR3,
+	MSR_P4_IQ_CCCR0,
+	MSR_P4_IQ_CCCR1,
+	MSR_P4_IQ_CCCR2,
+	MSR_P4_IQ_CCCR3,
+	MSR_P4_IQ_CCCR4,
+	MSR_P4_IQ_CCCR5,
+};
 /*
  * Activate the NMI watchdog via the local APIC.
  */
@@ -468,6 +489,26 @@ static int setup_p4_watchdog(unsigned nmi_hz)
 		evntsel_msr = MSR_P4_CRU_ESCR0;
 		cccr_msr = MSR_P4_IQ_CCCR0;
 		cccr_val = P4_CCCR_OVF_PMI0 | P4_CCCR_ESCR_SELECT(4);
+
+		/*
+		 * If we're on the kdump kernel or other situation, we may
+		 * still have other performance counter registers set to
+		 * interrupt and they'll keep interrupting forever because
+		 * of the P4_CCCR_OVF quirk. So we need to ACK all the
+		 * pending interrupts and disable all the registers here,
+		 * before reenabling the NMI delivery. Refer to p4_rearm()
+		 * about the P4_CCCR_OVF quirk.
+		 */
+		if (reset_devices) {
+			unsigned int low, high;
+			int i;
+
+			for (i = 0; i < P4_CONTROLS; i++) {
+				rdmsr(p4_controls[i], low, high);
+				low &= ~(P4_CCCR_ENABLE | P4_CCCR_OVF);
+				wrmsr(p4_controls[i], low, high);
+			}
+		}
 	} else {
 		/* logical cpu 1 */
 		perfctr_msr = MSR_P4_IQ_PERFCTR1;