Sophie

kernel-2.6.18-194.11.1.el5.src.rpm

From: Don Dutile <ddutile@redhat.com>
Date: Fri, 29 Jan 2010 16:42:11 -0500
Subject: [pci] VF can't be enabled in dom0
Message-id: <4B630FE3.708@redhat.com>
Patchwork-id: 23006
O-Subject: [PATCH RHEL5.5 v2]  VF can't be enabled in dom0
Bugzilla: 547980
RH-Acked-by: Chris Wright <chrisw@redhat.com>
RH-Acked-by: Andrew Jones <drjones@redhat.com>
RH-Acked-by: Christopher Lalancette <clalance@redhat.com>

BZ 547980

V2 update: Put all code under ifdef CONFIG_XEN, and
           reduced kernel boot parm from
           on/enable/1,off/disable/0 to simply on/off,
           per feedback from Chris Wright.
         : re-ran tests on the (rhts) hp z800;
           did not rerun test at Intel since simple shuffle of code.

Problem:
========
On some systems, the BIOS and associated e820 tables only
map PCI mmconf (memory-mapped config space) for the first 128 busses
on a PCI segment, and not the full architecturally capable 256 busses.
PCI mmconf requires 1MB of mapped space per bus.
PCI mmconf is used to access PCI config space above 256 bytes for each
PCI device (up to 4K per device).
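
The sizes above follow from the standard PCI Express mmconf (ECAM) layout.
As a rough user-space sketch (the helper and macro names below are
illustrative assumptions, not code from this patch), the arithmetic is:

```c
#include <stdint.h>

/* Sizes from the PCI Express mmconf (ECAM) layout. */
#define ECAM_BYTES_PER_FN   4096u  /* config space per function */
#define ECAM_FNS_PER_DEV    8u     /* functions per device (slot) */
#define ECAM_DEVS_PER_BUS   32u    /* devices (slots) per bus */
#define ECAM_BYTES_PER_BUS  \
	(ECAM_DEVS_PER_BUS * ECAM_FNS_PER_DEV * ECAM_BYTES_PER_FN) /* 1MB */

/* Hypothetical helper: byte offset of a config register from the
 * start of a segment's mmconf window. */
static uint64_t ecam_offset(unsigned bus, unsigned dev,
			    unsigned fn, unsigned reg)
{
	return ((uint64_t)bus << 20) | ((uint64_t)dev << 15) |
	       ((uint64_t)fn << 12) | (reg & 0xfffu);
}

/* Hypothetical helper: bytes of mmconf space needed for nr_busses. */
static uint64_t ecam_window_bytes(unsigned nr_busses)
{
	return (uint64_t)nr_busses * ECAM_BYTES_PER_BUS;
}
```

With these numbers, a BIOS that maps only 128 busses provides a 128MB
window, so any access to bus 128 or higher starts at or beyond the end
of that window, while a 256-bus mapping needs the full 256MB.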

RHEL5 hard-codes its mmap'd mmconf area for 256 busses per segment.
This works on bare metal kernels since the additional mmap'd space
is not accessed/used.
Upstream predominantly uses the max PCI bus number from the ACPI
table, but has a bunch of if-and-sanity_checks-etc., that seemed
all too ugly to backport.

On a -xen dom0 kernel, this causes a failure in the kernel if
PCI extended configuration space is needed, which it is for
PCI device assignment (aka, xen pci passthrough). Since the
xen hypervisor checks whether the dom0 should be allowed access
to this extended space, the hv fails the mapping when it exceeds
the space provided in the e820 map, and the dom0 kernel
reverts back to only supporting non-mmconf space, which limits PCI
config space access to the lower 256 bytes of each PCI device.

For VF functionality (and PCI device assignment, aka, pci passthrough)
to be supported on xen dom0, mmconf space must be mapped, since
the extended PCI configuration space above the first 256 bytes holds
critical capability structures (e.g., the SR-IOV capability).

The fix:
========
Adhere to mapping the amount of PCI mmconf space as stated in the
ACPI tables per the max PCI bus number (aperture) in the table.

Since this hasn't been a problem on bare metal for all of RHEL5,
and isn't a problem for KVM, and is restricted to xen-based dom0's,
the patch only adheres to the ACPI table when it's a -xen/dom0
kernel and the pci_pt_e820_access kernel param is set to
1/on, which it must be to enable the dom0 to request mapping
of the PCI mmconf space for VF device mapping.
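
The decision described above can be modeled in user space roughly as
follows.  This is only a sketch: the struct and function names are
hypothetical, and the 2MB/256MB limits are assumed to mirror the RHEL5
MMCONFIG_APER_MIN/MMCONFIG_APER_MAX defines.

```c
#include <stdbool.h>

/* Assumed values, mirroring MMCONFIG_APER_MIN/MAX in RHEL5. */
#define APER_MIN (2UL * 1024 * 1024)
#define APER_MAX (256UL * 1024 * 1024)

/* Hypothetical stand-in for the bus range of one ACPI MCFG entry. */
struct mcfg_range {
	unsigned start_bus;
	unsigned end_bus;
};

/* Pick the mmconf aperture size, in bytes, for one segment. */
static unsigned long pick_aperture(const struct mcfg_range *r,
				   bool use_acpi, bool pt_e820_access)
{
	unsigned long aper;

	/* Workaround disabled, or no passthrough: map all 256 busses. */
	if (!use_acpi || !pt_e820_access)
		return APER_MAX;

	/* 32 slots, 8 fcns/slot, 4096 pci-cfg bytes/fcn == 1MB per bus */
	aper = (unsigned long)(r->end_bus - r->start_bus + 1) * 32 * 8 * 4096;
	if (aper < APER_MIN)
		aper = APER_MIN;
	if (aper > APER_MAX)
		aper = APER_MAX;
	return aper;
}
```

For an ACPI entry covering busses 0 through 127 with both conditions
true, this yields a 128MB aperture instead of the hard-coded 256MB.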

As noted above, upstream's handling of the ACPI max PCI bus number
seemed too ugly to backport.  The above modification seemed simpler &
avoided regressions on RHEL5 systems.

Note: originally, I had based the patch to do a dmi decode & check
      to only implement the workaround on the one system it was reported on
      at the time.  Then another BZ reported another xen pci passthrough
      problem, which required this fix as well as another, to get passthrough to work.
      Thus, it was deemed that this may continue to occur on more
      RHEL5 -xen/dom0 systems in the future that support VTd/device-assignment/SRIOV.

In case this patch causes a problem due to another ACPI table bug,
a new kernel cmdline argument, acpi_mcfg_max_pci_bus_num,
was created to defeat this workaround (by setting the param to 'off')
and force the use of the original RHEL5 code to map all 256
busses' PCI mmconf space.  Call me skittish about BIOS tables....  ;-)
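
To defeat the workaround, acpi_mcfg_max_pci_bus_num=off would be
appended to the dom0 kernel command line.  The on/off matching can be
sketched in user space as below; parse_on_off is a hypothetical helper
that models the prefix comparison, not the kernel's setup function.

```c
#include <string.h>

/* Hypothetical helper: return the new flag value for an on/off
 * boot argument; 'current' is kept when str matches neither form. */
static int parse_on_off(const char *str, int current)
{
	/* strncmp() compares only a prefix, so "only" also matches "on" */
	if (!strncmp(str, "on", 2))
		return 1;
	if (!strncmp(str, "off", 3))
		return 0;
	return current;
}
```

Because the comparison is prefix-based, an unrecognized value such as
"bogus" leaves the default untouched, while anything starting with
"on" or "off" is accepted.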

Testing
=======
(1) Tested on original system it was reported on -- HP z800.
(2) Tested by a system at Intel that also had similar problem.

Tested default code path, and tested new cmdline argument to force a
PCI mmconf mapping failure by forcing use of max bus aper of 256, as
the code originally did.

Brew build: https://brewweb.devel.redhat.com/taskinfo?taskID=2232609

Please review and ACK.

- Don
From 7a3738760c5cea3c77b3f26acba8226f82a0c608 Mon Sep 17 00:00:00 2001
From: Donald Dutile <ddutile@redhat.com>
Date: Tue, 5 Jan 2010 15:06:03 -0500
Subject: [PATCH] PCI-MMCONFIG: workaround for xen-dom0 acpi-max-pci-bus-num


diff --git a/arch/x86_64/pci/mmconfig.c b/arch/x86_64/pci/mmconfig.c
index b36374d..6098bfb 100644
--- a/arch/x86_64/pci/mmconfig.c
+++ b/arch/x86_64/pci/mmconfig.c
@@ -134,6 +134,66 @@ static struct pci_raw_ops pci_mmcfg = {
 	.write =	pci_mmcfg_write,
 };
 
+#ifdef CONFIG_XEN
+/* 
+ * 1=default for xen kernel,
+ * 0=force use of MMCONFIG_APER_MAX
+ */
+static int use_acpi_mcfg_max_pci_bus_num = 1;
+
+/*
+ * on  == use acpi table value
+ * off == use max PCI bus num value
+ */
+int __init acpi_mcfg_max_pci_bus_num_setup(char *str)
+{
+	/* force use of acpi value for max pci bus num */
+	if (!strncmp(str, "on", 2))
+		use_acpi_mcfg_max_pci_bus_num = 1;
+	/* force use of MMCONFIG_APER_MAX */
+	if (!strncmp(str, "off", 3))
+		use_acpi_mcfg_max_pci_bus_num = 0;
+
+	return 1;
+}
+
+__setup("acpi_mcfg_max_pci_bus_num=", acpi_mcfg_max_pci_bus_num_setup);
+#endif
+
+/*
+ * RHEL5 doesn't trust the max pci bus num in the acpi table,
+ * but blindly using MMCONFIG_APER_MAX can map past the valid
+ * PCI mmconf space; e.g., xen dom0's may fail.
+ * So check whether the system requires the acpi table value,
+ * or the sysadmin has forced MMCONFIG_APER_MAX on the kernel cmd line.
+ */
+static unsigned long get_mmcfg_aper(struct acpi_table_mcfg_config *cfg)
+{
+	unsigned long mmcfg_aper = MMCONFIG_APER_MAX;
+
+/* xen kernel && pci pass-through only */
+#ifdef CONFIG_XEN
+	extern int pci_pt_e820_access_enabled;
+
+	if (use_acpi_mcfg_max_pci_bus_num && pci_pt_e820_access_enabled) {
+		/* trust acpi values for end & start bus number */
+		mmcfg_aper =
+			cfg->end_bus_number - cfg->start_bus_number + 1;
+		printk(KERN_INFO
+		       "PCI: Using acpi max pci bus value of 0x%lx\n",
+			mmcfg_aper);
+		/* 32 slots, 8 fcns/slot, 4096 pci-cfg bytes/fcn */
+		mmcfg_aper *= 32 * 8 * 4096;
+		if (mmcfg_aper < MMCONFIG_APER_MIN)
+			mmcfg_aper = MMCONFIG_APER_MIN;
+		if (mmcfg_aper > MMCONFIG_APER_MAX)
+			mmcfg_aper = MMCONFIG_APER_MAX;
+	}
+#endif
+
+	return mmcfg_aper;
+}
+
 void __init pci_mmcfg_init(void)
 {
 	int i;
@@ -165,9 +225,14 @@ void __init pci_mmcfg_init(void)
 		return;
 	}
 	for (i = 0; i < pci_mmcfg_config_num; ++i) {
-		pci_mmcfg_virt[i].cfg = &pci_mmcfg_config[i];
+		struct acpi_table_mcfg_config *cfg = &pci_mmcfg_config[i];
+		unsigned long mmcfg_aper;
+
+		mmcfg_aper = get_mmcfg_aper(cfg);
+
+		pci_mmcfg_virt[i].cfg = cfg;
 		pci_mmcfg_virt[i].virt = ioremap_nocache(pci_mmcfg_config[i].base_address,
-							 MMCONFIG_APER_MAX);
+							 mmcfg_aper);
 		if (!pci_mmcfg_virt[i].virt) {
 			printk("PCI: Cannot map mmconfig aperture for segment %d\n",
 			       pci_mmcfg_config[i].pci_segment_group_number);