kernel-2.6.18-238.el5.src.rpm

Date: Tue, 3 Oct 2006 17:40:28 -0400
From: Greg Edwards <gedwards@redhat.com>
Subject: [RHEL5 RFC PATCH] exports for SGI XPMEM driver

We (SGI) have an add-on product we are interested in layering on top of
RHEL5.  One of the pieces is a highly optimized MPI library that
exploits some unique characteristics of our hardware.  This requires a
driver for cross-partition memory access, and that driver needs a few
symbols exported from the kernel.

The driver is GPL, but has not been pushed upstream yet.  Our XPMEM team
is planning on doing that in the next year.

The request is tracked in bugzilla:

Bug 206215: ProPack XPMEM support
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=206215

Note, we are only requesting the exports.  The fork() callout mentioned
in the above bugzilla is no longer needed.

One of our XPMEM developers wrote up the following description:


One of the significant performance and reliability benefits that SGI
ProPack has historically provided is the XPMEM facility, which the SGI
MPI library (MPT) uses to let MPI ranks coordinate access to each
other's memory.

The reliability gains come from dividing a large machine, whose many
parts lower its MTBF, into smaller partitions.  With accesses confined
to user space and user-space errors handled, a failure takes down only
the affected portion of the machine plus the user code sharing that
portion.

The performance gains come because, in this mode, nothing beyond the
memory controller itself arbitrates memory accesses from local or
remote hosts.

MPT relies very heavily upon a kernel module called XPMEM to achieve
this functionality.  It provides a mechanism analogous to System V shared
memory extended beyond the partition's physical boundary.  A process
on one partition creates a segment of memory for use and is given a
handle to that memory.  The handle is transferred to the remote host,
which asks its local XPMEM to attach the handle to its address space.
References to the attached segment get page table entries which point at
the memory on the owning partition and normal memory coherence protocols
keep everything in order.
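
The System V analogy can be made concrete with a small user-space
sketch (the function name is ours, and this of course stays within one
host; XPMEM's point is extending the same handle/attach model across
partition boundaries):

```c
/* Illustrative sketch of the model XPMEM extends: a System V shared
 * memory segment id acts as the "handle" that a second process
 * attaches into its own address space. */
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a forked "remote" process sees, through its own
 * attachment, the data the owning process wrote; 0 on any failure. */
int shm_roundtrip(void)
{
	int shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
	if (shmid < 0)
		return 0;

	char *owner = shmat(shmid, NULL, 0);
	if (owner == (char *)-1)
		return 0;
	strcpy(owner, "hello");		/* owner populates the segment */

	pid_t pid = fork();
	if (pid == 0) {
		/* "Remote" side: attach the same handle, read it back. */
		char *remote = shmat(shmid, NULL, 0);
		int ok = remote != (char *)-1 && strcmp(remote, "hello") == 0;
		_exit(ok ? 0 : 1);
	}

	int status = 0;
	waitpid(pid, &status, 0);
	shmdt(owner);
	shmctl(shmid, IPC_RMID, NULL);	/* destroy the segment */
	return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

In the cross-partition case the attacher's page table entries point at
memory on the owning partition, which is where the page table problems
described below come from.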

The problems XPMEM encounters are confined to page table handling:
most of the Linux kernel assumes that a page table entry can be
converted to a page frame number and then to a struct page.  For these
remotely owned pages, that assumption does not hold.  This
problem is similar to the execute-in-place feature of Xen.  It is
different in that the shared object is not owned by the kernel, but
rather a user page subject to changes by the hosting user application.


There is a copy of the xpmem driver here for perusing:

ftp://oss.sgi.com/projects/xpmem/xpmem.patch

The following exports are used by the XPMEM driver, and we are
interested in having them exported in the RHEL5 kernel.

The generic exports are:

	EXPORT_SYMBOL_GPL(tasklist_lock)
	EXPORT_SYMBOL_GPL(__put_task_struct)
	EXPORT_SYMBOL_GPL(schedule_on_each_cpu)

The ia64-specific exports are:

	EXPORT_SYMBOL_GPL(ia64_boot_param)
	EXPORT_SYMBOL_GPL(node_to_cpu_mask)
	EXPORT_SYMBOL_GPL(pio_phys_read_mmr)
	EXPORT_SYMBOL_GPL(pio_phys_write_mmr)
	EXPORT_SYMBOL_GPL(pio_atomic_phys_write_mmrs)
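
For context, here is a minimal sketch of how a GPL module could consume
the generic exports.  This is our illustration, not the XPMEM driver
itself; it only builds against a 2.6.18-era tree and assumes the
schedule_on_each_cpu() signature shown in the workqueue hunk below.

```c
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/smp.h>
#include <linux/workqueue.h>

static void touch_cpu(void *info)	/* 2.6.18-style callout */
{
	printk(KERN_INFO "demo: on cpu %d\n", smp_processor_id());
}

static int __init demo_init(void)
{
	struct task_struct *p;
	int count = 0;

	/* tasklist_lock: lets the module walk the task list safely. */
	read_lock(&tasklist_lock);
	for_each_process(p)
		count++;
	read_unlock(&tasklist_lock);
	printk(KERN_INFO "demo: %d processes\n", count);

	/* __put_task_struct: makes the put_task_struct() inline usable,
	 * so a module can hold a task reference across sleeps. */
	get_task_struct(current);
	put_task_struct(current);

	/* schedule_on_each_cpu: run a callout on every cpu. */
	return schedule_on_each_cpu(touch_cpu, NULL);
}

static void __exit demo_exit(void)
{
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");	/* required to bind the _GPL exports */
```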

---
 arch/ia64/kernel/ia64_ksyms.c |    5 +++++
 arch/ia64/kernel/numa.c       |    1 +
 arch/ia64/kernel/setup.c      |    2 ++
 kernel/fork.c                 |    2 ++
 kernel/workqueue.c            |    1 +
 5 files changed, 11 insertions(+)

Index: linux/kernel/fork.c
===================================================================
--- linux.orig/kernel/fork.c	2006-10-03 15:30:57.635422182 -0500
+++ linux/kernel/fork.c	2006-10-03 15:31:17.017862573 -0500
@@ -64,6 +64,7 @@ int max_threads;		/* tunable limit on nr
 DEFINE_PER_CPU(unsigned long, process_counts) = 0;
 
 __cacheline_aligned DEFINE_RWLOCK(tasklist_lock);  /* outer */
+EXPORT_SYMBOL_GPL(tasklist_lock);
 
 int nr_processes(void)
 {
@@ -122,6 +123,7 @@ void __put_task_struct(struct task_struc
 	if (!profile_handoff_task(tsk))
 		free_task(tsk);
 }
+EXPORT_SYMBOL_GPL(__put_task_struct);
 
 void __init fork_init(unsigned long mempages)
 {
Index: linux/arch/ia64/kernel/setup.c
===================================================================
--- linux.orig/arch/ia64/kernel/setup.c	2006-10-03 15:30:41.989452026 -0500
+++ linux/arch/ia64/kernel/setup.c	2006-10-03 15:31:17.021863077 -0500
@@ -99,6 +99,8 @@ DEFINE_PER_CPU(unsigned long, local_per_
 DEFINE_PER_CPU(unsigned long, ia64_phys_stacked_size_p8);
 unsigned long ia64_cycles_per_usec;
 struct ia64_boot_param *ia64_boot_param;
+EXPORT_SYMBOL_GPL(ia64_boot_param);
+
 struct screen_info screen_info;
 unsigned long vga_console_iobase;
 unsigned long vga_console_membase;
Index: linux/arch/ia64/kernel/numa.c
===================================================================
--- linux.orig/arch/ia64/kernel/numa.c	2006-09-19 22:42:06.000000000 -0500
+++ linux/arch/ia64/kernel/numa.c	2006-10-03 15:31:17.021863077 -0500
@@ -28,6 +28,7 @@ u16 cpu_to_node_map[NR_CPUS] __cacheline
 EXPORT_SYMBOL(cpu_to_node_map);
 
 cpumask_t node_to_cpu_mask[MAX_NUMNODES] __cacheline_aligned;
+EXPORT_SYMBOL_GPL(node_to_cpu_mask);
 
 /**
  * build_cpu_to_node_map - setup cpu to node and node to cpumask arrays
Index: linux/arch/ia64/kernel/ia64_ksyms.c
===================================================================
--- linux.orig/arch/ia64/kernel/ia64_ksyms.c	2006-10-03 15:30:43.513643962 -0500
+++ linux/arch/ia64/kernel/ia64_ksyms.c	2006-10-03 15:31:17.021863077 -0500
@@ -116,3 +116,10 @@ EXPORT_SYMBOL(ia64_spinlock_contention);
 
 extern char ia64_ivt[];
 EXPORT_SYMBOL(ia64_ivt);
+
+#if defined(CONFIG_IA64_GENERIC) || defined(CONFIG_IA64_SGI_SN2)
+#include <asm/sn/rw_mmr.h>
+EXPORT_SYMBOL_GPL(pio_phys_read_mmr);
+EXPORT_SYMBOL_GPL(pio_phys_write_mmr);
+EXPORT_SYMBOL_GPL(pio_atomic_phys_write_mmrs);
+#endif
Index: linux/kernel/workqueue.c
===================================================================
--- linux.orig/kernel/workqueue.c	2006-09-19 22:42:06.000000000 -0500
+++ linux/kernel/workqueue.c	2006-10-03 15:31:17.025863581 -0500
@@ -521,6 +521,7 @@ int schedule_on_each_cpu(void (*func)(vo
 	free_percpu(works);
 	return 0;
 }
+EXPORT_SYMBOL_GPL(schedule_on_each_cpu);
 
 void flush_scheduled_work(void)
 {