kernel-2.6.18-238.el5.src.rpm

From: AMEET M. PARANJAPE <aparanja@redhat.com>
Date: Fri, 15 May 2009 14:30:45 -0400
Subject: [infiniband] ib_core: use weak ordering for user memory
Message-id: 20090515182752.7067.47400.sendpatchset@squad5-lp1.lab.bos.redhat.com
O-Subject: [PATCH RHEL5.4 BZ501004] openib: ib_core: use weak ordering for user memory, again
Bugzilla: 501004
RH-Acked-by: David Howells <dhowells@redhat.com>
RH-Acked-by: Doug Ledford <dledford@redhat.com>

RHBZ#:
======
https://bugzilla.redhat.com/show_bug.cgi?id=501004

Description:
===========
Weak ordering for RDMA on OpenIB was included in RHEL5.3 and
OFED-1.4, in an effort from IBM and Mellanox to improve
InfiniBand performance on certain IBM hardware (e.g. QS22).

Unfortunately, the feature got dropped during the inclusion
of OFED-1.4 in RHEL5.4, because core_undo_weak_ordering.patch
was accidentally applied there. It should not have been
applied because the patch was only meant for RHEL5.2 and
earlier, which did not yet have the dma_attr code.

Reverting core_undo_weak_ordering.patch again fixes the
regression against both RHEL5.3 and OFED-1.4 on later
kernels.

RHEL Version Found:
================
RHEL 5.4 beta

kABI Status:
============
No symbols were harmed.

Brew:
=====
Built on all platforms.
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=1801271

Upstream Status:
================
cb9fbc5c37b69ac584e61d449cfd590f5ae1f90d

Test Status:
============
A performance decrease of up to 50% was observed on RHEL5.4 Beta, especially
when running the IMB benchmark (formerly known as Pallas) over InfiniBand
connections.

After applying this patch the performance degradation is no longer present.

===============================================================
Ameet Paranjape 978-392-3903 ext 23903
IBM on-site partner

Proposed Patch:
===============

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index bc0245c..a14f7a4 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -40,6 +40,10 @@
 
 #include "uverbs.h"
 
+static int allow_weak_ordering;
+module_param(allow_weak_ordering, bool, 0444);
+MODULE_PARM_DESC(allow_weak_ordering,  "Allow weak ordering for data registered memory");
+
 #define IB_UMEM_MAX_PAGE_CHUNK						\
 	((PAGE_SIZE - offsetof(struct ib_umem_chunk, page_list)) /	\
 	 ((void *) &((struct ib_umem_chunk *) 0)->page_list[1] -	\
@@ -101,8 +105,8 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
 	int i;
 
 	list_for_each_entry_safe(chunk, tmp, &umem->chunk_list, list) {
-		ib_dma_unmap_sg(dev, chunk->page_list,
-				chunk->nents, DMA_BIDIRECTIONAL);
+		ib_dma_unmap_sg_attrs(dev, chunk->page_list,
+				      chunk->nents, DMA_BIDIRECTIONAL, &chunk->attrs);
 		for (i = 0; i < chunk->nents; ++i) {
 			struct page *page = sg_page(&chunk->page_list[i]);
 
@@ -141,6 +145,9 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 
 	if (dmasync)
 		dma_set_attr(DMA_ATTR_WRITE_BARRIER, &attrs);
+	else if (allow_weak_ordering)
+		dma_set_attr(DMA_ATTR_WEAK_ORDERING, &attrs);
+
 
 	if (!can_do_mlock())
 		return ERR_PTR(-EPERM);
@@ -219,6 +226,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 				goto out;
 			}
 
+			chunk->attrs = attrs;
 			chunk->nents = min_t(int, ret, IB_UMEM_MAX_PAGE_CHUNK);
 			sg_init_table(chunk->page_list, chunk->nents);
 			for (i = 0; i < chunk->nents; ++i) {
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 9ee0d2e..90f3712 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -36,6 +36,7 @@
 #include <linux/list.h>
 #include <linux/scatterlist.h>
 #include <linux/workqueue.h>
+#include <linux/dma-attrs.h>
 
 struct ib_ucontext;
 
@@ -56,6 +57,7 @@ struct ib_umem_chunk {
 	struct list_head	list;
 	int                     nents;
 	int                     nmap;
+	struct dma_attrs	attrs;
 	struct scatterlist      page_list[0];
 };
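
Illustration only, not part of the patch: the attribute selection that the
ib_umem_get() hunk adds can be modelled in plain userspace C. This is a
minimal sketch under stated assumptions. The names sketch_dma_attr and
sketch_pick_attrs are hypothetical, and a simple bitmask stands in for the
kernel's struct dma_attrs. It shows that a dmasync request takes precedence,
and that weak ordering is requested only when the allow_weak_ordering module
parameter is enabled.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins for DMA_ATTR_WRITE_BARRIER and
 * DMA_ATTR_WEAK_ORDERING. The real kernel stores these in
 * struct dma_attrs and sets them with dma_set_attr().
 */
enum sketch_dma_attr {
	SKETCH_ATTR_WRITE_BARRIER = 1 << 0,
	SKETCH_ATTR_WEAK_ORDERING = 1 << 1,
};

/*
 * Mirror the if / else-if chain from the patch: dmasync wins,
 * otherwise weak ordering is used only if the administrator
 * opted in through the module parameter.
 */
static unsigned int sketch_pick_attrs(bool dmasync, bool allow_weak_ordering)
{
	unsigned int attrs = 0;

	if (dmasync)
		attrs |= SKETCH_ATTR_WRITE_BARRIER;
	else if (allow_weak_ordering)
		attrs |= SKETCH_ATTR_WEAK_ORDERING;

	return attrs;
}
```

In the patch itself the chosen attributes are then copied into each
ib_umem_chunk (chunk->attrs = attrs), so that __ib_umem_release() can pass the
same attributes to ib_dma_unmap_sg_attrs() when the region is unmapped.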