From: AMEET M. PARANJAPE <aparanja@redhat.com>
Date: Fri, 15 May 2009 14:30:45 -0400
Subject: [infiniband] ib_core: use weak ordering for user memory
Message-id: 20090515182752.7067.47400.sendpatchset@squad5-lp1.lab.bos.redhat.com
O-Subject: [PATCH RHEL5.4 BZ501004] openib: ib_core: use weak ordering for user memory, again
Bugzilla: 501004
RH-Acked-by: David Howells <dhowells@redhat.com>
RH-Acked-by: Doug Ledford <dledford@redhat.com>

RHBZ#:
======
https://bugzilla.redhat.com/show_bug.cgi?id=501004

Description:
===========
Weak ordering for RDMA on OpenIB was included in RHEL5.3 and OFED-1.4, in an
effort from IBM and Mellanox to improve InfiniBand performance on certain IBM
hardware (e.g. QS22).

Unfortunately, the feature was dropped during the inclusion of OFED-1.4 in
RHEL5.4, because core_undo_weak_ordering.patch was accidentally applied there.
That patch should not have been applied: it was only meant for RHEL5.2 and
earlier, which did not yet have the dma_attr code. Reverting
core_undo_weak_ordering.patch again fixes the regression against both RHEL5.3
and OFED-1.4 on later kernels.

RHEL Version Found:
================
RHEL 5.4 beta

kABI Status:
============
No symbols were harmed.

Brew:
=====
Built on all platforms.
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=1801271

Upstream Status:
================
cb9fbc5c37b69ac584e61d449cfd590f5ae1f90d

Test Status:
============
A performance decrease of up to 50% was discovered on RHEL5.4 Beta, especially
when running the IMB benchmark (formerly known as Pallas) over InfiniBand
connections. After applying this patch, the performance degradation is no
longer present.
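Note that the restored behavior is opt-in: weak ordering is only used when the
allow_weak_ordering module parameter is set. Assuming the parameter is exposed
by the ib_core module as in the patch below, it could be enabled at module load
time with an entry like the following (a sketch; the file location follows the
RHEL5-era modprobe.conf convention):

```shell
# /etc/modprobe.conf (RHEL5-era module configuration; assumed location)
# Let ib_core map user memory with weak ordering whenever the caller
# did not request a write barrier (i.e. dmasync == 0).
options ib_core allow_weak_ordering=1
```

Since the parameter is registered with mode 0444, its current value should be
readable at runtime from /sys/module/ib_core/parameters/allow_weak_ordering.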
===============================================================

Ameet Paranjape 978-392-3903 ext 23903
IBM on-site partner

Proposed Patch:
===============

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index bc0245c..a14f7a4 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -40,6 +40,10 @@

 #include "uverbs.h"

+static int allow_weak_ordering;
+module_param(allow_weak_ordering, bool, 0444);
+MODULE_PARM_DESC(allow_weak_ordering, "Allow weak ordering for data registered memory");
+
 #define IB_UMEM_MAX_PAGE_CHUNK					\
	((PAGE_SIZE - offsetof(struct ib_umem_chunk, page_list)) /	\
	 ((void *) &((struct ib_umem_chunk *) 0)->page_list[1] -	\
@@ -101,8 +105,8 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
	int i;

	list_for_each_entry_safe(chunk, tmp, &umem->chunk_list, list) {
-		ib_dma_unmap_sg(dev, chunk->page_list,
-				chunk->nents, DMA_BIDIRECTIONAL);
+		ib_dma_unmap_sg_attrs(dev, chunk->page_list,
+				chunk->nents, DMA_BIDIRECTIONAL, &chunk->attrs);
		for (i = 0; i < chunk->nents; ++i) {
			struct page *page = sg_page(&chunk->page_list[i]);
@@ -141,6 +145,9 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
	if (dmasync)
		dma_set_attr(DMA_ATTR_WRITE_BARRIER, &attrs);
+	else if (allow_weak_ordering)
+		dma_set_attr(DMA_ATTR_WEAK_ORDERING, &attrs);
+
	if (!can_do_mlock())
		return ERR_PTR(-EPERM);
@@ -219,6 +226,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
			goto out;
		}

+		chunk->attrs = attrs;
		chunk->nents = min_t(int, ret, IB_UMEM_MAX_PAGE_CHUNK);
		sg_init_table(chunk->page_list, chunk->nents);
		for (i = 0; i < chunk->nents; ++i) {
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 9ee0d2e..90f3712 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -36,6 +36,7 @@
 #include <linux/list.h>
 #include <linux/scatterlist.h>
 #include <linux/workqueue.h>
+#include <linux/dma-attrs.h>

 struct ib_ucontext;

@@ -56,6 +57,7 @@ struct ib_umem_chunk {
	struct list_head list;
	int nents;
	int nmap;
+	struct dma_attrs attrs;
	struct scatterlist page_list[0];
 };