kernel-2.6.18-194.11.1.el5.src.rpm

From: Brad Peters <bpeters@redhat.com>
Date: Wed, 20 Aug 2008 15:12:56 -0400
Subject: [openib] ehca: local CA ACK delay has an invalid value
Message-id: 20080820191256.17528.64811.sendpatchset@squad5-lp1.lab.bos.redhat.com
O-Subject: [PATCH RHEL5.3 458378] IB/ehca:Local CA ACK Delay is set to a invalid value
Bugzilla: 458378
RH-Acked-by: Rik van Riel <riel@redhat.com>
RH-Acked-by: David Howells <dhowells@redhat.com>
RH-Acked-by: Doug Ledford <dledford@redhat.com>

RHBZ#:
======
https://bugzilla.redhat.com/show_bug.cgi?id=458378

Description:
===========
Bug fix / PPC only (as only PPC uses ehca)

Note: This patch depends on the two patches from RHBZ #443800, which are being
rolled into an OFED update by Doug Ledford.

During cluster testing we saw that some InfiniBand hardware returns an invalid
value of 0 for the Local CA ACK Delay to the device driver in the query_device()
call. This invalid value results in a wrong ACK timeout for RC QPs, because
applications use the Local CA ACK Delay value to calculate the timeout.
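
To illustrate why a reported delay of 0 matters, the sketch below converts the
IBA time encoding (4.096 us * 2^n) to microseconds and picks the smallest RC ACK
timeout exponent that covers a round trip (twice the packet lifetime) plus the
remote CA's ACK delay. The helper names and this exact formula are assumptions
made for illustration; real consumers such as the kernel CM use their own
approximation.

```c
#include <stdint.h>

/*
 * The IBA encodes the Local CA ACK Delay and the ACK timeout as exponents:
 * t = 4.096 us * 2^exp, with exp in 0..31.
 */
static double ib_exp_to_usec(uint8_t exp)
{
    if (exp > 31)
        exp = 31; /* clamp to the 5-bit range */
    return 4.096 * (double)(1ULL << exp);
}

/*
 * Hypothetical helper: return the smallest ACK timeout exponent whose
 * duration covers 2 * packet lifetime plus the remote CA's ACK delay.
 * Illustrative only, not the formula of any particular stack.
 */
static uint8_t rc_ack_timeout_exp(uint8_t ca_ack_delay, uint8_t packet_life_time)
{
    double needed = 2.0 * ib_exp_to_usec(packet_life_time) +
                    ib_exp_to_usec(ca_ack_delay);
    uint8_t exp = 0;

    while (exp < 31 && ib_exp_to_usec(exp) < needed)
        exp++;
    return exp;
}
```

With a packet lifetime exponent of 0, a reported delay of 0 yields a timeout
exponent of 2 (about 16 us), while the default of 12 yields 13 (about 33.6 ms).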

Because of the wrong timeout value, many RC connections are dropped: the adapter
does not wait long enough for packet acknowledgements. The probability of
hitting this issue grows with the size of the InfiniBand cluster and with the
workload running on it.

This patch checks whether the Local CA ACK Delay reported by the hardware is the
invalid value 0 and, if so, substitutes a default value of 12.
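
The check amounts to the following stand-alone sketch. The function and macro
names are hypothetical, and the firmware value is assumed to arrive as a 32-bit
integer. Unlike the patch's min_t(u8, ...), which converts its arguments to u8
before comparing, this version clamps explicitly.

```c
#include <stdint.h>

/* Hypothetical name; the value 12 is the default chosen by the patch. */
#define DEFAULT_LOCAL_CA_ACK_DELAY 12

/*
 * Map the firmware-reported Local CA ACK Delay to the value reported
 * through query_device(): 0 is treated as invalid and replaced by the
 * default; any other value is clamped to fit in a u8.
 */
static uint8_t sanitize_local_ca_ack_delay(uint32_t fw_value)
{
    if (fw_value == 0)
        return DEFAULT_LOCAL_CA_ACK_DELAY;
    return fw_value > 255 ? 255 : (uint8_t)fw_value;
}
```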

RHEL Version Found:
================
RHEL 5.2

kABI Status:
============
Will test once Brew recovers

Brew:
=====
Unable to build since Brew is down

Upstream Status:
================
Posted and applied:
http://lkml.org/lkml/2008/7/21/128

Test Status:
============
Tested by Stefan Roscher <IBM> by setting up a cluster environment,
varying the workload, and checking for dropped connections.

===============================================================

Brad Peters 1-978-392-1000 x 23183
IBM on-site partner.

Proposed Patch:
===============
This patch is based on 2.6.18-104.el5

Some firmware versions report a Local CA ACK Delay of 0. In that case,
return a more sensible default value of 12 (-> 16 msec) instead.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>

diff --git a/drivers/infiniband/hw/ehca/ehca_hca.c b/drivers/infiniband/hw/ehca/ehca_hca.c
index f860eb3..c04cbb1 100644
--- a/drivers/infiniband/hw/ehca/ehca_hca.c
+++ b/drivers/infiniband/hw/ehca/ehca_hca.c
@@ -102,8 +102,9 @@ int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props)
 	}
 
 	props->max_pkeys       = 16;
-	props->local_ca_ack_delay
-		= rblock->local_ca_ack_delay;
+	/* Some FW versions say 0 here; insert sensible value in that case */
+	props->local_ca_ack_delay  = rblock->local_ca_ack_delay ?
+		min_t(u8, rblock->local_ca_ack_delay, 255) : 12;
 	props->max_raw_ipv6_qp
 		= min_t(unsigned, rblock->max_raw_ipv6_qp, INT_MAX);
 	props->max_raw_ethy_qp