kernel-2.6.18-194.11.1.el5.src.rpm

From: Doug Ledford <dledford@redhat.com>
Subject: [Patch RHEL5.1] Fix ipath driver when 2 ipath controllers are on the same subnet
Date: Thu, 16 Aug 2007 16:37:58 -0400
Bugzilla: 253005
Message-Id: <1187296678.14384.215.camel@firewall.xsintricity.com>
Changelog: [openib] Fix two ipath controllers on same subnet


There is a bug in the ipath driver when you have two cards in a system
that are connected to the same subnet.  Each card has a unique guid, and
the driver is supposed to send that guid to the subnet manager so that
the subnet manager can build a map of what guids exist and assign each
guid/port combination a unique link id.  Due to a thinko in the ipath
driver, all cards in the system report the same guid to the subnet
manager.  This causes the subnet manager to think that it is merely
receiving duplicate information about the same guid/port combination,
even though in reality it is receiving information about two distinctly
different cards.  The net result is that the subnet manager assigns the
same link id to both cards in the system.  When the cards then attempt
to attach themselves to the ib fabric with the same link id, the last
card to attempt to attach wins out and the other card gets disabled by
the switch.  However, the cards will periodically attempt to reattach
using that link id, so one card will be active for a while, then the
other card will attempt to attach with the same link id and win, and the
card that *was* active goes inactive.  This repeats ad infinitum.
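
To make the failure mode concrete, here is a minimal sketch of how a
subnet manager that keys its table on the reported port guid ends up
handing out one link id for two cards.  The names sm_table, sm_init and
sm_assign_lid are hypothetical and are not taken from opensm; a real
subnet manager is far more involved.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PORTS 16

/* Hypothetical, heavily simplified subnet-manager table:
 * one link id per reported port guid. */
struct sm_table {
	uint64_t guid[MAX_PORTS];
	uint16_t lid[MAX_PORTS];
	size_t count;
	uint16_t next_lid;
};

static void sm_init(struct sm_table *t)
{
	t->count = 0;
	t->next_lid = 1;	/* 0 is used here to mean "no link id" */
}

/* Return the link id for a reported port guid.  A new link id is
 * allocated only if the guid has not been seen before; a repeated guid
 * is treated as a duplicate report.  Returns 0 if the table is full. */
static uint16_t sm_assign_lid(struct sm_table *t, uint64_t port_guid)
{
	for (size_t i = 0; i < t->count; i++)
		if (t->guid[i] == port_guid)
			return t->lid[i];

	if (t->count == MAX_PORTS)
		return 0;

	t->guid[t->count] = port_guid;
	t->lid[t->count] = t->next_lid++;
	return t->lid[t->count++];
}
```

In this sketch, two calls with the same guid return the same link id,
which is the situation both cards end up in when they report an
identical port guid.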

The attached patch solves this issue by correctly using the guid from
the card in question when returning the guid/port information to the
subnet manager.
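
As a rough illustration of the change, the following sketch models only
the three guid fields of the NodeInfo reply.  The structs card and
node_info and both fill_nodeinfo_* functions are hypothetical stand-ins
for the driver's real structures, and the sketch assumes that every card
in one machine shares the same system image guid.

```c
#include <stdint.h>

/* Hypothetical stand-ins; only the guid fields are modeled. */
struct card {
	uint64_t guid;		/* per-card guid (dd->ipath_guid in the driver) */
};

struct node_info {
	uint64_t sys_guid;
	uint64_t node_guid;
	uint64_t port_guid;
};

/* Old behaviour: the port guid is copied from the system image guid,
 * so every card in the machine reports the same port guid. */
static void fill_nodeinfo_buggy(struct node_info *nip,
				const struct card *dd,
				uint64_t sys_image_guid)
{
	nip->sys_guid = sys_image_guid;
	nip->node_guid = dd->guid;
	nip->port_guid = nip->sys_guid;
}

/* Patched behaviour: the port guid comes from the card being queried. */
static void fill_nodeinfo_fixed(struct node_info *nip,
				const struct card *dd,
				uint64_t sys_image_guid)
{
	nip->sys_guid = sys_image_guid;
	nip->node_guid = dd->guid;
	nip->port_guid = dd->guid;
}
```

Filling the reply for two cards with different guids but the same
system image guid yields equal port guids with the first function and
distinct port guids with the second.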

I verified that prior to this patch, a fresh start of the opensm subnet
manager with the previous guid cache erased did not in fact see both
guids from the machine with two cards installed, and instead only saw
one guid.  I then booted the problem machine with a kernel that had this
patch applied, and both ports on the card were properly assigned
different link ids, both ports were able to attach and stay attached to
the fabric, and a reinspection of the guid cache on the subnet manager
machine now properly shows both card guids in the guid list.

This is for bugzilla 253005, and I've requested exception status for this
one-line change.  It's also already upstream.

-- 
Doug Ledford <dledford@redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband

commit f41d229865c984015914221959675b1c8723f6a7
Author: Sean Hefty <sean.hefty@intel.com>

    IB/ipath: return correct PortGUID in NodeInfo
    
    Return the PortGUID of the correct port when responding to a NodeInfo
    query.  Returning the SystemImageGUID causes issues when there are
    multiple HCAs in a single system.
    
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    Signed-off-by: Roland Dreier <rolandd@cisco.com>

diff --git a/drivers/infiniband/hw/ipath/ipath_mad.c b/drivers/infiniband/hw/ipath/ipath_mad.c
index 2aaa029..d61c030 100644
--- a/drivers/infiniband/hw/ipath/ipath_mad.c
+++ b/drivers/infiniband/hw/ipath/ipath_mad.c
@@ -103,7 +103,7 @@ static int recv_subn_get_nodeinfo(struct ib_smp *smp,
 	/* This is already in network order */
 	nip->sys_guid = to_idev(ibdev)->sys_image_guid;
 	nip->node_guid = dd->ipath_guid;
-	nip->port_guid = nip->sys_guid;
+	nip->port_guid = dd->ipath_guid;
 	nip->partition_cap = cpu_to_be16(ipath_get_npkeys(dd));
 	nip->device_id = cpu_to_be16(dd->ipath_deviceid);
 	majrev = dd->ipath_majrev;