Sophie

kernel-2.6.18-194.11.1.el5.src.rpm

From: Marcus Barrow <mbarrow@redhat.com>
Date: Wed, 8 Jul 2009 22:43:18 -0400
Subject: [scsi] qla2xxx: prevent hangs in extended error handling
Message-id: 20090709024318.14185.48424.sendpatchset@file.bos.redhat.com
O-Subject: [rhel 5.4 patch] [V3] qla2xxx - prevent hangs in extended error handling (EEH ).
Bugzilla: 470510
RH-Acked-by: Prarit Bhargava <prarit@redhat.com>
RH-Acked-by: Pete Zaitcev <zaitcev@redhat.com>

BZ 470510

Version 3 of this patch contains two further formatting fixes. A blank
line has been removed and the indenting of a return statement was fixed.
These did in fact look pretty goofy...

Version 2 of this patch adds a missing space and tab character,
as requested by Prarit. It has not been re-tested, but should be OK.

This patch contains some more fixups for the EEH code in qla2xxx.
EEH provides extended error handling for PCI devices. It is used
in IBM PPC blade servers. Without these fixes there can be
infinite loops.

IBM and QLogic have already submitted, or are in the process of
submitting, this code upstream.

This code has been developed and tested by QLogic and IBM. It applies
and builds cleanly with -154.

Check offline status in qla24xx_reset_chip as well.
In mailbox commands, do not process the command if the device is
already marked offline. The wait for previous mbx completion should
be done unconditionally.
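
The intended ordering in the mailbox path can be sketched in plain
userspace C as shown here. This is only an illustration, not driver code:
every sketch_* name is hypothetical, the two bool fields stand in for
ha->flags.pci_channel_io_perm_failure and for the result of
wait_for_completion_timeout() on ha->mbx_cmd_comp, and the status values
are placeholders rather than the real QLA_* codes.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder status codes; the driver uses QLA_SUCCESS and
 * QLA_FUNCTION_TIMEOUT, whose values are not reproduced here. */
enum { SKETCH_SUCCESS = 0, SKETCH_TIMEOUT = 1 };

/* Hypothetical stand-in for the per-host driver state. */
struct sketch_host {
	bool perm_failure;   /* models flags.pci_channel_io_perm_failure */
	bool prev_cmd_done;  /* models the previous mailbox cmd finishing */
	int cmds_issued;     /* how many commands reached the "hardware" */
};

/* Models wait_for_completion_timeout(): nonzero if completed in time. */
static int sketch_wait_prev(const struct sketch_host *h)
{
	return h->prev_cmd_done ? 1 : 0;
}

static int sketch_mailbox_command(struct sketch_host *h)
{
	/* 1. Device permanently offline: bail out before doing any work. */
	if (h->perm_failure)
		return SKETCH_TIMEOUT;

	/* 2. Wait for the previous command unconditionally. */
	if (!sketch_wait_prev(h))
		return SKETCH_TIMEOUT;

	/* 3. Only now is the command actually issued. */
	h->cmds_issued++;
	return SKETCH_SUCCESS;
}

/* Small usage example covering the three paths. */
static void sketch_demo(void)
{
	struct sketch_host ok = { false, true, 0 };
	struct sketch_host dead = { true, true, 0 };
	struct sketch_host busy = { false, false, 0 };

	assert(sketch_mailbox_command(&ok) == SKETCH_SUCCESS);
	assert(sketch_mailbox_command(&dead) == SKETCH_TIMEOUT);
	assert(sketch_mailbox_command(&busy) == SKETCH_TIMEOUT);
	assert(ok.cmds_issued == 1);
	assert(dead.cmds_issued == 0 && busy.cmds_issued == 0);
}
```

Before this patch, a permanent failure only skipped the wait, and the
function then went on to issue the command to a device that could no
longer respond. Checking the flag first and returning early avoids that
path.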

diff --git a/drivers/scsi/qla2xxx/qla_attr.c b/drivers/scsi/qla2xxx/qla_attr.c
index 3e3b204..ea2a73d 100644
--- a/drivers/scsi/qla2xxx/qla_attr.c
+++ b/drivers/scsi/qla2xxx/qla_attr.c
@@ -1849,11 +1849,13 @@ qla2x00_fabric_param_show(struct class_device *cdev, char *buf)
 static ssize_t
 qla2x00_fw_state_show(struct class_device *cdev, char *buf)
 {
-	int rval;
+	int rval = QLA_FUNCTION_FAILED;
 	uint16_t state[5];
 	scsi_qla_host_t *ha = to_qla_host(class_to_shost(cdev));
 	scsi_qla_host_t *pha = to_qla_parent(ha);
-	rval = qla2x00_get_firmware_state(pha, state);
+	if (!ha->flags.eeh_busy)
+		rval = qla2x00_get_firmware_state(pha, state);
+
 	if (rval != QLA_SUCCESS)
 	    memset(state, -1, sizeof(state));
 	return snprintf(buf, PAGE_SIZE, "0x%x 0x%x 0x%x 0x%x 0x%x\n", state[0],
diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
index d6bfe14..2feb01d 100644
--- a/drivers/scsi/qla2xxx/qla_init.c
+++ b/drivers/scsi/qla2xxx/qla_init.c
@@ -634,6 +634,11 @@ qla24xx_reset_risc(scsi_qla_host_t *ha)
 void
 qla24xx_reset_chip(scsi_qla_host_t *ha)
 {
+	if (pci_channel_offline(ha->pdev) &&
+	    ha->flags.pci_channel_io_perm_failure) {
+		return;
+	}
+
 	ha->isp_ops->disable_intrs(ha);
 
 	/* Perform RISC reset. */
diff --git a/drivers/scsi/qla2xxx/qla_mbx.c b/drivers/scsi/qla2xxx/qla_mbx.c
index 3085c40..7d155b4 100644
--- a/drivers/scsi/qla2xxx/qla_mbx.c
+++ b/drivers/scsi/qla2xxx/qla_mbx.c
@@ -52,20 +52,22 @@ qla2x00_mailbox_command(scsi_qla_host_t *pvha, mbx_cmd_t *mcp)
 
 	DEBUG11(printk("%s(%ld): entered.\n", __func__, pvha->host_no));
 
+	if (ha->flags.pci_channel_io_perm_failure) {
+		DEBUG(printk("%s(%ld): Perm failure on EEH, timeout MBX "
+			     "Exiting.\n", __func__, ha->host_no));
+		return QLA_FUNCTION_TIMEOUT;
+	}
 	/*
 	 * Wait for active mailbox commands to finish by waiting at most tov
 	 * seconds. This is to serialize actual issuing of mailbox cmds during
 	 * non ISP abort time.
 	 */
-	if (!ha->flags.pci_channel_io_perm_failure) {
-
-		if (!wait_for_completion_timeout(&ha->mbx_cmd_comp,
-		    mcp->tov * HZ)) {
-			/* Timeout occurred. Return error. */
-			DEBUG2_3_11(printk("%s(%ld): cmd access timeout. "
+	if (!wait_for_completion_timeout(&ha->mbx_cmd_comp,
+	    mcp->tov * HZ)) {
+		/* Timeout occurred. Return error. */
+		DEBUG2_3_11(printk("%s(%ld): cmd access timeout. "
 			    "Exiting.\n", __func__, ha->host_no));
-			return QLA_FUNCTION_TIMEOUT;
-		}
+		return QLA_FUNCTION_TIMEOUT;
 	}
 
 	ha->flags.mbox_busy = 1;