kernel-2.6.18-194.11.1.el5.src.rpm

From: Marcus Barrow <mbarrow@redhat.com>
Date: Wed, 8 Apr 2009 00:09:39 -0400
Subject: [scsi] qla2xxx: reduce DID_BUS_BUSY failover errors
Message-id: 20090408040939.12343.33337.sendpatchset@file.bos.redhat.com
O-Subject: [rhel 5.4 patch] qla2xxx : Reduce DID_BUS_BUSY errors causing failover
Bugzilla: 244967
RH-Acked-by: Mike Christie <mchristi@redhat.com>

BZ 244967 Frequent path failures during I/O on DM multipath devices

This patch reduces the number of conditions under which the driver
returns DID_BUS_BUSY. That error is serious and causes path failovers.
A dropped frame is now reported as DID_ERROR, which causes the command
to be retried.

This patch applies and builds cleanly with 2.6.18-137. It was tested
at QLogic.

qla2xxx - reduce use of DID_BUS_BUSY

Instead of DID_BUS_BUSY, return DID_TRANSPORT_DISRUPTED or DID_ERROR.

Use DID_ERROR for a dropped frame on CS_UNDERRUN instead of
DID_BUS_BUSY.

With DID_TRANSPORT_DISRUPTED, IO will not fail until fast IO fail
fires or, if fast IO fail is not set, until the dev loss tmo fires.
This may change behavior: users would have to set fast IO fail to have
IO fail quickly rather than wait for the dev loss tmo to fire.
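
For readers unfamiliar with the "<< 16" in the hunks below, here is a
minimal standalone sketch of how a host byte code sits in bits 16-23 of
the SCSI result word. The DID_* values are copied from the upstream
include/scsi/scsi.h and are assumed to match the RHEL 5 backport. The
helpers make_result(), result_host_byte() and host_byte_name() are
hypothetical names used only for illustration; they are not part of the
driver or of this patch.

```c
#include <assert.h>
#include <string.h>

/* Host byte codes, values as in upstream include/scsi/scsi.h
 * (assumed identical in the RHEL 5 backport). */
#define DID_BUS_BUSY            0x02
#define DID_ERROR               0x07
#define DID_TRANSPORT_DISRUPTED 0x0e

/* Hypothetical helper: place a host byte code in bits 16-23,
 * the same shift the driver applies when setting cp->result. */
static int make_result(int host_code)
{
	assert(host_code >= 0 && host_code <= 0xff);
	return host_code << 16;
}

/* Hypothetical helper: extract bits 16-23 again, the same
 * arithmetic as the kernel's host_byte() macro. */
static int result_host_byte(int result)
{
	return (result >> 16) & 0xff;
}

/* Hypothetical helper: name the three codes this patch touches. */
static const char *host_byte_name(int result)
{
	switch (result_host_byte(result)) {
	case DID_BUS_BUSY:
		return "DID_BUS_BUSY";
	case DID_ERROR:
		return "DID_ERROR";
	case DID_TRANSPORT_DISRUPTED:
		return "DID_TRANSPORT_DISRUPTED";
	default:
		return "other";
	}
}
```

The status and message bytes in the low bits are left at zero here;
only the host byte matters for the retry versus failover behavior
described above.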

diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
index 7901e24..554d77f 100644
--- a/drivers/scsi/qla2xxx/qla_isr.c
+++ b/drivers/scsi/qla2xxx/qla_isr.c
@@ -1185,7 +1185,7 @@ qla2x00_status_entry(scsi_qla_host_t *ha, void *pkt)
 			/*
 			 * If RISC reports underrun and target does not report
 			 * it then we must have a lost frame, so tell upper
-			 * layer to retry it by reporting a bus busy.
+			 * layer to retry it by reporting a did error.
 			 */
 			if (!(scsi_status & SS_RESIDUAL_UNDER)) {
 				DEBUG2(printk("scsi(%ld:%d:%d:%d) Dropped "
@@ -1195,7 +1195,7 @@ qla2x00_status_entry(scsi_qla_host_t *ha, void *pkt)
 					      cp->device->lun, resid,
 					      scsi_bufflen(cp)));
 
-				cp->result = DID_BUS_BUSY << 16;
+				cp->result = DID_ERROR << 16;
 				break;
 			}
 
@@ -1252,7 +1252,7 @@ qla2x00_status_entry(scsi_qla_host_t *ha, void *pkt)
 		    cp->serial_number, comp_status,
 		    atomic_read(&fcport->state)));
 
-		cp->result = DID_BUS_BUSY << 16;
+		cp->result = DID_TRANSPORT_DISRUPTED << 16;
 		if (atomic_read(&fcport->state) == FCS_ONLINE) {
 			qla2x00_mark_device_lost(ha, fcport, 1, 1);
 		}
@@ -1280,7 +1280,7 @@ qla2x00_status_entry(scsi_qla_host_t *ha, void *pkt)
 		break;
 
 	case CS_TIMEOUT:
-		cp->result = DID_BUS_BUSY << 16;
+		cp->result = DID_TRANSPORT_DISRUPTED << 16;
 
 		if (IS_FWI2_CAPABLE(ha)) {
 			DEBUG2(printk(KERN_INFO