From: Rob Evers <revers@redhat.com>
Date: Fri, 16 Jan 2009 14:36:00 -0500
Subject: [scsi] no-sense msgs, data corruption, but no i/o errors
Message-id: 20090116193503.12714.37477.sendpatchset@localhost.localdomain
O-Subject: [RHEL5.4 PATCH] scsi no-sense messages, data corruption, but no i/o errors
Bugzilla: 468088
RH-Acked-by: Mike Christie <mchristi@redhat.com>
RH-Acked-by: Doug Ledford <dledford@redhat.com>
RH-Acked-by: Tomas Henzl <thenzl@redhat.com>

bugzilla number:
https://bugzilla.redhat.com/show_bug.cgi?id=468088

description:
Quoted from the commit in Linus's tree:

  The current handling of NO_SENSE check condition is the same as
  RECOVERED_ERROR, and assumes that in both cases, the I/O was fully
  transferred.

  We have seen cases of arrays returning with NO_SENSE (no error), but
  the I/O was not completely transferred, thus residual set. Thus,
  rather than return good_bytes as the entire transfer, set good_bytes
  to 0, so that the midlayer then applies the residual in calculating
  the transfer, and for sd, will fail the I/O and fall into a retry
  path.

Testing:
The scsi_debug.c driver was modified to recreate the scenario.
Periodically a fault was injected into the driver's handling of scsi
requests. The fault scenario was that a check condition status is
returned with sense = 'no sense'. Additionally, during the fault
injection, read data was corrupted.

Before the patch was applied, dt was used to detect that data
corruption was occurring while the faults were being injected. After
the patch was applied, the same test scenario resulted in no data
corruption. In both cases, console messages show that a 'no sense'
sense code is present during fault injection.

Upstream Status:
This patch was accepted into Linus' kernel (2.6.28) in October 2008.

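Illustration (not part of the patch): the change in reported good bytes can be sketched as a small stand-alone model. This is only a sketch under stated assumptions. The function good_bytes_for(), the "patched" flag and the KEY_* names are hypothetical and do not exist in sd.c; only the sense key values (0x0 for NO SENSE, 0x1 for RECOVERED ERROR) and the before/after behaviour described above are taken as given.

```c
#include <stddef.h>

/* Sense key values as defined by SCSI; the enum names are illustrative. */
enum sense_key {
	KEY_NO_SENSE        = 0x0,
	KEY_RECOVERED_ERROR = 0x1
};

/*
 * Hypothetical model of the good_bytes decision for the two sense keys
 * touched by this patch. "patched" selects the behaviour after the
 * change. This is not the code in sd_rw_intr(); it only mirrors the
 * outcome described in the commit message.
 */
static size_t good_bytes_for(enum sense_key key, size_t xfer_size, int patched)
{
	switch (key) {
	case KEY_RECOVERED_ERROR:
		/* Unchanged by the patch: whole transfer counted as good. */
		return xfer_size;
	case KEY_NO_SENSE:
		/*
		 * Before: treated like RECOVERED_ERROR, so a short transfer
		 * was reported as complete. After: nothing is claimed as
		 * good, so the request is not completed as a success.
		 */
		return patched ? 0 : xfer_size;
	default:
		/* Other sense keys are outside the scope of this sketch. */
		return 0;
	}
}
```

In this model, good_bytes_for(KEY_NO_SENSE, 4096, 0) returns 4096, which corresponds to the silent-corruption case, while good_bytes_for(KEY_NO_SENSE, 4096, 1) returns 0, which corresponds to the I/O being failed and retried.
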
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 32e5a09..4fd6e68 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1015,7 +1015,6 @@ static void sd_rw_intr(struct scsi_cmnd * SCpnt)
 			good_bytes = (bad_lba - start_lba)*SCpnt->device->sector_size;
 			break;
 		case RECOVERED_ERROR:
-		case NO_SENSE:
 			/* Inform the user, but make sure that it's not treated
 			 * as a hard error.
 			 */
@@ -1024,6 +1023,15 @@ static void sd_rw_intr(struct scsi_cmnd * SCpnt)
 			memset(SCpnt->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
 			good_bytes = xfer_size;
 			break;
+		case NO_SENSE:
+			/* This indicates a false check condition, so ignore it. An
+			 * unknown amount of data was transferred so treat it as an
+			 * error.
+			 */
+			scsi_print_sense("sd", SCpnt);
+			SCpnt->result = 0;
+			memset(SCpnt->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
+			break;
 		case ILLEGAL_REQUEST:
 			if (SCpnt->device->use_10_for_rw &&
 			    (SCpnt->cmnd[0] == READ_10 ||