From: Rob Evers <revers@redhat.com> Date: Mon, 13 Dec 2010 21:18:00 -0500 Subject: [block] fully zeroize request struct in rq_init Message-id: <20101213211800.12147.8250.sendpatchset@revers-dell.usersys.redhat.com> Patchwork-id: 30185 O-Subject: [RHEL5.6 PATCH V2] blk: fix missing blk_alloc_request memset Bugzilla: 662154 https://bugzilla.redhat.com/show_bug.cgi?id=662154 Enhance block request initialization. Addresses known issues in rhel5 where various members of block requests are uninitialized. Related discussion can be seen at: https://bugzilla.redhat.com/show_bug.cgi?id=516303#c106 Highlights: ---8<---- When the block layer allocates a new I/O request from it's free pool, it calls rq_init() to initialize the request. Unfortunately rq_init() does not initialize rq->retries, so the new request begins with an unreliable value for this field. When the request is passed to the scsi layer, scsi_cmnd->allowed is assigned from req->retries in scsi_setup_blk_pc_cmnd(). Later, in scsi_attempt_requeue_command() the 'wait_for' stack variable uses cmd->allowed to figure out whether the command has timed out : 117 int scsi_attempt_requeue_command(struct scsi_cmnd *cmd, int reason) 118 { 119 struct Scsi_Host *host = cmd->device->host; 120 struct scsi_device *device = cmd->device; 121 struct request_queue *q = device->request_queue; 122 unsigned long wait_for = (cmd->allowed + 1) * cmd->timeout_per_command; 123 unsigned long flags; 124 125 SCSI_LOG_MLQUEUE(1, 126 printk("Inserting command %p into mlqueue\n", cmd)); 127 128 if (time_before(cmd->jiffies_at_alloc + wait_for, jiffies)) { 129 sdev_printk(KERN_ERR, cmd->device, "timing out command, " 130 "waited %lus\n", wait_for/HZ); 131 cmd->result |= DRIVER_TIMEOUT << 24; 132 scsi_finish_command(cmd); 133 return 0; 134 } ... Since cmd->allowed is not reliably initialized, this sometimes trips up and we end up completing the command with a DRIVER_TIMEOUT. The scsi err handler fires up and issues aborts, which causes the low level driver to attempt recovery - the lpfc driver returns DID_REQUEUE, which is a reasonable response. Other drivers handle it slightly differently (e.g. the rash of unexplained qla2xxx abort messages we've seen, etc). From here it gets progressively worse as more and more of the free requests in the block layer free pool get abnormal values for req->retries. Apps doing raw or sg I/O seems to be the trigger for this to start, e.g. oracle. ---8<---- Sanity compiled and tested by booting kernel using rhel5.6 kernel on x86_64. Signed-off-by: Jarod Wilson <jarod@redhat.com> diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c index 1032232..d2505d6 100644 --- a/block/ll_rw_blk.c +++ b/block/ll_rw_blk.c @@ -283,25 +283,15 @@ EXPORT_SYMBOL(blk_queue_make_request); static inline void rq_init(request_queue_t *q, struct request *rq) { + memset(rq, 0, sizeof(*rq)); + INIT_LIST_HEAD(&rq->queuelist); INIT_LIST_HEAD(&rq->donelist); - - rq->errors = 0; + rq->sector = rq->hard_sector = (sector_t) -1; rq->rq_status = RQ_ACTIVE; - rq->bio = rq->biotail = NULL; - rq->ioprio = 0; - rq->buffer = NULL; + rq->tag = -1; rq->ref_count = 1; rq->q = q; - rq->waiting = NULL; - rq->special = NULL; - rq->data_len = 0; - rq->data = NULL; - rq->nr_phys_segments = 0; - rq->sense = NULL; - rq->end_io = NULL; - rq->end_io_data = NULL; - rq->completion_data = NULL; } /** @@ -452,9 +442,7 @@ static void queue_flush(request_queue_t *q, unsigned which) rq_init(q, rq); rq->flags = REQ_HARDBARRIER; - rq->elevator_private = NULL; rq->rq_disk = q->bar_rq.rq_disk; - rq->rl = NULL; rq->end_io = end_io; q->prepare_flush_fn(q, rq); @@ -478,8 +466,6 @@ static inline struct request *start_ordered(request_queue_t *q, rq_init(q, rq); rq->flags = bio_data_dir(q->orig_bar_rq->bio); rq->flags |= q->ordered & QUEUE_ORDERED_FUA ? REQ_FUA : 0; - rq->elevator_private = NULL; - rq->rl = NULL; init_request_from_bio(rq, q->orig_bar_rq->bio); rq->end_io = bar_end_io; @@ -2081,6 +2067,7 @@ blk_alloc_request(request_queue_t *q, int flags, struct bio *bio, if (!rq) return NULL; + rq_init(q, rq); /* * first three bits are identical in rq->flags and bio->bi_rw, * see bio.h and blkdev.h @@ -2264,8 +2251,7 @@ rq_starved: */ if (ioc_batching(q, ioc)) ioc->nr_batch_requests--; - - rq_init(q, rq); + rq->rl = rl; trace_block_getrq(q, bio, rw);