kernel-2.6.18-238.el5.src.rpm

From: Milan Broz <mbroz@redhat.com>
Date: Mon, 27 Jul 2009 15:08:13 +0200
Subject: [fs] __bio_clone: don't calculate hw/phys segment counts
Message-id: 4A6DA6BD.1010505@redhat.com
O-Subject: [RHEL 5.4 PATCH REPOST] block: __bio_clone: don't calculate hw/phys segment counts
Bugzilla: 512387
RH-Acked-by: Dean Nelson <dnelson@redhat.com>
RH-Acked-by: Doug Ledford <dledford@redhat.com>
RH-Acked-by: Jerome Marchand <jmarchan@redhat.com>

[repost with upstream comment added]

RHEL5.4 block: __bio_clone: don't calculate hw/phys segment counts
(max_phys_segments violation with dm-linear + md raid1 + cciss)

Resolves: rhbz#512387

Patch is upstream, commit 5d84070ee0a433620c57e85dac7f82faaec5fbb3

For a stack of devices like cciss->md_raid1->dm_linear there is a
possibility of miscalculating the segment count in a bio.

If this happens, the cciss driver fails on
  BUG_ON(creq->nr_phys_segments > MAXSGENTRIES);

The calculated number of segments also depends on queue restrictions;
according to blk_recount_segments(), it depends on these queue parameters:

    q->bounce_pfn
    q->max_segment_size
    q->seg_boundary_mask

The last two are now correctly propagated through the stack (thanks to the
previous patch from bug 471639), but bounce_pfn can differ:

 - all dm devices set BLK_BOUNCE_ANY here
 - md does not set this value explicitly, so the block layer sets BLK_BOUNCE_HIGH
 - cciss sets it explicitly from its dma_mask

So in reality, every device in the stack can have a different bounce_pfn!
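
To illustrate the effect, here is a minimal user-space sketch (a simplified
model, not the real blk_recount_segments() code; all struct names and values
are made up for illustration): a page above a queue's bounce_pfn may have to
be bounced, so it starts a new segment, and a queue with a lower bounce_pfn
therefore reports a higher segment count for the very same bio.

#include <stdio.h>

/* illustrative bounce limits, not the real kernel constants */
#define MODEL_BOUNCE_HIGH 0x000FFFFFUL
#define MODEL_BOUNCE_ANY  0xFFFFFFFFUL

struct model_vec { unsigned long pfn; unsigned int len; };

/* a page above bounce_pfn may be bounced, so it cannot be merged */
static int count_segments(const struct model_vec *v, int n,
			  unsigned long bounce_pfn)
{
	int segs = 0, i;

	for (i = 0; i < n; i++) {
		int high = v[i].pfn > bounce_pfn;
		int contig = i > 0 && v[i].pfn == v[i - 1].pfn + 1;

		if (i == 0 || high || !contig)
			segs++;		/* start a new segment */
	}
	return segs;
}

int main(void)
{
	/* three physically contiguous highmem pages in one bio */
	struct model_vec vec[3] = {
		{ 0x100000, 4096 }, { 0x100001, 4096 }, { 0x100002, 4096 }
	};

	printf("dm queue (BOUNCE_ANY):  %d segment(s)\n",
	       count_segments(vec, 3, MODEL_BOUNCE_ANY));	/* -> 1 */
	printf("md queue (BOUNCE_HIGH): %d segment(s)\n",
	       count_segments(vec, 3, MODEL_BOUNCE_HIGH));	/* -> 3 */
	return 0;
}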

The real problem arises when a virtual block device (MD or DM) clones a bio
request using the __bio_clone() function - this recalculates the segment
counts for the _new_ clone using the _old_ block device's queue and sets
BIO_SEG_VALID.

DM correctly resets this flag here, so after remapping (i.e. a queue change)
the segments are recalculated again.
MD does not reset the flag.

Now the segment count can be recalculated using a more restrictive bounce_pfn
(so all pages in high memory are counted as new segments because of possible
bouncing), and because BIO_SEG_VALID is set, this segment count is _not_
recalculated for the underlying device's queue later.
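
The sequence that then hits the cciss limit can be sketched as follows
(again a simplified user-space model; the field names, helper and segment
numbers are invented for illustration, only the caching behaviour mirrors
BIO_SEG_VALID):

#include <stdio.h>

#define MODEL_SG_LIMIT 31	/* stands in for the cciss MAXSGENTRIES limit */

struct model_bio {
	int nr_segs;	/* cached segment count          */
	int seg_valid;	/* models the BIO_SEG_VALID flag */
};

/* models bio_phys_segments(): trust the cached count when the flag is set */
static int segments_for_queue(struct model_bio *bio, int count_on_this_queue)
{
	if (!bio->seg_valid) {
		bio->nr_segs = count_on_this_queue;
		bio->seg_valid = 1;
	}
	return bio->nr_segs;
}

int main(void)
{
	struct model_bio clone = { 0, 0 };
	int n;

	/* old __bio_clone(): counts against the md queue (BLK_BOUNCE_HIGH),
	 * so every highmem page is its own segment -> inflated count */
	segments_for_queue(&clone, 40);

	/* md remaps ->bi_bdev to cciss but never clears the flag, so the
	 * cciss queue (which could merge the pages down to 16) reuses 40 */
	n = segments_for_queue(&clone, 16);

	printf("cciss receives %d segments, limit is %d%s\n", n, MODEL_SG_LIMIT,
	       n > MODEL_SG_LIMIT ? " -> BUG_ON fires" : "");
	return 0;
}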

Original description from upstream commit:

 If the user sets a new ->bi_bdev on the bio after __bio_clone() has
 returned it, the "segment counts valid" flag still remains even though
 it may be different with the new target. So don't calculate segment
 counts in __bio_clone().

Patch compiled and tested (testcase in bugzilla).

diff --git a/fs/bio.c b/fs/bio.c
index 3c0127c..d5cd802 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -257,11 +257,13 @@ inline int bio_hw_segments(request_queue_t *q, struct bio *bio)
  */
 void __bio_clone(struct bio *bio, struct bio *bio_src)
 {
-	request_queue_t *q = bdev_get_queue(bio_src->bi_bdev);
-
 	memcpy(bio->bi_io_vec, bio_src->bi_io_vec,
 		bio_src->bi_max_vecs * sizeof(struct bio_vec));
 
+	/*
+	 * most users will be overriding ->bi_bdev with a new target,
+	 * so we don't set nor calculate new physical/hw segment counts here
+	 */
 	bio->bi_sector = bio_src->bi_sector;
 	bio->bi_bdev = bio_src->bi_bdev;
 	bio->bi_flags |= 1 << BIO_CLONED;
@@ -269,8 +271,6 @@ void __bio_clone(struct bio *bio, struct bio *bio_src)
 	bio->bi_vcnt = bio_src->bi_vcnt;
 	bio->bi_size = bio_src->bi_size;
 	bio->bi_idx = bio_src->bi_idx;
-	bio_phys_segments(q, bio);
-	bio_hw_segments(q, bio);
 }
 
 /**