From: Milan Broz <mbroz@redhat.com> Date: Mon, 27 Jul 2009 15:08:13 +0200 Subject: [fs] __bio_clone: don't calculate hw/phys segment counts Message-id: 4A6DA6BD.1010505@redhat.com O-Subject: [RHEL 5.4 PATCH REPOST] block: __bio_clone: don't calculate hw/phys segment counts Bugzilla: 512387 RH-Acked-by: Dean Nelson <dnelson@redhat.com> RH-Acked-by: Doug Ledford <dledford@redhat.com> RH-Acked-by: Jerome Marchand <jmarchan@redhat.com> [repost with upstream comment added] RHEL5.4 block: __bio_clone: don't calculate hw/phys segment counts (max_phys_segments violation with dm-linear + md raid1 + cciss) Resolves: rhbz#512387 Patch is upstream, commit 5d84070ee0a433620c57e85dac7f82faaec5fbb3 For stack of devices like cciss->md_raid1->dm_linear there is possibility of miscalculation of segment count in bio. If this happens, cciss driver fails on BUG_ON(creq->nr_phys_segments > MAXSGENTRIES); The calculated number of segments depends (also) on queue restrictions, according to blk_recount_segments() it depends on these queue parameters: q->bounce_pfn q->max_segment_size q->seg_boundary_mask The last two are correctly propagated through stack now (thanks to previous patch from bug 471639) but bounce_pfn can be different: - all dm devices sets BLK_BOUNCE_ANY here - md do not set explicitly this value, so block layer sets BLK_BOUNCE_HIGH - cciss uses explicit setting using dma_mask So in reality, every stack device can have different bounce_pfn! The real problem is when virtual block device (MD or DM) clones bio request using __bio_clone() function - this recalculates segments count for _new_ clone using _old_ block device queue and set BIO_SEG_VALID. DM here correctly resets this flag, so after remapping (i.e. queue change) segments are recalculated again. MD doesn't resets flag. Now, segment count can by recalculated using more restrictive bounce_pfn (so all pages in high memory are counted as new segments because of possible bouncing) and because BIO_SEG_VALID is set, this segment count is _not_ recalculated for underlying device queue later. Original description from upstream commit: If the users sets a new ->bi_bdev on the bio after __bio_clone() has returned it, the "segment counts valid" flag still remains even though it may be different with the new target. So don't calculate segment counts in __bio_clone(). Patch compiled and tested (testcase in bugzilla). diff --git a/fs/bio.c b/fs/bio.c index 3c0127c..d5cd802 100644 --- a/fs/bio.c +++ b/fs/bio.c @@ -257,11 +257,13 @@ inline int bio_hw_segments(request_queue_t *q, struct bio *bio) */ void __bio_clone(struct bio *bio, struct bio *bio_src) { - request_queue_t *q = bdev_get_queue(bio_src->bi_bdev); - memcpy(bio->bi_io_vec, bio_src->bi_io_vec, bio_src->bi_max_vecs * sizeof(struct bio_vec)); + /* + * most users will be overriding ->bi_bdev with a new target, + * so we don't set nor calculate new physical/hw segment counts here + */ bio->bi_sector = bio_src->bi_sector; bio->bi_bdev = bio_src->bi_bdev; bio->bi_flags |= 1 << BIO_CLONED; @@ -269,8 +271,6 @@ void __bio_clone(struct bio *bio, struct bio *bio_src) bio->bi_vcnt = bio_src->bi_vcnt; bio->bi_size = bio_src->bi_size; bio->bi_idx = bio_src->bi_idx; - bio_phys_segments(q, bio); - bio_hw_segments(q, bio); } /**