From: Doug Ledford <dledford@redhat.com>
Date: Tue, 29 Jan 2008 15:00:11 -0500
Subject: [md] fix raid1 consistency check
Message-id: 1201636811.28486.78.camel@firewall.xsintricity.com
O-Subject: [Patch RHEL5.2] Fix raid1 consistency check
Bugzilla: 429747

This is for bz429747, which has exception status. Basically, when you
invoke a check or a repair on a raid1 array, the md code is supposed to
read all blocks on all devices, and when it finds any bad blocks on a
device, it should write known good data from another device over the
bad blocks, thereby triggering the hard drive's write reallocation
feature and restoring the block. It currently fails to do so, so a bad
block found in one pass will still be there if you run the pass again.
This patch resolves that issue. It comprises three upstream git
commits, listed in chronological order.

Stratus tested a slightly different version of this patch and it solved
their problem. I've asked them to rerun their tests with this patch
just to verify it also solves the problem (although the only difference
between their patch and this one is that I picked up the third git
commit in this list and they didn't, and that commit merely makes the
check pass fix problems like a repair pass does).

commit 3eda22d19b76b15ef3420b251bd47a0ba0127589
Author: NeilBrown <neilb@suse.de>
Date: Fri Jan 26 00:57:01 2007 -0800

[PATCH] md: make 'repair' actually work for raid1

When 'repair' finds a block that is different on the various parts of
the mirror, it is meant to write a chosen good version to the others.
However, it currently writes out the original data to each. The memcpy
to make all the data the same is missing.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 3b4d69c..ccd24bf 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1204,20 +1204,28 @@ static void sync_request_write(mddev_t *mddev, r1bio_t *r1_bio)
 			}
 		r1_bio->read_disk = primary;
 		for (i=0; i<mddev->raid_disks; i++)
-			if (r1_bio->bios[i]->bi_end_io == end_sync_read &&
-			    test_bit(BIO_UPTODATE, &r1_bio->bios[i]->bi_flags)) {
+			if (r1_bio->bios[i]->bi_end_io == end_sync_read) {
 				int j;
 				int vcnt = r1_bio->sectors >> (PAGE_SHIFT- 9);
 				struct bio *pbio = r1_bio->bios[primary];
 				struct bio *sbio = r1_bio->bios[i];
-				for (j = vcnt; j-- ; )
-					if (memcmp(page_address(pbio->bi_io_vec[j].bv_page),
-						   page_address(sbio->bi_io_vec[j].bv_page),
-						   PAGE_SIZE))
-						break;
+
+				if (test_bit(BIO_UPTODATE, &sbio->bi_flags)) {
+					for (j = vcnt; j-- ; ) {
+						struct page *p, *s;
+						p = pbio->bi_io_vec[j].bv_page;
+						s = sbio->bi_io_vec[j].bv_page;
+						if (memcmp(page_address(p),
+							   page_address(s),
+							   PAGE_SIZE))
+							break;
+					}
+				} else
+					j = 0;
 				if (j >= 0)
 					mddev->resync_mismatches += r1_bio->sectors;
-				if (j < 0 || test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) {
+				if (j < 0 || (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)
+					      && test_bit(BIO_UPTODATE, &sbio->bi_flags))) {
 					sbio->bi_end_io = NULL;
 					rdev_dec_pending(conf->mirrors[i].rdev, mddev);
 				} else {
@@ -1235,6 +1243,11 @@ static void sync_request_write(mddev_t *mddev, r1bio_t *r1_bio)
 					sbio->bi_sector = r1_bio->sector +
 						conf->mirrors[i].rdev->data_offset;
 					sbio->bi_bdev = conf->mirrors[i].rdev->bdev;
+					for (j = 0; j < vcnt ; j++)
+						memcpy(page_address(sbio->bi_io_vec[j].bv_page),
+						       page_address(pbio->bi_io_vec[j].bv_page),
+						       PAGE_SIZE);
+
 				}
 			}
 	}
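
For reviewers who want to reason about the new control flow outside the
kernel, here is a minimal userspace sketch of the per-mirror decision
the patched loop makes. It is a model under stated assumptions, not the
kernel code: sync_one_mirror(), SKETCH_PAGE_SIZE and the flat byte
buffers are hypothetical stand-ins for the bio page vectors, the
secondary_read_ok flag stands in for BIO_UPTODATE on sbio, and
check_only stands in for MD_RECOVERY_CHECK.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096

/*
 * Model of the patched decision for one secondary mirror.
 * Returns 1 if the secondary must be written back (its buffer has then
 * been overwritten with the primary's data), 0 if no write is needed.
 * *mismatches is advanced by the request size in 512-byte sectors
 * whenever the secondary differs or could not be read.
 */
static int sync_one_mirror(const unsigned char *primary,
                           unsigned char *secondary,
                           int vcnt, int secondary_read_ok,
                           int check_only, unsigned long *mismatches)
{
    int j;

    assert(primary && secondary && mismatches && vcnt > 0);

    if (secondary_read_ok) {
        /* Scan pages from last to first; j ends at -1 if all match. */
        for (j = vcnt; j--; ) {
            size_t off = (size_t)j * SKETCH_PAGE_SIZE;
            if (memcmp(primary + off, secondary + off, SKETCH_PAGE_SIZE))
                break;
        }
    } else
        j = 0;  /* a failed read counts as a mismatch */

    if (j >= 0)
        *mismatches += (unsigned long)vcnt * (SKETCH_PAGE_SIZE / 512);

    /* Skip the write if the data matches, or if this is a check pass
     * and the secondary was at least readable. */
    if (j < 0 || (check_only && secondary_read_ok))
        return 0;

    /* The step the original code was missing: copy good data over bad. */
    memcpy(secondary, primary, (size_t)vcnt * SKETCH_PAGE_SIZE);
    return 1;
}
```

In this model a repair pass over a readable but differing secondary
returns 1 with the buffers made identical, while a check pass over the
same data returns 0 and only bumps the mismatch count. A secondary whose
read failed is rewritten in both modes, which matches the extra
BIO_UPTODATE test added to the MD_RECOVERY_CHECK condition above.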