Sophie

Sophie

distrib > Scientific%20Linux > 5x > x86_64 > by-pkgid > fc11cd6e1c513a17304da94a5390f3cd > files > 3934

kernel-2.6.18-194.11.1.el5.src.rpm

From: Stephen C. Tweedie <sct@redhat.com>
Date: Fri, 28 Mar 2008 16:51:28 +0000
Subject: [x86] fix mprotect on PROT_NONE regions
Message-id: 1206723088.17918.28.camel@sisko.scot.redhat.com
O-Subject: [RHEL-5.2 patch] BZ 437412: Fix mprotect on PROT_NONE regions
Bugzilla: 437412

Hi,

This is a repost, for real this time, of the PTE patch I posted as RFC
yesterday.  I've now got a build for it in brew:

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=1233668

and have multiple reports that it cures the crash-on-starting-X bug:

        https://bugzilla.redhat.com/show_bug.cgi?id=437412
        Bugzilla Bug 437412: System reboots when X started on dom0

The bug is that starting X crashes the system solid on some hardware
with 5.2.  That's a regression against 5.1.  It occurs when we try to
use mprotect(PROT_NONE) on an ioremapped region, and then turn the
protection back to something readable.

The regression is due to another fix in 5.2, concerning migration of
PROT_NONE ptes.  Basically, when we migrate a PV guest, the hypervisor
is responsible for updating all the guest's ptes to point to the correct
new pages on the destination host.  But it doesn't touch pages which
don't have the PAGE_PRESENT bit.  Trouble is, on x86, PROT_NONE pages
don't have that bit set, so they don't get translated --- memory
corruption!

The fix in 5.2 was to special case PROT_NONE pages so that they are
stored in the page tables using a guest-relative, not hardware, frame
number.  So after migrate, if the guest on the new host turns the pages
back to PROT_READ, they can be correctly re-translated to valid hardware
frame numbers at that point in time.

But this new translating code doesn't work for ioremap()ed memory ---
such pages simply do not have a canonical, host-independent guest-
relative frame number, so we hit a BUG() in the pte handling.

The fix adopted upstream has been to mark ioremap() ptes with a new
_PAGE_IO bit, taken from the x86 host-available bits in the pte, and to
disable the magic frame number translations on ptes with that bit set.

Note, we only have 3 os-available bits in the pte.  Upstream Xen uses
bit 11 in the HV, and bit 9 for this new _PAGE_IO.  On x86_64, however,
we're already using bit 9 in the Stratus dirty-memory-tracking patches,
so I've changed _PAGE_IO patch to use bit 10 instead on 64-bit.  That's
us completely out of free bits on 64-bit, though.

Please ACK.

--Stephen

commit dd649b84b42b60261c093f7d74065378cf49c273
Author: Stephen Tweedie <sct@redhat.com>
Date:   Thu Mar 27 21:35:20 2008 +0000

    Xen: Fix mprotect on ioremaped regions.

    Fixes bug 437412: system crashes when X starts.

    This patch fixes a regression caused by patch
    linux-2.6-xen-save-restore-pv-oops-when-mmap-prot_none.patch.

    When Xen PV guests migrate, the HV updates the hardware frame numbers
    in all valid ptes automatically to reflect the layout of the guest on
    the new host.  But it skips over ptes which are not marked as present.

    On Linux x86, the mmu has no way to make a present page unreadable,
    so to implement PROT_NONE, the mmu clears the present bit.  This
    prevents the hypervisor from translating the ptes correctly on migrate,
    which leaves the ptes pointing to the wrong page on the new host.

    The above patch fixed this by storing the hardware frame numbers
    for PROT_NONE pages as guest-relative pseudopage frame references
    instead.  Unfortunately, that breaks mprotect(PROT_NONE) on
    ioremap()ed memory regions, which do not have any guest-relative
    frame numbers --- such pages are always specific to the host.

    Fix this by introducing a new pte flag, _PAGE_IO, for ioremap()ed
    pages and exempting such ptes from PROT_NONE translation.

    Based on upstream linux-2.6.18-xen.hg changesets 488 and 492:

      xen, x86: Track foreign and I/O mappings with a new pte flag, and do
      not subject such ptes to conversions to/from pseudophysical addresses.

    and

      xen, i386: Fix non-PAE build failure.

    but modified on x86_64 to use the second os-available pte bit, 0x400
    instead of 0x200, for _PAGE_IO --- the bit used upstream is already used
    for _PAGE_SOFTDIRTY in the non-upstream RHEL-specific patch,
    linux-2.6-x86_64-dirty-page-tracking.patch.

    Signed-off-by: Stephen Tweedie <sct@redhat.com>

Acked-by: Chris Lalancette <clalance@redhat.com>
Acked-by: Bill Burns <bburns@redhat.com>

diff --git a/arch/i386/mm/ioremap-xen.c b/arch/i386/mm/ioremap-xen.c
index b9d379a..5f69a4c 100644
--- a/arch/i386/mm/ioremap-xen.c
+++ b/arch/i386/mm/ioremap-xen.c
@@ -76,7 +76,7 @@ static int __direct_remap_pfn_range(struct mm_struct *mm,
 		 * Fill in the machine address: PTE ptr is done later by
 		 * apply_to_page_range(). 
 		 */
-		v->val = pte_val_ma(pfn_pte_ma(mfn, prot));
+		v->val = pte_val_ma(pfn_pte_ma(mfn, prot)) | _PAGE_IO;
 
 		mfn++;
 		address += PAGE_SIZE; 
diff --git a/include/asm-i386/mach-xen/asm/page.h b/include/asm-i386/mach-xen/asm/page.h
index b16b28b..4756ccc 100644
--- a/include/asm-i386/mach-xen/asm/page.h
+++ b/include/asm-i386/mach-xen/asm/page.h
@@ -36,6 +36,7 @@
  * below. The preprocessor will warn if the two definitions aren't identical.
  */
 #define _PAGE_PRESENT	0x001
+#define _PAGE_IO	0x200
 
 #define arch_free_page(_page,_order)			\
 ({	int foreign = PageForeign(_page);		\
@@ -88,8 +89,9 @@ typedef struct { unsigned long long pgd; } pgd_t;
 typedef struct { unsigned long long pgprot; } pgprot_t;
 #define pgprot_val(x)	((x).pgprot)
 #include <asm/maddr.h>
-#define __pte(x) ({ unsigned long long _x = (x);        \
-    if (_x & _PAGE_PRESENT) _x = pte_phys_to_machine(_x);   \
+#define __pte(x) ({ unsigned long long _x = (x);			\
+    if ((_x & (_PAGE_PRESENT|_PAGE_IO)) == _PAGE_PRESENT)		\
+        _x = pte_phys_to_machine(_x);					\
     ((pte_t) {(unsigned long)(_x), (unsigned long)(_x>>32)}); })
 #define __pgd(x) ({ unsigned long long _x = (x); \
     (pgd_t) {((_x) & _PAGE_PRESENT) ? pte_phys_to_machine(_x) : (_x)}; })
@@ -103,7 +105,8 @@ static inline unsigned long long pte_val_ma(pte_t x)
 static inline unsigned long long pte_val(pte_t x)
 {
 	unsigned long long ret = pte_val_ma(x);
-	if (x.pte_low & _PAGE_PRESENT) ret = pte_machine_to_phys(ret);
+	if ((x.pte_low & (_PAGE_PRESENT|_PAGE_IO)) == _PAGE_PRESENT)
+		ret = pte_machine_to_phys(ret);
 	return ret;
 }
 static inline unsigned long long pmd_val(pmd_t x)
@@ -130,13 +133,16 @@ typedef struct { unsigned long pgprot; } pgprot_t;
 #define pgprot_val(x)	((x).pgprot)
 #include <asm/maddr.h>
 #define boot_pte_t pte_t /* or would you rather have a typedef */
-#define pte_val(x)	(((x).pte_low & _PAGE_PRESENT) ? \
-			 machine_to_phys((x).pte_low) : \
+#define pte_val(x)	((((x).pte_low & (_PAGE_PRESENT|_PAGE_IO)) \
+			 == _PAGE_PRESENT) ?			   \
+			 machine_to_phys((x).pte_low) :		   \
 			 (x).pte_low)
 #define pte_val_ma(x)	((x).pte_low)
 #define __pte_val(x)	pte_val_ma(x)
-#define __pte(x) ({ unsigned long _x = (x); \
-    (pte_t) {((_x) & _PAGE_PRESENT) ? phys_to_machine(_x) : (_x)}; })
+#define __pte(x) ({ unsigned long _x = (x);			\
+    if ((_x & (_PAGE_PRESENT|_PAGE_IO)) == _PAGE_PRESENT)	\
+        _x = phys_to_machine(_x);				\
+    ((pte_t) { _x }); })
 #define __pgd(x) ({ unsigned long _x = (x); \
     (pgd_t) {((_x) & _PAGE_PRESENT) ? phys_to_machine(_x) : (_x)}; })
 static inline unsigned long pgd_val(pgd_t x)
diff --git a/include/asm-i386/mach-xen/asm/pgtable-2level.h b/include/asm-i386/mach-xen/asm/pgtable-2level.h
index f7250a3..2694758 100644
--- a/include/asm-i386/mach-xen/asm/pgtable-2level.h
+++ b/include/asm-i386/mach-xen/asm/pgtable-2level.h
@@ -41,8 +41,10 @@
 #define __pte_mfn(_pte) ((_pte).pte_low >> PAGE_SHIFT)
 #define pte_mfn(_pte) ((_pte).pte_low & _PAGE_PRESENT ? \
 	__pte_mfn(_pte) : pfn_to_mfn(__pte_mfn(_pte)))
-#define pte_pfn(_pte) ((_pte).pte_low & _PAGE_PRESENT ? \
-	mfn_to_local_pfn(__pte_mfn(_pte)) : __pte_mfn(_pte))
+#define pte_pfn(_pte) ((_pte).pte_low & _PAGE_IO ? max_mapnr :	\
+		       (_pte).pte_low & _PAGE_PRESENT ?		\
+		       mfn_to_local_pfn(__pte_mfn(_pte)) :	\
+		       __pte_mfn(_pte))
 
 #define pte_page(_pte) pfn_to_page(pte_pfn(_pte))
 
diff --git a/include/asm-i386/mach-xen/asm/pgtable-3level.h b/include/asm-i386/mach-xen/asm/pgtable-3level.h
index 8747c52..84f787f 100644
--- a/include/asm-i386/mach-xen/asm/pgtable-3level.h
+++ b/include/asm-i386/mach-xen/asm/pgtable-3level.h
@@ -149,8 +149,10 @@ static inline int pte_none(pte_t pte)
 			 ((_pte).pte_high << (32-PAGE_SHIFT)))
 #define pte_mfn(_pte) ((_pte).pte_low & _PAGE_PRESENT ? \
 	__pte_mfn(_pte) : pfn_to_mfn(__pte_mfn(_pte)))
-#define pte_pfn(_pte) ((_pte).pte_low & _PAGE_PRESENT ? \
-	mfn_to_local_pfn(__pte_mfn(_pte)) : __pte_mfn(_pte))
+#define pte_pfn(_pte) ((_pte).pte_low & _PAGE_IO ? max_mapnr :	\
+		       (_pte).pte_low & _PAGE_PRESENT ?		\
+		       mfn_to_local_pfn(__pte_mfn(_pte)) :	\
+		       __pte_mfn(_pte))
 
 extern unsigned long long __supported_pte_mask;
 
diff --git a/include/asm-i386/mach-xen/asm/pgtable.h b/include/asm-i386/mach-xen/asm/pgtable.h
index 8e03275..05b6e71 100644
--- a/include/asm-i386/mach-xen/asm/pgtable.h
+++ b/include/asm-i386/mach-xen/asm/pgtable.h
@@ -130,9 +130,12 @@ void paging_init(void);
 #define _PAGE_NX	0
 #endif
 
+/* Mapped page is I/O or foreign and has no associated page struct. */
+#define _PAGE_IO	0x200
+
 #define _PAGE_TABLE	(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | _PAGE_ACCESSED | _PAGE_DIRTY)
 #define _KERNPG_TABLE	(_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY)
-#define _PAGE_CHG_MASK	(PTE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY)
+#define _PAGE_CHG_MASK	(PTE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_IO)
 
 #define PAGE_NONE \
 	__pgprot(_PAGE_PROTNONE | _PAGE_ACCESSED)
diff --git a/include/asm-x86_64/mach-xen/asm/page.h b/include/asm-x86_64/mach-xen/asm/page.h
index 50ad81d..084a3bc 100644
--- a/include/asm-x86_64/mach-xen/asm/page.h
+++ b/include/asm-x86_64/mach-xen/asm/page.h
@@ -18,6 +18,7 @@
  * below. The preprocessor will warn if the two definitions aren't identical.
  */
 #define _PAGE_PRESENT	0x001
+#define _PAGE_IO	0x400
 
 #define arch_free_page(_page,_order)			\
 ({	int foreign = PageForeign(_page);		\
@@ -101,8 +102,9 @@ typedef struct { unsigned long pgd; } pgd_t;
 
 typedef struct { unsigned long pgprot; } pgprot_t;
 
-#define pte_val(x)	(((x).pte & _PAGE_PRESENT) ? \
-			 pte_machine_to_phys((x).pte) : \
+#define pte_val(x)	((((x).pte & (_PAGE_PRESENT|_PAGE_IO)) \
+			  == _PAGE_PRESENT) ?		       \
+			 pte_machine_to_phys((x).pte) :	       \
 			 (x).pte)
 #define pte_val_ma(x)	((x).pte)
 
@@ -135,7 +137,8 @@ static inline unsigned long pgd_val(pgd_t x)
 
 static inline pte_t __pte(unsigned long x)
 {
-	if (x & _PAGE_PRESENT) x = pte_phys_to_machine(x);
+	if ((x & (_PAGE_PRESENT|_PAGE_IO)) == _PAGE_PRESENT)
+		x = pte_phys_to_machine(x);
 	return ((pte_t) { (x) });
 }
 
diff --git a/include/asm-x86_64/mach-xen/asm/pgtable.h b/include/asm-x86_64/mach-xen/asm/pgtable.h
index bff95f2..f5cfe8e 100644
--- a/include/asm-x86_64/mach-xen/asm/pgtable.h
+++ b/include/asm-x86_64/mach-xen/asm/pgtable.h
@@ -229,10 +229,13 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm, unsigned long
 #define _PAGE_PROTNONE	0x080	/* If not present */
 #define _PAGE_NX        (_AC(1,UL)<<_PAGE_BIT_NX)
 
+/* Mapped page is I/O or foreign and has no associated page struct. */
+#define _PAGE_IO	0x400
+
 #define _PAGE_TABLE	(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | _PAGE_ACCESSED | _PAGE_DIRTY)
 #define _KERNPG_TABLE	(_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY)
 
-#define _PAGE_CHG_MASK	(PTE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_SOFTDIRTY)
+#define _PAGE_CHG_MASK	(PTE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_SOFTDIRTY | _PAGE_IO)
 
 #define PAGE_NONE	__pgprot(_PAGE_PROTNONE | _PAGE_ACCESSED)
 #define PAGE_SHARED	__pgprot(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | _PAGE_ACCESSED | _PAGE_NX)
@@ -325,8 +328,10 @@ static inline unsigned long pud_bad(pud_t pud)
 #define __pte_mfn(_pte) (((_pte).pte & PTE_MASK) >> PAGE_SHIFT)
 #define pte_mfn(_pte) ((_pte).pte & _PAGE_PRESENT ? \
 	__pte_mfn(_pte) : pfn_to_mfn(__pte_mfn(_pte)))
-#define pte_pfn(_pte) ((_pte).pte & _PAGE_PRESENT ? \
-	mfn_to_local_pfn(__pte_mfn(_pte)) : __pte_mfn(_pte))
+#define pte_pfn(_pte) ((_pte).pte & _PAGE_IO ? end_pfn :	\
+		       (_pte).pte & _PAGE_PRESENT ?		\
+		       mfn_to_local_pfn(__pte_mfn(_pte)) :	\
+		       __pte_mfn(_pte))
 
 #define pte_page(x)	pfn_to_page(pte_pfn(x))