kernel-2.6.18-194.11.1.el5.src.rpm

From: Dave Anderson <anderson@redhat.com>
Date: Tue, 15 Apr 2008 17:04:20 -0400
Subject: [x86_64] page faults from user mode are user faults
Message-id: 48051854.5020603@redhat.com
O-Subject: [RHEL5.2 PATCH] BZ #442101: Oops in 2.6.18-89.el5 kernel while running stress-kernel
Bugzilla: 442101

BZ #442101: Oops in 2.6.18-89.el5 kernel while running stress-kernel
https://bugzilla.redhat.com/show_bug.cgi?id=442101

This is a nagging bug that has been seen for some time now in RHTS bare-metal
x86_64 crashme stress testing.  A user-mode page fault is generated by
the crashme program, but the error_code that the cpu's exception mechanism
lays down on the kernel stack does not have the PF_USER bit set, essentially
violating the cpu exception mechanism protocol.  Then, when the bogus virtual
address cannot be resolved, the x86_64 do_page_fault() function checks the
PF_USER bit to decide whether to simply send a SIGSEGV to the user process
or, if the bit is not set, to crash the system.
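
The decision described above can be modelled in user space.  This is a
simplified sketch, not the kernel's code: the bit positions follow the x86
page-fault error code layout, while the fault_outcome enum and the
unresolved_fault_outcome() helper are hypothetical names made up for
illustration.

```c
/* x86 page-fault error code bits, as laid down by the hardware. */
#define PF_PROT  (1UL << 0)	/* 0: page not present, 1: protection violation */
#define PF_WRITE (1UL << 1)	/* 0: read access, 1: write access */
#define PF_USER  (1UL << 2)	/* 0: fault taken in kernel mode, 1: user mode */

/* Hypothetical outcome type, for illustration only. */
enum fault_outcome { SEND_SIGSEGV, KERNEL_OOPS };

/*
 * Simplified model of the choice made once the faulting address has been
 * found to be unresolvable: only the PF_USER bit of the error code is
 * consulted, so a user-mode fault that arrives with the bit clear is
 * treated as a kernel fault.
 */
static enum fault_outcome unresolved_fault_outcome(unsigned long error_code)
{
	return (error_code & PF_USER) ? SEND_SIGSEGV : KERNEL_OOPS;
}
```

With this model, an error_code of 0000 selects KERNEL_OOPS no matter which
mode the faulting instruction was actually running in.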

On Intel x86_64 machines only, the following oops occasionally occurs.  Note
that the error_code that was pushed onto the stack is printed as the number
following "Oops", and that the CS:RIP values are obviously user-mode references:

   Unable to handle kernel paging request at 00000000005967ac RIP:
    [<0000000011180220>]
   PGD 1944e067 PUD 3c5b8067 PMD 3cee6067 PTE 0
   Oops: 0000 [1] SMP
   last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
   CPU 0
   Modules linked in: nfs fscache nfsd exportfs lockd nfs_acl auth_rpcgss loop
   autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 xfrm_nalgo crypto_api
   dm_multipath video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug
   ac parport_pc lp parport joydev ide_cd i2c_i801 e752x_edac i2c_core floppy
   edac_mc sg tg3 pcspkr shpchp serio_raw cdrom dm_snapshot dm_zero dm_mirror
   dm_mod ata_piix libata aic79xx scsi_transport_spi sd_mod scsi_mod ext3 jbd
   uhci_hcd ohci_hcd ehci_hcd
   Pid: 7825, comm: crashme Not tainted 2.6.18-89.el5 #1
   RIP: 0033:[<0000000011180220>]  [<0000000011180220>]
   RSP: 002b:00007fffa0f66428  EFLAGS: 00010246
   RAX: 0000000000000000 RBX: 0000000000000009 RCX: ffffffffffffffff
   RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000000a
   RBP: 0000000000000007 R08: 00007fffa0f65f20 R09: 0000000000000000
   R10: 0000000000000008 R11: 0000000000000206 R12: 0000000000000000
   R13: 00007fffa0f66588 R14: 0000000000000000 R15: 0000000000000000
   FS:  00002b8c09b53210(0000) GS:ffffffff8039e000(0000) knlGS:0000000000000000
   CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
   CR2: 00000000005967ac CR3: 000000003d2ad000 CR4: 00000000000006e0
   Process crashme (pid: 7825, threadinfo ffff81002c05a000, task ffff810016ce30c0)

This anomaly was recognized upstream as a "CPU buglet", and was fixed in the
following commit by forcing the PF_USER bit whenever the page fault was
generated from user mode:

   commit dbe3ed1c078c193be34326728d494c5c4bc115e2
   Author: Linus Torvalds <torvalds@woody.linux-foundation.org>
   Date:   Wed Sep 19 11:37:14 2007 -0700

   x86-64: page faults from user mode are always user faults

   Randy Dunlap noticed an interesting "crashme" behaviour on his dual
   Prescott Xeon setup, where he gets page faults with the error code
   having a zero "user" bit, but the register state points back to user
   mode.

   This may be a CPU microcode buglet triggered by some strange instruction
   pattern that crashme generates, and loading a microcode update seems to
   possibly have fixed it.

   Regardless, we really should trust the register state more than the
   error code, since it's really the register state that determines whether
   we can actually send a signal, or whether we're in kernel mode and need
   to oops/kill the process in the case of a page fault.

   Cc: Randy Dunlap <rdunlap@xenotime.net>
   Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Selected as a RHEL5.2 blocker; compile-tested only.
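
The effect of the change can be sketched in user space with the values from
the oops above (error_code 0000, CS 0033).  This is a rough model under stated
assumptions, not the patch itself: struct saved_regs, from_user_mode() and
fixup_error_code() are hypothetical stand-ins, and the mode test is reduced to
checking the privilege-level bits of the saved CS.

```c
#define PF_USER (1UL << 2)	/* user-mode bit of the page-fault error code */

/* Hypothetical stand-in for the saved register frame; only CS matters here. */
struct saved_regs {
	unsigned long cs;
};

/*
 * Assumption for this sketch: the low two bits of the saved CS hold the
 * privilege level, and any nonzero value means the fault did not come
 * from kernel mode.
 */
static int from_user_mode(const struct saved_regs *regs)
{
	return (regs->cs & 3) != 0;
}

/* Trust the register state over the error code, as the commit argues. */
static unsigned long fixup_error_code(const struct saved_regs *regs,
				      unsigned long error_code)
{
	if (from_user_mode(regs))
		error_code |= PF_USER;
	return error_code;
}
```

Under these assumptions, a frame with CS 0033 and error_code 0000 comes back
with PF_USER set, so the fault would be handled as a user fault, while a
kernel-mode frame (for example CS 0010) is left unchanged.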

diff --git a/arch/x86_64/mm/fault.c b/arch/x86_64/mm/fault.c
index c825c20..82542a2 100644
--- a/arch/x86_64/mm/fault.c
+++ b/arch/x86_64/mm/fault.c
@@ -413,6 +413,13 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
 	if (unlikely(in_atomic() || !mm))
 		goto bad_area_nosemaphore;
 
+	/*
+	 * User-mode registers count as a user access even for any
+	 * potential system fault or CPU buglet.
+	 */
+	if (user_mode_vm(regs))
+		error_code |= PF_USER;
+
  again:
 	/* When running in the kernel we expect faults to occur only to
 	 * addresses in user space.  All other faults represent errors in the