From: Dave Anderson <anderson@redhat.com>
Date: Tue, 15 Apr 2008 17:04:20 -0400
Subject: [x86_64] page faults from user mode are user faults
Message-id: 48051854.5020603@redhat.com
O-Subject: [RHEL5.2 PATCH] BZ #442101: Oops in 2.6.18-89.el5 kernel while running stress-kernel
Bugzilla: 442101

BZ #442101: Oops in 2.6.18-89.el5 kernel while running stress-kernel
https://bugzilla.redhat.com/show_bug.cgi?id=442101

This is a nagging bug that has been seen for some time now in RHTS
bare-metal x86_64 crashme stress testing. A user-mode page fault
is generated by the crashme program, but the cpu's exception mechanism
lays down an error_code on the kernel stack that does not have the
PF_USER bit set, essentially violating the cpu exception mechanism
protocol. Then, when the bogus virtual address cannot be resolved,
the x86_64 do_page_fault() function checks the PF_USER bit to decide
whether to simply send a SIGSEGV to the user process or, if the bit
is not set, to crash the system.

On Intel-only x86_64 machines the following occasionally occurs. Note
that the error_code that was pushed onto the stack is printed as the
number following "Oops", and that the CS:RIP values are obviously
user-mode references:

  Unable to handle kernel paging request at 00000000005967ac RIP:
   [<0000000011180220>]
  PGD 1944e067 PUD 3c5b8067 PMD 3cee6067 PTE 0
  Oops: 0000 [1] SMP
  last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
  CPU 0
  Modules linked in: nfs fscache nfsd exportfs lockd nfs_acl auth_rpcgss
   loop autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 xfrm_nalgo
   crypto_api dm_multipath video sbs backlight i2c_ec button battery
   asus_acpi acpi_memhotplug ac parport_pc lp parport joydev ide_cd
   i2c_i801 e752x_edac i2c_core floppy edac_mc sg tg3 pcspkr shpchp
   serio_raw cdrom dm_snapshot dm_zero dm_mirror dm_mod ata_piix libata
   aic79xx scsi_transport_spi sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd
   ehci_hcd
  Pid: 7825, comm: crashme Not tainted 2.6.18-89.el5 #1
  RIP: 0033:[<0000000011180220>]
  [<0000000011180220>]
  RSP: 002b:00007fffa0f66428  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 0000000000000009 RCX: ffffffffffffffff
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000000a
  RBP: 0000000000000007 R08: 00007fffa0f65f20 R09: 0000000000000000
  R10: 0000000000000008 R11: 0000000000000206 R12: 0000000000000000
  R13: 00007fffa0f66588 R14: 0000000000000000 R15: 0000000000000000
  FS:  00002b8c09b53210(0000) GS:ffffffff8039e000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 00000000005967ac CR3: 000000003d2ad000 CR4: 00000000000006e0
  Process crashme (pid: 7825, threadinfo ffff81002c05a000, task ffff810016ce30c0)

This anomaly was recognized upstream as a "CPU buglet" and fixed in
the following commit, which forces the PF_USER bit when the page fault
was generated from user mode:

commit dbe3ed1c078c193be34326728d494c5c4bc115e2
Author: Linus Torvalds <torvalds@woody.linux-foundation.org>
Date:   Wed Sep 19 11:37:14 2007 -0700

    x86-64: page faults from user mode are always user faults

    Randy Dunlap noticed an interesting "crashme" behaviour on his dual
    Prescott Xeon setup, where he gets page faults with the error code
    having a zero "user" bit, but the register state points back to user
    mode.

    This may be a CPU microcode buglet triggered by some strange
    instruction pattern that crashme generates, and loading a microcode
    update seems to possibly have fixed it.

    Regardless, we really should trust the register state more than the
    error code, since it's really the register state that determines
    whether we can actually send a signal, or whether we're in kernel
    mode and need to oops/kill the process in the case of a page fault.

    Cc: Randy Dunlap <rdunlap@xenotime.net>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Selected as a RHEL5.2 blocker; compile-tested only.
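
For readers who want to see the decision in isolation, here is a
minimal user-space sketch of "trust the register state more than the
error code". It is not the kernel code: struct fake_regs,
from_user_mode() and fixup_error_code() are hypothetical names made up
for this illustration, and testing the low two bits of CS (the
privilege level) is an assumption about what user_mode_vm() boils down
to on x86_64. PF_USER is the architectural user/supervisor bit (bit 2)
of the page-fault error code.

```c
/* Illustration only: hypothetical helpers, not the kernel's definitions. */

#define PF_USER (1UL << 2)   /* U/S bit of the x86 page-fault error code */

/* Hypothetical stand-in for the saved register frame; only CS matters here. */
struct fake_regs {
	unsigned long cs;
};

/*
 * Assumption: a saved CS whose low two bits (privilege level) are
 * non-zero means the fault was taken from user mode.
 */
static int from_user_mode(const struct fake_regs *regs)
{
	return (regs->cs & 3) != 0;
}

/*
 * If the registers say "user mode", force PF_USER regardless of what
 * the CPU pushed, so later PF_USER checks take the signal path.
 */
static unsigned long fixup_error_code(const struct fake_regs *regs,
				      unsigned long error_code)
{
	if (from_user_mode(regs))
		error_code |= PF_USER;
	return error_code;
}
```

With the values from the oops above (CS 0033, error code 0000), this
sketch would return an error code with PF_USER set; with a kernel CS
such as 0010 the error code would be left unchanged.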
diff --git a/arch/x86_64/mm/fault.c b/arch/x86_64/mm/fault.c
index c825c20..82542a2 100644
--- a/arch/x86_64/mm/fault.c
+++ b/arch/x86_64/mm/fault.c
@@ -413,6 +413,13 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
 	if (unlikely(in_atomic() || !mm))
 		goto bad_area_nosemaphore;
 
+	/*
+	 * User-mode registers count as a user access even for any
+	 * potential system fault or CPU buglet.
+	 */
+	if (user_mode_vm(regs))
+		error_code |= PF_USER;
+
  again:
 	/* When running in the kernel we expect faults to occur only to
 	 * addresses in user space.  All other faults represent errors in the