kernel-2.6.18-238.el5.src.rpm

Subject: Re: [2.6.17-git22] lock debugging output
From:	Arjan van de Ven <arjan@infradead.org>
To:	Alessandro Suardi <alessandro.suardi@gmail.com>
Cc:	akpm@osdl.org, mingo@elte.hu,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	netdev@vger.kernel.org
In-Reply-To: <5a4c581d0607041113o2993cbf5m7011b2a06e96d974@mail.gmail.com>
References: <5a4c581d0607041113o2993cbf5m7011b2a06e96d974@mail.gmail.com>
Content-Type: text/plain
Date:	Tue, 04 Jul 2006 20:32:46 +0200

From: Arjan van de Ven <arjan@linux.intel.com>

On Tue, 2006-07-04 at 20:13 +0200, Alessandro Suardi wrote:
> Hoping gmail doesn't mess it too badly...
> 
> eth0: tg3 (BCM5751 Gbit Ethernet)
> eth1: ipw2200 (Intel PRO/Wireless 2200BG)
> 
> Sequence:
>  1. boot with eth0 disconnected (eth1 doesn't come up on boot)
>  2. ifup eth1, bring wpa-supplicant up
>  3. run 'dig' ---> <lock debug info gets printed on console>


this appears to be a real deadlock:

the SO_BINDTODEVICE setsockopt() path calls sk_dst_reset(), which looks like this:
static inline void
sk_dst_reset(struct sock *sk)
{
        write_lock(&sk->sk_dst_lock);
        __sk_dst_reset(sk);
        write_unlock(&sk->sk_dst_lock);
}

now... ipv6 does this in softirq context:
  [<c028cf72>] sk_dst_check+0x1b/0xe6
  [<f8ce1305>] ip6_dst_lookup+0x31/0x16d [ipv6]
  [<f8cf3338>] icmpv6_send+0x332/0x549 [ipv6]
  [<f8cf09a1>] udpv6_rcv+0x4ab/0x4d6 [ipv6]
  [<f8ce2900>] ip6_input+0x19c/0x228 [ipv6]
  [<f8ce2d61>] ipv6_rcv+0x188/0x1b7 [ipv6]
  [<c02925b7>] netif_receive_skb+0x18d/0x1d8
  [<c0293d6a>] process_backlog+0x80/0xf9
  [<c0293f43>] net_rx_action+0x80/0x174
  [<c01162fd>] __do_softirq+0x46/0x9c
  [<c01040e6>] do_softirq+0x4d/0xac

where sk_dst_check() takes the same lock for read.

If that softirq fires on a CPU that already holds sk_dst_lock for write
in process context, the read lock can never be granted, so this looks
like a real deadlock to me.
The most obvious low-impact solution is to make sk_dst_reset() use an
irqsave variant; a patch for that is below. I'll leave it to the
networking people to say whether that is the right approach.
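
To make the scenario above concrete, here is a minimal user-space sketch of the condition being described. It is a toy single-CPU model, not kernel code: the struct and all three function names are hypothetical, and the only thing it captures is that a softirq taking the lock for read can get stuck only if it is allowed to run while the writer it interrupted still holds the lock.

```c
#include <stdbool.h>

/* Toy single-CPU model; every name here is hypothetical, not a kernel API. */
struct cpu_model {
	bool irqs_enabled;      /* can a softirq run on this CPU right now? */
	bool dst_write_locked;  /* sk_dst_lock held for write in process context? */
};

/* Models write_lock(): the interrupt state is left untouched. */
static void model_write_lock(struct cpu_model *cpu)
{
	cpu->dst_write_locked = true;
}

/* Models write_lock_irqsave(): interrupts are off while the lock is held. */
static void model_write_lock_irqs_off(struct cpu_model *cpu)
{
	cpu->irqs_enabled = false;
	cpu->dst_write_locked = true;
}

/*
 * Would a softirq that takes the same lock for read spin forever if it
 * arrived now?  It can only run while interrupts are enabled, and it can
 * only get stuck if the writer it interrupted still holds the lock.
 */
static bool softirq_would_deadlock(const struct cpu_model *cpu)
{
	return cpu->irqs_enabled && cpu->dst_write_locked;
}
```

With the plain variant the predicate is true for the whole critical section; with interrupts off it is never true, which is the property the patch below relies on.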

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>

---
 include/net/sock.h |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Index: linux-2.6.17-mm6/include/net/sock.h
===================================================================
--- linux-2.6.17-mm6.orig/include/net/sock.h
+++ linux-2.6.17-mm6/include/net/sock.h
@@ -1025,9 +1025,10 @@ __sk_dst_reset(struct sock *sk)
 static inline void
 sk_dst_reset(struct sock *sk)
 {
-	write_lock(&sk->sk_dst_lock);
+	unsigned long flags;
+	write_lock_irqsave(&sk->sk_dst_lock, flags);
 	__sk_dst_reset(sk);
-	write_unlock(&sk->sk_dst_lock);
+	write_unlock_irqrestore(&sk->sk_dst_lock, flags);
 }
 
 extern struct dst_entry *__sk_dst_check(struct sock *sk, u32 cookie);



Subject: Re: lockdep input layer warnings.
From: Arjan van de Ven <arjan@infradead.org>
To: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Dave Jones <davej@redhat.com>, mingo@redhat.com,
        Linux Kernel <linux-kernel@vger.kernel.org>
In-Reply-To: <d120d5000607061329t4868d265h6f8285c798a0e3b7@mail.gmail.com>
References: <20060706173411.GA2538@redhat.com>
	 <d120d5000607061137r605a08f9ie6cd45a389285c4a@mail.gmail.com>
	 <1152212575.3084.88.camel@laptopd505.fenrus.org>
	 <d120d5000607061329t4868d265h6f8285c798a0e3b7@mail.gmail.com>
Content-Type: text/plain
Date: Mon, 10 Jul 2006 17:12:51 +0200

On Thu, 2006-07-06 at 16:29 -0400, Dmitry Torokhov wrote:
> On 7/6/06, Arjan van de Ven <arjan@infradead.org> wrote:
> > On Thu, 2006-07-06 at 14:37 -0400, Dmitry Torokhov wrote:
> > > On 7/6/06, Dave Jones <davej@redhat.com> wrote:
> > > > One of our Fedora-devel users picked up on this this morning
> > > > in an 18rc1 based kernel.
> > > >
> > > >                Dave
> > > >
> > > >
> > > >  Synaptics Touchpad, model: 1, fw: 5.9, id: 0x2c6ab1, caps: 0x884793/0x0
> > > >  serio: Synaptics pass-through port at isa0060/serio1/input0
> > > >  input: SynPS/2 Synaptics TouchPad as /class/input/input1
> > > >  PM: Adding info for serio:serio2
> > > >
> > > >  =============================================
> > > >  [ INFO: possible recursive locking detected ]
> > > >  ---------------------------------------------
> > >
> > > False alarm, there was a lockdep annotating patch for it in -mm.
> > not so sure; that patch is supposed to be in -rc1 already; investigating
> >
> 
> Well, you are right, the patch is in -rc1 and I see mutex_lock_nested
> in the backtrace but for some reason it is still not happy. Again,
> this is with a pass-through Synaptics port: we first take the mutex of
> the child device and then (going through the pass-through port) try to
> take the mutex of the parent.

Ok, it seems more drastic measures are needed: a split of the
cmd_mutex class on a per-driver basis. The easiest way to do that is to
inline the lock initialization (patch below), but to be honest I think
the patch is a bit ugly; I considered inlining the entire function
instead. Any opinions on that?
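
For readers wondering why inlining the initialization splits the lock class: as far as I recall, mutex_init() in this kernel is a macro that declares a static lock_class_key where it is expanded, so the class follows the place the macro is compiled. The user-space sketch below imitates that mechanism; all types and names are hypothetical stand-ins, not the kernel definitions.

```c
/* Hypothetical stand-ins for the lockdep types; not the kernel definitions. */
struct toy_class_key { char dummy; };
struct toy_mutex { const struct toy_class_key *key; };

/* Out-of-line init: a single static key shared by every caller. */
static void toy_init_shared(struct toy_mutex *m)
{
	static struct toy_class_key key;

	m->key = &key;
}

/* Macro init: every expansion site gets its own static key. */
#define toy_init_per_site(m)				\
	do {						\
		static struct toy_class_key site_key;	\
		(m)->key = &site_key;			\
	} while (0)

/* Two hypothetical "drivers", each with its own expansion of the macro. */
static void driver_a_init(struct toy_mutex *m)
{
	toy_init_per_site(m);
}

static void driver_b_init(struct toy_mutex *m)
{
	toy_init_per_site(m);
}
```

Mutexes set up through toy_init_shared() all end up in one class, while driver_a_init() and driver_b_init() produce two. In the patch below the same effect should come from the static inline ps2_init() being compiled separately into each file that includes the header.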

Index: linux-2.6.18-rc1/drivers/input/serio/libps2.c
===================================================================
--- linux-2.6.18-rc1.orig/drivers/input/serio/libps2.c
+++ linux-2.6.18-rc1/drivers/input/serio/libps2.c
@@ -27,7 +27,7 @@ MODULE_AUTHOR("Dmitry Torokhov <dtor@mai
 MODULE_DESCRIPTION("PS/2 driver library");
 MODULE_LICENSE("GPL");
 
-EXPORT_SYMBOL(ps2_init);
+EXPORT_SYMBOL(__ps2_init);
 EXPORT_SYMBOL(ps2_sendbyte);
 EXPORT_SYMBOL(ps2_drain);
 EXPORT_SYMBOL(ps2_command);
@@ -177,7 +177,7 @@ int ps2_command(struct ps2dev *ps2dev, u
 		return -1;
 	}
 
-	mutex_lock_nested(&ps2dev->cmd_mutex, SINGLE_DEPTH_NESTING);
+	mutex_lock(&ps2dev->cmd_mutex);
 
 	serio_pause_rx(ps2dev->serio);
 	ps2dev->flags = command == PS2_CMD_GETID ? PS2_FLAG_WAITID : 0;
@@ -279,7 +279,7 @@ int ps2_schedule_command(struct ps2dev *
  * ps2_init() initializes ps2dev structure
  */
 
-void ps2_init(struct ps2dev *ps2dev, struct serio *serio)
+void __ps2_init(struct ps2dev *ps2dev, struct serio *serio)
 {
 	mutex_init(&ps2dev->cmd_mutex);
 	init_waitqueue_head(&ps2dev->wait);
Index: linux-2.6.18-rc1/include/linux/libps2.h
===================================================================
--- linux-2.6.18-rc1.orig/include/linux/libps2.h
+++ linux-2.6.18-rc1/include/linux/libps2.h
@@ -39,7 +39,12 @@ struct ps2dev {
 	unsigned char nak;
 };
 
-void ps2_init(struct ps2dev *ps2dev, struct serio *serio);
+void __ps2_init(struct ps2dev *ps2dev, struct serio *serio);
+static inline void ps2_init(struct ps2dev *ps2dev, struct serio *serio)
+{
+	__ps2_init(ps2dev, serio);
+	mutex_init(&ps2dev->cmd_mutex);
+}
 int ps2_sendbyte(struct ps2dev *ps2dev, unsigned char byte, int timeout);
 void ps2_drain(struct ps2dev *ps2dev, int maxbytes, int timeout);
 int ps2_command(struct ps2dev *ps2dev, unsigned char *param, int command);


Subject: Re: another networking lockdep bug
From: Arjan van de Ven <arjan@infradead.org>
To: Dave Jones <davej@redhat.com>
Cc: mingo@elte.hu
In-Reply-To: <20060713040715.GE4199@redhat.com>
References: <20060713040715.GE4199@redhat.com>
Content-Type: text/plain
Date: Thu, 13 Jul 2006 22:29:03 +0200

On Thu, 2006-07-13 at 00:07 -0400, Dave Jones wrote:
> Not sure if this one got reported/fixed yet, as I was running
> a kernel from sometime last week..
> 
> 		Dave
> 


can you add this patch for this and retry?
Index: linux-2.6.18-rc1/net/socket.c
===================================================================
--- linux-2.6.18-rc1.orig/net/socket.c
+++ linux-2.6.18-rc1/net/socket.c
@@ -1232,7 +1232,13 @@ int sock_create(int family, int type, in
 
 int sock_create_kern(int family, int type, int protocol, struct socket **res)
 {
-	return __sock_create(family, type, protocol, res, 1);
+	static struct lock_class_key sk_lock_internal_key;
+	int ret;
+	ret = __sock_create(family, type, protocol, res, 1);
+	if (!ret)
+		lockdep_set_class(&(*res)->sk->sk_lock.slock,
+        		&sk_lock_internal_key);
+        return ret;
 }
 
 asmlinkage long sys_socket(int family, int type, int protocol)


--- a/kernel/lockdep.c~lockdep-print-kernel-version
+++ a/kernel/lockdep.c
@@ -36,6 +36,7 @@
 #include <linux/stacktrace.h>
 #include <linux/debug_locks.h>
 #include <linux/irqflags.h>
+#include <linux/utsname.h>
 
 #include <asm/sections.h>
 
@@ -508,6 +509,13 @@ print_circular_bug_entry(struct lock_lis
 	return 0;
 }
 
+static void print_kernel_version(void)
+{
+	printk("%s %.*s\n", system_utsname.release,
+		(int)strcspn(system_utsname.version, " "),
+		system_utsname.version);
+}
+
 /*
  * When a circular dependency is detected, print the
  * header first:
@@ -524,6 +532,7 @@ print_circular_bug_header(struct lock_li
 
 	printk("\n=======================================================\n");
 	printk(  "[ INFO: possible circular locking dependency detected ]\n");
+	print_kernel_version();
 	printk(  "-------------------------------------------------------\n");
 	printk("%s/%d is trying to acquire lock:\n",
 		curr->comm, curr->pid);
@@ -705,6 +714,7 @@ print_bad_irq_dependency(struct task_str
 	printk("\n======================================================\n");
 	printk(  "[ INFO: %s-safe -> %s-unsafe lock order detected ]\n",
 		irqclass, irqclass);
+	print_kernel_version();
 	printk(  "------------------------------------------------------\n");
 	printk("%s/%d [HC%u[%lu]:SC%u[%lu]:HE%u:SE%u] is trying to acquire:\n",
 		curr->comm, curr->pid,
@@ -786,6 +796,7 @@ print_deadlock_bug(struct task_struct *c
 
 	printk("\n=============================================\n");
 	printk(  "[ INFO: possible recursive locking detected ]\n");
+	print_kernel_version();
 	printk(  "---------------------------------------------\n");
 	printk("%s/%d is trying to acquire lock:\n",
 		curr->comm, curr->pid);
@@ -1368,6 +1379,7 @@ print_irq_inversion_bug(struct task_stru
 
 	printk("\n=========================================================\n");
 	printk(  "[ INFO: possible irq lock inversion dependency detected ]\n");
+	print_kernel_version();
 	printk(  "---------------------------------------------------------\n");
 	printk("%s/%d just changed the state of lock:\n",
 		curr->comm, curr->pid);
@@ -1462,6 +1474,7 @@ print_usage_bug(struct task_struct *curr
 
 	printk("\n=================================\n");
 	printk(  "[ INFO: inconsistent lock state ]\n");
+	print_kernel_version();
 	printk(  "---------------------------------\n");
 
 	printk("inconsistent {%s} -> {%s} usage.\n",
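
The print_kernel_version() helper added by the patch above prints the release string followed by only the first word of the version string, using a "%.*s" precision computed with strcspn(). A small user-space sketch of the same formatting, where the function name and the sample strings are hypothetical and the two string arguments stand in for system_utsname.release and system_utsname.version:

```c
#include <stdio.h>
#include <string.h>

/*
 * Format "<release> <first word of version>\n" into buf.
 * Hypothetical helper: release/version stand in for the
 * system_utsname fields used by the patch.
 */
static int format_kernel_version(char *buf, size_t len,
				 const char *release, const char *version)
{
	/* strcspn() returns the length of the prefix before the first space. */
	return snprintf(buf, len, "%s %.*s\n", release,
			(int)strcspn(version, " "), version);
}
```

The cast to int is needed because the "*" precision argument of the printf family must be an int, while strcspn() returns size_t.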
From: Peter Zijlstra <a.p.zijlstra@chello.nl>

Lockdep warnings seen while doing a kernel make modules_install install
over an NFS mount:

=============================================
[ INFO: possible recursive locking detected ]
---------------------------------------------
nfsd/9550 is trying to acquire lock:
 (&inode->i_mutex){--..}, at: [<c034c845>] mutex_lock+0x1c/0x1f

but task is already holding lock:
 (&inode->i_mutex){--..}, at: [<c034c845>] mutex_lock+0x1c/0x1f

other info that might help us debug this:
2 locks held by nfsd/9550:
 #0:  (hash_sem){..--}, at: [<cc895223>] exp_readlock+0xd/0xf [nfsd]
 #1:  (&inode->i_mutex){--..}, at: [<c034c845>] mutex_lock+0x1c/0x1f

stack backtrace:
 [<c0103508>] show_trace_log_lvl+0x58/0x152
 [<c0103b8b>] show_trace+0xd/0x10
 [<c0103c2f>] dump_stack+0x19/0x1b
 [<c012aa57>] __lock_acquire+0x77a/0x9a3
 [<c012af4a>] lock_acquire+0x60/0x80
 [<c034c6c2>] __mutex_lock_slowpath+0xa7/0x20e
 [<c034c845>] mutex_lock+0x1c/0x1f
 [<c0162edc>] vfs_unlink+0x34/0x8a
 [<cc891d98>] nfsd_unlink+0x18f/0x1e2 [nfsd]
 [<cc89884f>] nfsd3_proc_remove+0x95/0xa2 [nfsd]
 [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
 [<c033e84d>] svc_process+0x3a5/0x5ed
 [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
 [<c0101005>] kernel_thread_helper+0x5/0xb
DWARF2 unwinder stuck at kernel_thread_helper+0x5/0xb
Leftover inexact backtrace:
 [<c0103b8b>] show_trace+0xd/0x10
 [<c0103c2f>] dump_stack+0x19/0x1b
 [<c012aa57>] __lock_acquire+0x77a/0x9a3
 [<c012af4a>] lock_acquire+0x60/0x80
 [<c034c6c2>] __mutex_lock_slowpath+0xa7/0x20e
 [<c034c845>] mutex_lock+0x1c/0x1f
 [<c0162edc>] vfs_unlink+0x34/0x8a
 [<cc891d98>] nfsd_unlink+0x18f/0x1e2 [nfsd]
 [<cc89884f>] nfsd3_proc_remove+0x95/0xa2 [nfsd]
 [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
 [<c033e84d>] svc_process+0x3a5/0x5ed
 [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
 [<c0101005>] kernel_thread_helper+0x5/0xb

=============================================
[ INFO: possible recursive locking detected ]
---------------------------------------------
nfsd/9580 is trying to acquire lock:
 (&inode->i_mutex){--..}, at: [<c034cc1d>] mutex_lock+0x1c/0x1f

but task is already holding lock:
 (&inode->i_mutex){--..}, at: [<c034cc1d>] mutex_lock+0x1c/0x1f

other info that might help us debug this:
2 locks held by nfsd/9580:
 #0:  (hash_sem){..--}, at: [<cc89522b>] exp_readlock+0xd/0xf [nfsd]
 #1:  (&inode->i_mutex){--..}, at: [<c034cc1d>] mutex_lock+0x1c/0x1f

stack backtrace:
 [<c0103508>] show_trace_log_lvl+0x58/0x152
 [<c0103b8b>] show_trace+0xd/0x10
 [<c0103c2f>] dump_stack+0x19/0x1b
 [<c012aa63>] __lock_acquire+0x77a/0x9a3
 [<c012af56>] lock_acquire+0x60/0x80
 [<c034ca9a>] __mutex_lock_slowpath+0xa7/0x20e
 [<c034cc1d>] mutex_lock+0x1c/0x1f
 [<cc892ad1>] nfsd_setattr+0x2c8/0x499 [nfsd]
 [<cc893ede>] nfsd_create_v3+0x31b/0x4ac [nfsd]
 [<cc8984a1>] nfsd3_proc_create+0x128/0x138 [nfsd]
 [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
 [<c033ec1d>] svc_process+0x3a5/0x5ed
 [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
 [<c0101005>] kernel_thread_helper+0x5/0xb
DWARF2 unwinder stuck at kernel_thread_helper+0x5/0xb
Leftover inexact backtrace:
 [<c0103b8b>] show_trace+0xd/0x10
 [<c0103c2f>] dump_stack+0x19/0x1b
 [<c012aa63>] __lock_acquire+0x77a/0x9a3
 [<c012af56>] lock_acquire+0x60/0x80
 [<c034ca9a>] __mutex_lock_slowpath+0xa7/0x20e
 [<c034cc1d>] mutex_lock+0x1c/0x1f
 [<cc892ad1>] nfsd_setattr+0x2c8/0x499 [nfsd]
 [<cc893ede>] nfsd_create_v3+0x31b/0x4ac [nfsd]
 [<cc8984a1>] nfsd3_proc_create+0x128/0x138 [nfsd]
 [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
 [<c033ec1d>] svc_process+0x3a5/0x5ed
 [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
 [<c0101005>] kernel_thread_helper+0x5/0xb

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Neil Brown <neilb@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 fs/nfsd/vfs.c              |    8 ++++----
 include/linux/nfsd/nfsfh.h |   11 +++++++++--
 2 files changed, 13 insertions(+), 6 deletions(-)

diff -puN fs/nfsd/vfs.c~nfsd-lockdep-annotation fs/nfsd/vfs.c
--- a/fs/nfsd/vfs.c~nfsd-lockdep-annotation
+++ a/fs/nfsd/vfs.c
@@ -1114,7 +1114,7 @@ nfsd_create(struct svc_rqst *rqstp, stru
 	 */
 	if (!resfhp->fh_dentry) {
 		/* called from nfsd_proc_mkdir, or possibly nfsd3_proc_create */
-		fh_lock(fhp);
+		fh_lock_nested(fhp, I_MUTEX_PARENT);
 		dchild = lookup_one_len(fname, dentry, flen);
 		err = PTR_ERR(dchild);
 		if (IS_ERR(dchild))
@@ -1240,7 +1240,7 @@ nfsd_create_v3(struct svc_rqst *rqstp, s
 	err = nfserr_notdir;
 	if(!dirp->i_op || !dirp->i_op->lookup)
 		goto out;
-	fh_lock(fhp);
+	fh_lock_nested(fhp, I_MUTEX_PARENT);
 
 	/*
 	 * Compose the response file handle.
@@ -1494,7 +1494,7 @@ nfsd_link(struct svc_rqst *rqstp, struct
 	if (isdotent(name, len))
 		goto out;
 
-	fh_lock(ffhp);
+	fh_lock_nested(ffhp, I_MUTEX_PARENT);
 	ddir = ffhp->fh_dentry;
 	dirp = ddir->d_inode;
 
@@ -1644,7 +1644,7 @@ nfsd_unlink(struct svc_rqst *rqstp, stru
 	if (err)
 		goto out;
 
-	fh_lock(fhp);
+	fh_lock_nested(fhp, I_MUTEX_PARENT);
 	dentry = fhp->fh_dentry;
 	dirp = dentry->d_inode;
 
diff -puN include/linux/nfsd/nfsfh.h~nfsd-lockdep-annotation include/linux/nfsd/nfsfh.h
--- a/include/linux/nfsd/nfsfh.h~nfsd-lockdep-annotation
+++ a/include/linux/nfsd/nfsfh.h
@@ -290,8 +290,9 @@ fill_post_wcc(struct svc_fh *fhp)
  * vfs.c:nfsd_rename as it needs to grab 2 i_mutex's at once
  * so, any changes here should be reflected there.
  */
+
 static inline void
-fh_lock(struct svc_fh *fhp)
+fh_lock_nested(struct svc_fh *fhp, unsigned int subclass)
 {
 	struct dentry	*dentry = fhp->fh_dentry;
 	struct inode	*inode;
@@ -310,11 +311,17 @@ fh_lock(struct svc_fh *fhp)
 	}
 
 	inode = dentry->d_inode;
-	mutex_lock(&inode->i_mutex);
+	mutex_lock_nested(&inode->i_mutex, subclass);
 	fill_pre_wcc(fhp);
 	fhp->fh_locked = 1;
 }
 
+static inline void
+fh_lock(struct svc_fh *fhp)
+{
+	fh_lock_nested(fhp, I_MUTEX_NORMAL);
+}
+
 /*
  * Unlock a file handle/inode
  */
_
From: NeilBrown <neilb@suse.de>

NFSv2 also needs the I_MUTEX_PARENT annotation on the directory when creating a file.
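
Why the subclass matters, as a rough sketch: nfsd locks the directory's i_mutex and then the VFS locks the child inode's i_mutex, and both locks belong to the same lock class. The toy checker below is a hypothetical user-space miniature, not lockdep itself; it flags a second acquisition only when both class and subclass match, which is the distinction the I_MUTEX_PARENT annotation supplies.

```c
#include <stdbool.h>

#define TOY_MAX_HELD 8

/* Hypothetical miniature of held-lock tracking; not the lockdep API. */
enum toy_subclass { TOY_MUTEX_NORMAL, TOY_MUTEX_PARENT };

struct toy_held {
	const void *class_key;        /* one key per lock class, e.g. i_mutex */
	enum toy_subclass subclass;
};

struct toy_task {
	struct toy_held held[TOY_MAX_HELD];
	int depth;
};

/*
 * Record an acquisition.  Returns false where a checker of this kind
 * would complain: the task already holds a lock of the same class and
 * subclass.  It also refuses once the toy table is full.
 */
static bool toy_acquire(struct toy_task *t, const void *class_key,
			enum toy_subclass subclass)
{
	for (int i = 0; i < t->depth; i++) {
		if (t->held[i].class_key == class_key &&
		    t->held[i].subclass == subclass)
			return false;
	}
	if (t->depth == TOY_MAX_HELD)
		return false;
	t->held[t->depth].class_key = class_key;
	t->held[t->depth].subclass = subclass;
	t->depth++;
	return true;
}
```

Taking the directory with TOY_MUTEX_PARENT and the child with TOY_MUTEX_NORMAL passes; taking both with TOY_MUTEX_NORMAL is reported, which mirrors the "possible recursive locking" reports this annotation is meant to silence.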

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 fs/nfsd/nfsproc.c |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

diff -puN fs/nfsd/nfsproc.c~knfsd-nfsd-lockdep-annotation-fix fs/nfsd/nfsproc.c
--- a/fs/nfsd/nfsproc.c~knfsd-nfsd-lockdep-annotation-fix
+++ a/fs/nfsd/nfsproc.c
@@ -225,7 +225,7 @@ nfsd_proc_create(struct svc_rqst *rqstp,
 	nfserr = nfserr_exist;
 	if (isdotent(argp->name, argp->len))
 		goto done;
-	fh_lock(dirfhp);
+	fh_lock_nested(dirfhp, I_MUTEX_PARENT);
 	dchild = lookup_one_len(argp->name, dirfhp->fh_dentry, argp->len);
 	if (IS_ERR(dchild)) {
 		nfserr = nfserrno(PTR_ERR(dchild));
_

Subject: + forcedeth-hardirq-lockdep-warning.patch added to -mm tree
To: mm-commits@vger.kernel.org
Cc: a.p.zijlstra@chello.nl, aabdulla@nvidia.com, arjan@linux.intel.com,
        davej@redhat.com, jeff@garzik.org, mingo@elte.hu
From: akpm@osdl.org
Date: Tue, 19 Sep 2006 11:15:32 -0700

The patch titled

     forcedeth: hardirq lockdep warning

has been added to the -mm tree.  Its filename is

     forcedeth-hardirq-lockdep-warning.patch

See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
out what to do about this

------------------------------------------------------
Subject: forcedeth: hardirq lockdep warning
From: Peter Zijlstra <a.p.zijlstra@chello.nl>

BUG: warning at kernel/lockdep.c:1816/trace_hardirqs_on() (Not tainted)

Call Trace:
 show_trace
 dump_stack
 trace_hardirqs_on
 :forcedeth:nv_nic_irq_other
 handle_IRQ_event
 __do_IRQ
 do_IRQ
 ret_from_intr
DWARF2 barf
 default_idle
 cpu_idle
 rest_init
 start_kernel
 _sinittext

These 3 functions nv_nic_irq_tx(), nv_nic_irq_rx() and nv_nic_irq_other()
are reachable from IRQ context and process context. Make use of the
irq-save/restore spinlock variant.
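
The difference between the two variants can be sketched with a one-flag user-space model of the CPU's interrupt state. Everything below is hypothetical illustration, not the spinlock implementation: the point is only that the _irq unlock always turns interrupts back on, while the irqrestore unlock puts back whatever state the caller had.

```c
#include <stdbool.h>

/* Hypothetical one-flag model of the local CPU's interrupt state. */
static bool toy_irqs_enabled = true;

/* Models spin_lock_irq()/spin_unlock_irq(): unlock always re-enables. */
static void toy_lock_irq(void)
{
	toy_irqs_enabled = false;
}

static void toy_unlock_irq(void)
{
	toy_irqs_enabled = true;
}

/* Models spin_lock_irqsave(): remember the previous state, then disable. */
static bool toy_lock_irqsave(void)
{
	bool flags = toy_irqs_enabled;

	toy_irqs_enabled = false;
	return flags;
}

/* Models spin_unlock_irqrestore(): put the remembered state back. */
static void toy_unlock_irqrestore(bool flags)
{
	toy_irqs_enabled = flags;
}
```

Entered with interrupts already off, as in a hard-IRQ handler, the _irq pair leaves them enabled afterwards, which matches the trace_hardirqs_on() warning in the trace above; the irqsave pair leaves them off.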

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 drivers/net/forcedeth.c |   31 +++++++++++++++++--------------
 1 files changed, 17 insertions(+), 14 deletions(-)

diff -puN drivers/net/forcedeth.c~forcedeth-hardirq-lockdep-warning drivers/net/forcedeth.c
--- a/drivers/net/forcedeth.c~forcedeth-hardirq-lockdep-warning
+++ a/drivers/net/forcedeth.c
@@ -2497,6 +2497,7 @@ static irqreturn_t nv_nic_irq_tx(int foo
 	u8 __iomem *base = get_hwbase(dev);
 	u32 events;
 	int i;
+	unsigned long flags;
 
 	dprintk(KERN_DEBUG "%s: nv_nic_irq_tx\n", dev->name);
 
@@ -2508,16 +2509,16 @@ static irqreturn_t nv_nic_irq_tx(int foo
 		if (!(events & np->irqmask))
 			break;
 
-		spin_lock_irq(&np->lock);
+		spin_lock_irqsave(&np->lock, flags);
 		nv_tx_done(dev);
-		spin_unlock_irq(&np->lock);
+		spin_unlock_irqrestore(&np->lock, flags);
 
 		if (events & (NVREG_IRQ_TX_ERR)) {
 			dprintk(KERN_DEBUG "%s: received irq with events 0x%x. Probably TX fail.\n",
 						dev->name, events);
 		}
 		if (i > max_interrupt_work) {
-			spin_lock_irq(&np->lock);
+			spin_lock_irqsave(&np->lock, flags);
 			/* disable interrupts on the nic */
 			writel(NVREG_IRQ_TX_ALL, base + NvRegIrqMask);
 			pci_push(base);
@@ -2527,7 +2528,7 @@ static irqreturn_t nv_nic_irq_tx(int foo
 				mod_timer(&np->nic_poll, jiffies + POLL_WAIT);
 			}
 			printk(KERN_DEBUG "%s: too many iterations (%d) in nv_nic_irq_tx.\n", dev->name, i);
-			spin_unlock_irq(&np->lock);
+			spin_unlock_irqrestore(&np->lock, flags);
 			break;
 		}
 
@@ -2601,6 +2602,7 @@ static irqreturn_t nv_nic_irq_rx(int foo
 	u8 __iomem *base = get_hwbase(dev);
 	u32 events;
 	int i;
+	unsigned long flags;
 
 	dprintk(KERN_DEBUG "%s: nv_nic_irq_rx\n", dev->name);
 
@@ -2614,14 +2616,14 @@ static irqreturn_t nv_nic_irq_rx(int foo
 
 		nv_rx_process(dev, dev->weight);
 		if (nv_alloc_rx(dev)) {
-			spin_lock_irq(&np->lock);
+			spin_lock_irqsave(&np->lock, flags);
 			if (!np->in_shutdown)
 				mod_timer(&np->oom_kick, jiffies + OOM_REFILL);
-			spin_unlock_irq(&np->lock);
+			spin_unlock_irqrestore(&np->lock, flags);
 		}
 
 		if (i > max_interrupt_work) {
-			spin_lock_irq(&np->lock);
+			spin_lock_irqsave(&np->lock, flags);
 			/* disable interrupts on the nic */
 			writel(NVREG_IRQ_RX_ALL, base + NvRegIrqMask);
 			pci_push(base);
@@ -2631,7 +2633,7 @@ static irqreturn_t nv_nic_irq_rx(int foo
 				mod_timer(&np->nic_poll, jiffies + POLL_WAIT);
 			}
 			printk(KERN_DEBUG "%s: too many iterations (%d) in nv_nic_irq_rx.\n", dev->name, i);
-			spin_unlock_irq(&np->lock);
+			spin_unlock_irqrestore(&np->lock, flags);
 			break;
 		}
 	}
@@ -2648,6 +2650,7 @@ static irqreturn_t nv_nic_irq_other(int 
 	u8 __iomem *base = get_hwbase(dev);
 	u32 events;
 	int i;
+	unsigned long flags;
 
 	dprintk(KERN_DEBUG "%s: nv_nic_irq_other\n", dev->name);
 
@@ -2660,14 +2663,14 @@ static irqreturn_t nv_nic_irq_other(int 
 			break;
 
 		if (events & NVREG_IRQ_LINK) {
-			spin_lock_irq(&np->lock);
+			spin_lock_irqsave(&np->lock, flags);
 			nv_link_irq(dev);
-			spin_unlock_irq(&np->lock);
+			spin_unlock_irqrestore(&np->lock, flags);
 		}
 		if (np->need_linktimer && time_after(jiffies, np->link_timeout)) {
-			spin_lock_irq(&np->lock);
+			spin_lock_irqsave(&np->lock, flags);
 			nv_linkchange(dev);
-			spin_unlock_irq(&np->lock);
+			spin_unlock_irqrestore(&np->lock, flags);
 			np->link_timeout = jiffies + LINK_TIMEOUT;
 		}
 		if (events & (NVREG_IRQ_UNKNOWN)) {
@@ -2675,7 +2678,7 @@ static irqreturn_t nv_nic_irq_other(int 
 						dev->name, events);
 		}
 		if (i > max_interrupt_work) {
-			spin_lock_irq(&np->lock);
+			spin_lock_irqsave(&np->lock, flags);
 			/* disable interrupts on the nic */
 			writel(NVREG_IRQ_OTHER, base + NvRegIrqMask);
 			pci_push(base);
@@ -2685,7 +2688,7 @@ static irqreturn_t nv_nic_irq_other(int 
 				mod_timer(&np->nic_poll, jiffies + POLL_WAIT);
 			}
 			printk(KERN_DEBUG "%s: too many iterations (%d) in nv_nic_irq_other.\n", dev->name, i);
-			spin_unlock_irq(&np->lock);
+			spin_unlock_irqrestore(&np->lock, flags);
 			break;
 		}
 
_


Date: Wed, 13 Sep 2006 10:56:32 +0200
From: Peter Zijlstra <pzijlstr@redhat.com>
Subject: [RHEL5 PATCH] Slab fix alien cache lockdep warnings

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=203098

This patch was not queued for .18, afaik.

---
From: Ravikiran G Thirumalai <kiran@scalex86.org>

Place the alien array cache locks of on-slab malloc slab caches in a
separate lockdep class.  This avoids false positives from lockdep.

Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---

 mm/slab.c |   55 ++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 42 insertions(+), 13 deletions(-)

Index: linux-2.6/mm/slab.c
===================================================================
--- linux-2.6.orig/mm/slab.c
+++ linux-2.6/mm/slab.c
@@ -674,6 +674,8 @@ static struct kmem_cache cache_cache = {
 #endif
 };
 
+#define BAD_ALIEN_MAGIC 0x01020304ul
+
 #ifdef CONFIG_LOCKDEP
 
 /*
@@ -682,29 +684,53 @@ static struct kmem_cache cache_cache = {
  * The locking for this is tricky in that it nests within the locks
  * of all other slabs in a few places; to deal with this special
  * locking we put on-slab caches into a separate lock-class.
+ *
+ * We set lock class for alien array caches which are up during init.
+ * The lock annotation will be lost if all cpus of a node goes down and
+ * then comes back up during hotplug
  */
-static struct lock_class_key on_slab_key;
+static struct lock_class_key on_slab_l3_key;
+static struct lock_class_key on_slab_alc_key;
+
+static inline void init_lock_keys(void)
 
-static inline void init_lock_keys(struct cache_sizes *s)
 {
 	int q;
+	struct cache_sizes *s = malloc_sizes;
 
-	for (q = 0; q < MAX_NUMNODES; q++) {
-		if (!s->cs_cachep->nodelists[q] || OFF_SLAB(s->cs_cachep))
-			continue;
-		lockdep_set_class(&s->cs_cachep->nodelists[q]->list_lock,
-				  &on_slab_key);
+	while (s->cs_size != ULONG_MAX) {
+		for_each_node(q) {
+			struct array_cache **alc;
+			int r;
+			struct kmem_list3 *l3 = s->cs_cachep->nodelists[q];
+			if (!l3 || OFF_SLAB(s->cs_cachep))
+				continue;
+			lockdep_set_class(&l3->list_lock, &on_slab_l3_key);
+			alc = l3->alien;
+			/*
+			 * FIXME: This check for BAD_ALIEN_MAGIC
+			 * should go away when common slab code is taught to
+			 * work even without alien caches.
+			 * Currently, non NUMA code returns BAD_ALIEN_MAGIC
+			 * for alloc_alien_cache,
+			 */
+			if (!alc || (unsigned long)alc == BAD_ALIEN_MAGIC)
+				continue;
+			for_each_node(r) {
+				if (alc[r])
+					lockdep_set_class(&alc[r]->lock,
+					     &on_slab_alc_key);
+			}
+		}
+		s++;
 	}
 }
-
 #else
-static inline void init_lock_keys(struct cache_sizes *s)
+static inline void init_lock_keys(void)
 {
 }
 #endif
 
-
-
 /* Guard access to the cache-chain. */
 static DEFINE_MUTEX(cache_chain_mutex);
 static struct list_head cache_chain;
@@ -1092,7 +1118,7 @@ static inline int cache_free_alien(struc
 
 static inline struct array_cache **alloc_alien_cache(int node, int limit)
 {
-	return (struct array_cache **) 0x01020304ul;
+	return (struct array_cache **)BAD_ALIEN_MAGIC;
 }
 
 static inline void free_alien_cache(struct array_cache **ac_ptr)
@@ -1422,7 +1448,6 @@ void __init kmem_cache_init(void)
 					ARCH_KMALLOC_FLAGS|SLAB_PANIC,
 					NULL, NULL);
 		}
-		init_lock_keys(sizes);
 
 		sizes->cs_dmacachep = kmem_cache_create(names->name_dma,
 					sizes->cs_size,
@@ -1495,6 +1520,10 @@ void __init kmem_cache_init(void)
 		mutex_unlock(&cache_chain_mutex);
 	}
 
+	/* Annotate slab for lockdep -- annotate the malloc caches */
+	init_lock_keys();
+
+
 	/* Done! */
 	g_cpucache_up = FULL;
 

From: Stefan Richter <stefanr@s5r6.in-berlin.de>

nodemgr_update_pdrv grabbed an rw semaphore (as reader) which was already
taken by its caller's caller, nodemgr_probe_ne (as reader too).  Reported by
Miles Lane, call path pointed out by Arjan van de Ven.
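
As background for why a nested read acquisition is a problem at all: on a fair reader/writer semaphore, which is how I understand kernel rwsems to behave, a new reader queues behind a writer that is already waiting. The toy model below is a hypothetical user-space sketch of that rule only; none of these names are kernel functions.

```c
#include <stdbool.h>

/* Hypothetical model of a fair reader/writer semaphore. */
struct toy_rwsem {
	int readers;          /* readers currently holding the semaphore */
	bool writer_waiting;  /* a writer is queued behind those readers */
};

/* A new reader gets in only if no writer is queued ahead of it. */
static bool toy_try_down_read(struct toy_rwsem *sem)
{
	if (sem->writer_waiting)
		return false;
	sem->readers++;
	return true;
}

static void toy_up_read(struct toy_rwsem *sem)
{
	sem->readers--;
}

/* A writer needs all readers gone; until then it stays queued. */
static bool toy_queue_down_write(struct toy_rwsem *sem)
{
	if (sem->readers > 0) {
		sem->writer_waiting = true;
		return false;
	}
	sem->writer_waiting = false;
	return true;
}
```

If the outer reader holds the semaphore, a writer queues, and the same task then asks for the read side again, the inner request waits for the writer while the writer waits for the outer reader. Dropping the inner down_read(), as the patch does, avoids that cycle.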

FIXME:
Shouldn't we rather use class->sem there, not class->subsys.rwsem?

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 drivers/ieee1394/nodemgr.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff -puN drivers/ieee1394/nodemgr.c~ieee1394-nodemgr-fix-rwsem-recursion drivers/ieee1394/nodemgr.c
--- a/drivers/ieee1394/nodemgr.c~ieee1394-nodemgr-fix-rwsem-recursion
+++ a/drivers/ieee1394/nodemgr.c
@@ -1316,6 +1316,7 @@ static void nodemgr_node_scan(struct hos
 }
 
 
+/* Caller needs to hold nodemgr_ud_class.subsys.rwsem as reader. */
 static void nodemgr_suspend_ne(struct node_entry *ne)
 {
 	struct class_device *cdev;
@@ -1368,15 +1369,14 @@ static void nodemgr_resume_ne(struct nod
 }
 
 
+/* Caller needs to hold nodemgr_ud_class.subsys.rwsem as reader. */
 static void nodemgr_update_pdrv(struct node_entry *ne)
 {
 	struct unit_directory *ud;
 	struct hpsb_protocol_driver *pdrv;
-	struct class *class = &nodemgr_ud_class;
 	struct class_device *cdev;
 
-	down_read(&class->subsys.rwsem);
-	list_for_each_entry(cdev, &class->children, node) {
+	list_for_each_entry(cdev, &nodemgr_ud_class.children, node) {
 		ud = container_of(cdev, struct unit_directory, class_dev);
 		if (ud->ne != ne || !ud->device.driver)
 			continue;
@@ -1389,7 +1389,6 @@ static void nodemgr_update_pdrv(struct n
 			up_write(&ud->device.bus->subsys.rwsem);
 		}
 	}
-	up_read(&class->subsys.rwsem);
 }
 
 
@@ -1420,6 +1419,8 @@ static void nodemgr_irm_write_bc(struct 
 }
 
 
+/* Caller needs to hold nodemgr_ud_class.subsys.rwsem as reader because the
+ * calls to nodemgr_update_pdrv() and nodemgr_suspend_ne() here require it. */
 static void nodemgr_probe_ne(struct host_info *hi, struct node_entry *ne, int generation)
 {
 	struct device *dev;
_

From: Peter Zijlstra <pzijlstr@redhat.com>
Subject: [RHEL5 PATCH 1/6] remove the old bd_mutex lockdep annotation
To: rhkernel-list@redhat.com
Date: Wed, 27 Sep 2006 15:33:42 +0200

Remove the old complex and crufty bd_mutex annotation.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jason Baron <jbaron@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---
 block/ioctl.c      |    4 -
 drivers/md/md.c    |    6 -
 fs/block_dev.c     |  180 ++++++++++++++++-------------------------------------
 include/linux/fs.h |   17 -----
 4 files changed, 60 insertions(+), 147 deletions(-)

Index: linux-2.6.18.noarch/drivers/md/md.c
===================================================================
--- linux-2.6.18.noarch.orig/drivers/md/md.c
+++ linux-2.6.18.noarch/drivers/md/md.c
@@ -1408,7 +1408,7 @@ static int lock_rdev(mdk_rdev_t *rdev, d
 	struct block_device *bdev;
 	char b[BDEVNAME_SIZE];
 
-	bdev = open_partition_by_devnum(dev, FMODE_READ|FMODE_WRITE);
+	bdev = open_by_devnum(dev, FMODE_READ|FMODE_WRITE);
 	if (IS_ERR(bdev)) {
 		printk(KERN_ERR "md: could not open %s.\n",
 			__bdevname(dev, b));
@@ -1418,7 +1418,7 @@ static int lock_rdev(mdk_rdev_t *rdev, d
 	if (err) {
 		printk(KERN_ERR "md: could not bd_claim %s.\n",
 			bdevname(bdev, b));
-		blkdev_put_partition(bdev);
+		blkdev_put(bdev);
 		return err;
 	}
 	rdev->bdev = bdev;
@@ -1432,7 +1432,7 @@ static void unlock_rdev(mdk_rdev_t *rdev
 	if (!bdev)
 		MD_BUG();
 	bd_release(bdev);
-	blkdev_put_partition(bdev);
+	blkdev_put(bdev);
 }
 
 void md_autodetect_dev(dev_t dev);
Index: linux-2.6.18.noarch/fs/block_dev.c
===================================================================
--- linux-2.6.18.noarch.orig/fs/block_dev.c
+++ linux-2.6.18.noarch/fs/block_dev.c
@@ -739,7 +739,7 @@ static int bd_claim_by_kobject(struct bl
 	if (!bo)
 		return -ENOMEM;
 
-	mutex_lock_nested(&bdev->bd_mutex, BD_MUTEX_PARTITION);
+	mutex_lock(&bdev->bd_mutex);
 	res = bd_claim(bdev, holder);
 	if (res || !add_bd_holder(bdev, bo))
 		free_bd_holder(bo);
@@ -764,7 +764,7 @@ static void bd_release_from_kobject(stru
 	if (!kobj)
 		return;
 
-	mutex_lock_nested(&bdev->bd_mutex, BD_MUTEX_PARTITION);
+	mutex_lock(&bdev->bd_mutex);
 	bd_release(bdev);
 	if ((bo = del_bd_holder(bdev, kobj)))
 		free_bd_holder(bo);
@@ -822,22 +822,6 @@ struct block_device *open_by_devnum(dev_
 
 EXPORT_SYMBOL(open_by_devnum);
 
-static int
-blkdev_get_partition(struct block_device *bdev, mode_t mode, unsigned flags);
-
-struct block_device *open_partition_by_devnum(dev_t dev, unsigned mode)
-{
-	struct block_device *bdev = bdget(dev);
-	int err = -ENOMEM;
-	int flags = mode & FMODE_WRITE ? O_RDWR : O_RDONLY;
-	if (bdev)
-		err = blkdev_get_partition(bdev, mode, flags);
-	return err ? ERR_PTR(err) : bdev;
-}
-
-EXPORT_SYMBOL(open_partition_by_devnum);
-
-
 /*
  * This routine checks whether a removable media has been changed,
  * and invalidates all buffer-cache-entries in that case. This
@@ -884,66 +868,7 @@ void bd_set_size(struct block_device *bd
 }
 EXPORT_SYMBOL(bd_set_size);
 
-static int __blkdev_put(struct block_device *bdev, unsigned int subclass)
-{
-	int ret = 0;
-	struct inode *bd_inode = bdev->bd_inode;
-	struct gendisk *disk = bdev->bd_disk;
-
-	mutex_lock_nested(&bdev->bd_mutex, subclass);
-	lock_kernel();
-	if (!--bdev->bd_openers) {
-		sync_blockdev(bdev);
-		kill_bdev(bdev);
-	}
-	if (bdev->bd_contains == bdev) {
-		if (disk->fops->release)
-			ret = disk->fops->release(bd_inode, NULL);
-	} else {
-		mutex_lock_nested(&bdev->bd_contains->bd_mutex,
-				  subclass + 1);
-		bdev->bd_contains->bd_part_count--;
-		mutex_unlock(&bdev->bd_contains->bd_mutex);
-	}
-	if (!bdev->bd_openers) {
-		struct module *owner = disk->fops->owner;
-
-		put_disk(disk);
-		module_put(owner);
-
-		if (bdev->bd_contains != bdev) {
-			kobject_put(&bdev->bd_part->kobj);
-			bdev->bd_part = NULL;
-		}
-		bdev->bd_disk = NULL;
-		bdev->bd_inode->i_data.backing_dev_info = &default_backing_dev_info;
-		if (bdev != bdev->bd_contains)
-			__blkdev_put(bdev->bd_contains, subclass + 1);
-		bdev->bd_contains = NULL;
-	}
-	unlock_kernel();
-	mutex_unlock(&bdev->bd_mutex);
-	bdput(bdev);
-	return ret;
-}
-
-int blkdev_put(struct block_device *bdev)
-{
-	return __blkdev_put(bdev, BD_MUTEX_NORMAL);
-}
-EXPORT_SYMBOL(blkdev_put);
-
-int blkdev_put_partition(struct block_device *bdev)
-{
-	return __blkdev_put(bdev, BD_MUTEX_PARTITION);
-}
-EXPORT_SYMBOL(blkdev_put_partition);
-
-static int
-blkdev_get_whole(struct block_device *bdev, mode_t mode, unsigned flags);
-
-static int
-do_open(struct block_device *bdev, struct file *file, unsigned int subclass)
+static int do_open(struct block_device *bdev, struct file *file)
 {
 	struct module *owner = NULL;
 	struct gendisk *disk;
@@ -960,8 +885,7 @@ do_open(struct block_device *bdev, struc
 	}
 	owner = disk->fops->owner;
 
-	mutex_lock_nested(&bdev->bd_mutex, subclass);
-
+	mutex_lock(&bdev->bd_mutex);
 	if (!bdev->bd_openers) {
 		bdev->bd_disk = disk;
 		bdev->bd_contains = bdev;
@@ -988,11 +912,11 @@ do_open(struct block_device *bdev, struc
 			ret = -ENOMEM;
 			if (!whole)
 				goto out_first;
-			ret = blkdev_get_whole(whole, file->f_mode, file->f_flags);
+			ret = blkdev_get(whole, file->f_mode, file->f_flags);
 			if (ret)
 				goto out_first;
 			bdev->bd_contains = whole;
-			mutex_lock_nested(&whole->bd_mutex, BD_MUTEX_WHOLE);
+			mutex_lock(&whole->bd_mutex);
 			whole->bd_part_count++;
 			p = disk->part[part - 1];
 			bdev->bd_inode->i_data.backing_dev_info =
@@ -1020,8 +944,7 @@ do_open(struct block_device *bdev, struc
 			if (bdev->bd_invalidated)
 				rescan_partitions(bdev->bd_disk, bdev);
 		} else {
-			mutex_lock_nested(&bdev->bd_contains->bd_mutex,
-					  BD_MUTEX_PARTITION);
+			mutex_lock(&bdev->bd_contains->bd_mutex);
 			bdev->bd_contains->bd_part_count++;
 			mutex_unlock(&bdev->bd_contains->bd_mutex);
 		}
@@ -1035,7 +958,7 @@ out_first:
 	bdev->bd_disk = NULL;
 	bdev->bd_inode->i_data.backing_dev_info = &default_backing_dev_info;
 	if (bdev != bdev->bd_contains)
-		__blkdev_put(bdev->bd_contains, BD_MUTEX_WHOLE);
+		blkdev_put(bdev->bd_contains);
 	bdev->bd_contains = NULL;
 	put_disk(disk);
 	module_put(owner);
@@ -1062,49 +985,11 @@ int blkdev_get(struct block_device *bdev
 	fake_file.f_dentry = &fake_dentry;
 	fake_dentry.d_inode = bdev->bd_inode;
 
-	return do_open(bdev, &fake_file, BD_MUTEX_NORMAL);
+	return do_open(bdev, &fake_file);
 }
 
 EXPORT_SYMBOL(blkdev_get);
 
-static int
-blkdev_get_whole(struct block_device *bdev, mode_t mode, unsigned flags)
-{
-	/*
-	 * This crockload is due to bad choice of ->open() type.
-	 * It will go away.
-	 * For now, block device ->open() routine must _not_
-	 * examine anything in 'inode' argument except ->i_rdev.
-	 */
-	struct file fake_file = {};
-	struct dentry fake_dentry = {};
-	fake_file.f_mode = mode;
-	fake_file.f_flags = flags;
-	fake_file.f_dentry = &fake_dentry;
-	fake_dentry.d_inode = bdev->bd_inode;
-
-	return do_open(bdev, &fake_file, BD_MUTEX_WHOLE);
-}
-
-static int
-blkdev_get_partition(struct block_device *bdev, mode_t mode, unsigned flags)
-{
-	/*
-	 * This crockload is due to bad choice of ->open() type.
-	 * It will go away.
-	 * For now, block device ->open() routine must _not_
-	 * examine anything in 'inode' argument except ->i_rdev.
-	 */
-	struct file fake_file = {};
-	struct dentry fake_dentry = {};
-	fake_file.f_mode = mode;
-	fake_file.f_flags = flags;
-	fake_file.f_dentry = &fake_dentry;
-	fake_dentry.d_inode = bdev->bd_inode;
-
-	return do_open(bdev, &fake_file, BD_MUTEX_PARTITION);
-}
-
 static int blkdev_open(struct inode * inode, struct file * filp)
 {
 	struct block_device *bdev;
@@ -1120,7 +1005,7 @@ static int blkdev_open(struct inode * in
 
 	bdev = bd_acquire(inode);
 
-	res = do_open(bdev, filp, BD_MUTEX_NORMAL);
+	res = do_open(bdev, filp);
 	if (res)
 		return res;
 
@@ -1134,6 +1019,51 @@ static int blkdev_open(struct inode * in
 	return res;
 }
 
+int blkdev_put(struct block_device *bdev)
+{
+	int ret = 0;
+	struct inode *bd_inode = bdev->bd_inode;
+	struct gendisk *disk = bdev->bd_disk;
+
+	mutex_lock(&bdev->bd_mutex);
+	lock_kernel();
+	if (!--bdev->bd_openers) {
+		sync_blockdev(bdev);
+		kill_bdev(bdev);
+	}
+	if (bdev->bd_contains == bdev) {
+		if (disk->fops->release)
+			ret = disk->fops->release(bd_inode, NULL);
+	} else {
+		mutex_lock(&bdev->bd_contains->bd_mutex);
+		bdev->bd_contains->bd_part_count--;
+		mutex_unlock(&bdev->bd_contains->bd_mutex);
+	}
+	if (!bdev->bd_openers) {
+		struct module *owner = disk->fops->owner;
+
+		put_disk(disk);
+		module_put(owner);
+
+		if (bdev->bd_contains != bdev) {
+			kobject_put(&bdev->bd_part->kobj);
+			bdev->bd_part = NULL;
+		}
+		bdev->bd_disk = NULL;
+		bdev->bd_inode->i_data.backing_dev_info = &default_backing_dev_info;
+		if (bdev != bdev->bd_contains) {
+			blkdev_put(bdev->bd_contains);
+		}
+		bdev->bd_contains = NULL;
+	}
+	unlock_kernel();
+	mutex_unlock(&bdev->bd_mutex);
+	bdput(bdev);
+	return ret;
+}
+
+EXPORT_SYMBOL(blkdev_put);
+
 static int blkdev_close(struct inode * inode, struct file * filp)
 {
 	struct block_device *bdev = I_BDEV(filp->f_mapping->host);
Index: linux-2.6.18.noarch/include/linux/fs.h
===================================================================
--- linux-2.6.18.noarch.orig/include/linux/fs.h
+++ linux-2.6.18.noarch/include/linux/fs.h
@@ -440,21 +440,6 @@ struct block_device {
 };
 
 /*
- * bdev->bd_mutex nesting subclasses for the lock validator:
- *
- * 0: normal
- * 1: 'whole'
- * 2: 'partition'
- */
-enum bdev_bd_mutex_lock_class
-{
-	BD_MUTEX_NORMAL,
-	BD_MUTEX_WHOLE,
-	BD_MUTEX_PARTITION
-};
-
-
-/*
  * Radix-tree tags, for tagging dirty and writeback pages within the pagecache
  * radix trees
  */
@@ -1447,7 +1432,6 @@ extern void bd_set_size(struct block_dev
 extern void bd_forget(struct inode *inode);
 extern void bdput(struct block_device *);
 extern struct block_device *open_by_devnum(dev_t, unsigned);
-extern struct block_device *open_partition_by_devnum(dev_t, unsigned);
 extern const struct file_operations def_blk_fops;
 extern const struct address_space_operations def_blk_aops;
 extern const struct file_operations def_chr_fops;
@@ -1458,7 +1442,6 @@ extern int blkdev_ioctl(struct inode *, 
 extern long compat_blkdev_ioctl(struct file *, unsigned, unsigned long);
 extern int blkdev_get(struct block_device *, mode_t, unsigned);
 extern int blkdev_put(struct block_device *);
-extern int blkdev_put_partition(struct block_device *);
 extern int bd_claim(struct block_device *, void *);
 extern void bd_release(struct block_device *);
 #ifdef CONFIG_SYSFS
Index: linux-2.6.18.noarch/block/ioctl.c
===================================================================
--- linux-2.6.18.noarch.orig/block/ioctl.c
+++ linux-2.6.18.noarch/block/ioctl.c
@@ -72,7 +72,7 @@ static int blkpg_ioctl(struct block_devi
 			bdevp = bdget_disk(disk, part);
 			if (!bdevp)
 				return -ENOMEM;
-			mutex_lock_nested(&bdevp->bd_mutex, BD_MUTEX_PARTITION);
+			mutex_lock(&bdevp->bd_mutex);
 			if (bdevp->bd_openers) {
 				mutex_unlock(&bdevp->bd_mutex);
 				bdput(bdevp);
@@ -82,7 +82,7 @@ static int blkpg_ioctl(struct block_devi
 			fsync_bdev(bdevp);
 			invalidate_bdev(bdevp, 0);
 
-			mutex_lock_nested(&bdev->bd_mutex, BD_MUTEX_WHOLE);
+			mutex_lock(&bdev->bd_mutex);
 			delete_partition(disk, part);
 			mutex_unlock(&bdev->bd_mutex);
 			mutex_unlock(&bdevp->bd_mutex);

--

From: Peter Zijlstra <pzijlstr@redhat.com>
Subject: [RHEL5 PATCH 3/6] usb-serial: irq lock inversion (PPP vs. usb-serial)
To: rhkernel-list@redhat.com
Date: Wed, 27 Sep 2006 15:33:44 +0200

=========================================================
[ INFO: possible irq lock inversion dependency detected ]
---------------------------------------------------------
ksoftirqd/0/3 just changed the state of lock:
 (&ap->xmit_lock){-+..}, at: [<f9337224>] ppp_async_push+0x2f/0x3b3 [ppp_async]
but this lock took another, soft-irq-unsafe lock in the past:
 (&port->lock){--..}

and interrupts could create inverse lock ordering between them.


other info that might help us debug this:
no locks held by ksoftirqd/0/3.

the first lock's dependencies:
-> (&ap->xmit_lock){-+..} ops: 0 {
   initial-use  at:
                        [<c043bf43>] lock_acquire+0x4b/0x6c
                        [<c06086a8>] _spin_lock_bh+0x1e/0x2d
                        [<f9337224>] ppp_async_push+0x2f/0x3b3 [ppp_async]
                        [<f93375b8>] ppp_async_send+0x10/0x3d [ppp_async]
                        [<f932f071>] ppp_channel_push+0x3a/0x94 [ppp_generic]
                        [<f9330395>] ppp_write+0xd5/0xe1 [ppp_generic]
                        [<c0471f23>] vfs_write+0xab/0x157
                        [<c0472568>] sys_write+0x3b/0x60
                        [<c0403faf>] syscall_call+0x7/0xb
   in-softirq-W at:
                        [<c043bf43>] lock_acquire+0x4b/0x6c
                        [<c06086a8>] _spin_lock_bh+0x1e/0x2d
                        [<f9337224>] ppp_async_push+0x2f/0x3b3 [ppp_async]
                        [<f9337aea>] ppp_async_process+0x48/0x5b [ppp_async]
                        [<c04294b4>] tasklet_action+0x65/0xca
                        [<c04293d5>] __do_softirq+0x78/0xf2
                        [<c040662f>] do_softirq+0x5a/0xbe
   hardirq-on-W at:
                        [<c043bf43>] lock_acquire+0x4b/0x6c
                        [<c06086a8>] _spin_lock_bh+0x1e/0x2d
                        [<f9337224>] ppp_async_push+0x2f/0x3b3 [ppp_async]
                        [<f93375b8>] ppp_async_send+0x10/0x3d [ppp_async]
                        [<f932f071>] ppp_channel_push+0x3a/0x94 [ppp_generic]
                        [<f9330395>] ppp_write+0xd5/0xe1 [ppp_generic]
                        [<c0471f23>] vfs_write+0xab/0x157
                        [<c0472568>] sys_write+0x3b/0x60
                        [<c0403faf>] syscall_call+0x7/0xb
 }
 ... key      at: [<f933b208>] __key.19284+0x0/0xffffce72 [ppp_async]
 -> (&port->lock){--..} ops: 0 {
    initial-use  at:
                          [<c043bf43>] lock_acquire+0x4b/0x6c
                          [<c060867b>] _spin_lock+0x19/0x28
                          [<f9324478>] usb_serial_generic_write+0x79/0x23d [usbserial]
                          [<f9322531>] serial_write+0x8a/0x99 [usbserial]
                          [<c052dbed>] write_chan+0x22e/0x2a8
                          [<c052b530>] tty_write+0x148/0x1ce
                          [<c0471f23>] vfs_write+0xab/0x157
                          [<c0472568>] sys_write+0x3b/0x60
                          [<c0403faf>] syscall_call+0x7/0xb
    softirq-on-W at:
                          [<c043bf43>] lock_acquire+0x4b/0x6c
                          [<c060867b>] _spin_lock+0x19/0x28
                          [<f9324478>] usb_serial_generic_write+0x79/0x23d [usbserial]
                          [<f9322531>] serial_write+0x8a/0x99 [usbserial]
                          [<c052dbed>] write_chan+0x22e/0x2a8
                          [<c052b530>] tty_write+0x148/0x1ce
                          [<c0471f23>] vfs_write+0xab/0x157
                          [<c0472568>] sys_write+0x3b/0x60
                          [<c0403faf>] syscall_call+0x7/0xb
    hardirq-on-W at:
                          [<c043bf43>] lock_acquire+0x4b/0x6c
                          [<c060867b>] _spin_lock+0x19/0x28
                          [<f9324478>] usb_serial_generic_write+0x79/0x23d [usbserial]
                          [<f9322531>] serial_write+0x8a/0x99 [usbserial]
                          [<c052dbed>] write_chan+0x22e/0x2a8
                          [<c052b530>] tty_write+0x148/0x1ce
                          [<c0471f23>] vfs_write+0xab/0x157
                          [<c0472568>] sys_write+0x3b/0x60
                          [<c0403faf>] syscall_call+0x7/0xb
  }
  ... key      at: [<f932b08c>] __key.15523+0x0/0xffff9965 [usbserial]
 ... acquired at:
   [<c043bf43>] lock_acquire+0x4b/0x6c
   [<c060867b>] _spin_lock+0x19/0x28
   [<f9324478>] usb_serial_generic_write+0x79/0x23d [usbserial]
   [<f9322531>] serial_write+0x8a/0x99 [usbserial]
   [<f933729c>] ppp_async_push+0xa7/0x3b3 [ppp_async]
   [<f93375da>] ppp_async_send+0x32/0x3d [ppp_async]
   [<f932f071>] ppp_channel_push+0x3a/0x94 [ppp_generic]
   [<f9330395>] ppp_write+0xd5/0xe1 [ppp_generic]
   [<c0471f23>] vfs_write+0xab/0x157
   [<c0472568>] sys_write+0x3b/0x60
   [<c0403faf>] syscall_call+0x7/0xb

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Greg KH <greg@kroah.com>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---
 drivers/usb/serial/cyberjack.c   |    6 +++---
 drivers/usb/serial/generic.c     |    6 +++---
 drivers/usb/serial/ipw.c         |    6 +++---
 drivers/usb/serial/ir-usb.c      |    6 +++---
 drivers/usb/serial/keyspan_pda.c |    6 +++---
 drivers/usb/serial/omninet.c     |    6 +++---
 drivers/usb/serial/safe_serial.c |    6 +++---
 7 files changed, 21 insertions(+), 21 deletions(-)

Index: linux-2.6.18.noarch/drivers/usb/serial/cyberjack.c
===================================================================
--- linux-2.6.18.noarch.orig/drivers/usb/serial/cyberjack.c
+++ linux-2.6.18.noarch/drivers/usb/serial/cyberjack.c
@@ -214,14 +214,14 @@ static int cyberjack_write (struct usb_s
 		return (0);
 	}
 
-	spin_lock(&port->lock);
+	spin_lock_bh(&port->lock);
 	if (port->write_urb_busy) {
-		spin_unlock(&port->lock);
+		spin_unlock_bh(&port->lock);
 		dbg("%s - already writing", __FUNCTION__);
 		return 0;
 	}
 	port->write_urb_busy = 1;
-	spin_unlock(&port->lock);
+	spin_unlock_bh(&port->lock);
 
 	spin_lock_irqsave(&priv->lock, flags);
 
Index: linux-2.6.18.noarch/drivers/usb/serial/generic.c
===================================================================
--- linux-2.6.18.noarch.orig/drivers/usb/serial/generic.c
+++ linux-2.6.18.noarch/drivers/usb/serial/generic.c
@@ -175,14 +175,14 @@ int usb_serial_generic_write(struct usb_
 
 	/* only do something if we have a bulk out endpoint */
 	if (serial->num_bulk_out) {
-		spin_lock(&port->lock);
+		spin_lock_bh(&port->lock);
 		if (port->write_urb_busy) {
-			spin_unlock(&port->lock);
+			spin_unlock_bh(&port->lock);
 			dbg("%s - already writing", __FUNCTION__);
 			return 0;
 		}
 		port->write_urb_busy = 1;
-		spin_unlock(&port->lock);
+		spin_unlock_bh(&port->lock);
 
 		count = (count > port->bulk_out_size) ? port->bulk_out_size : count;
 
Index: linux-2.6.18.noarch/drivers/usb/serial/ipw.c
===================================================================
--- linux-2.6.18.noarch.orig/drivers/usb/serial/ipw.c
+++ linux-2.6.18.noarch/drivers/usb/serial/ipw.c
@@ -394,14 +394,14 @@ static int ipw_write(struct usb_serial_p
 		return 0;
 	}
 
-	spin_lock(&port->lock);
+	spin_lock_bh(&port->lock);
 	if (port->write_urb_busy) {
-		spin_unlock(&port->lock);
+		spin_unlock_bh(&port->lock);
 		dbg("%s - already writing", __FUNCTION__);
 		return 0;
 	}
 	port->write_urb_busy = 1;
-	spin_unlock(&port->lock);
+	spin_unlock_bh(&port->lock);
 
 	count = min(count, port->bulk_out_size);
 	memcpy(port->bulk_out_buffer, buf, count);
Index: linux-2.6.18.noarch/drivers/usb/serial/ir-usb.c
===================================================================
--- linux-2.6.18.noarch.orig/drivers/usb/serial/ir-usb.c
+++ linux-2.6.18.noarch/drivers/usb/serial/ir-usb.c
@@ -342,14 +342,14 @@ static int ir_write (struct usb_serial_p
 	if (count == 0)
 		return 0;
 
-	spin_lock(&port->lock);
+	spin_lock_bh(&port->lock);
 	if (port->write_urb_busy) {
-		spin_unlock(&port->lock);
+		spin_unlock_bh(&port->lock);
 		dbg("%s - already writing", __FUNCTION__);
 		return 0;
 	}
 	port->write_urb_busy = 1;
-	spin_unlock(&port->lock);
+	spin_unlock_bh(&port->lock);
 
 	transfer_buffer = port->write_urb->transfer_buffer;
 	transfer_size = min(count, port->bulk_out_size - 1);
Index: linux-2.6.18.noarch/drivers/usb/serial/keyspan_pda.c
===================================================================
--- linux-2.6.18.noarch.orig/drivers/usb/serial/keyspan_pda.c
+++ linux-2.6.18.noarch/drivers/usb/serial/keyspan_pda.c
@@ -518,13 +518,13 @@ static int keyspan_pda_write(struct usb_
 	   the TX urb is in-flight (wait until it completes)
 	   the device is full (wait until it says there is room)
 	*/
-	spin_lock(&port->lock);
+	spin_lock_bh(&port->lock);
 	if (port->write_urb_busy || priv->tx_throttled) {
-		spin_unlock(&port->lock);
+		spin_unlock_bh(&port->lock);
 		return 0;
 	}
 	port->write_urb_busy = 1;
-	spin_unlock(&port->lock);
+	spin_unlock_bh(&port->lock);
 
 	/* At this point the URB is in our control, nobody else can submit it
 	   again (the only sudden transition was the one from EINPROGRESS to
Index: linux-2.6.18.noarch/drivers/usb/serial/omninet.c
===================================================================
--- linux-2.6.18.noarch.orig/drivers/usb/serial/omninet.c
+++ linux-2.6.18.noarch/drivers/usb/serial/omninet.c
@@ -256,14 +256,14 @@ static int omninet_write (struct usb_ser
 		return (0);
 	}
 
-	spin_lock(&wport->lock);
+	spin_lock_bh(&wport->lock);
 	if (wport->write_urb_busy) {
-		spin_unlock(&wport->lock);
+		spin_unlock_bh(&wport->lock);
 		dbg("%s - already writing", __FUNCTION__);
 		return 0;
 	}
 	wport->write_urb_busy = 1;
-	spin_unlock(&wport->lock);
+	spin_unlock_bh(&wport->lock);
 
 	count = (count > OMNINET_BULKOUTSIZE) ? OMNINET_BULKOUTSIZE : count;
 
Index: linux-2.6.18.noarch/drivers/usb/serial/safe_serial.c
===================================================================
--- linux-2.6.18.noarch.orig/drivers/usb/serial/safe_serial.c
+++ linux-2.6.18.noarch/drivers/usb/serial/safe_serial.c
@@ -298,14 +298,14 @@ static int safe_write (struct usb_serial
 		dbg ("%s - write request of 0 bytes", __FUNCTION__);
 		return (0);
 	}
-	spin_lock(&port->lock);
+	spin_lock_bh(&port->lock);
 	if (port->write_urb_busy) {
-		spin_unlock(&port->lock);
+		spin_unlock_bh(&port->lock);
 		dbg("%s - already writing", __FUNCTION__);
 		return 0;
 	}
 	port->write_urb_busy = 1;
-	spin_unlock(&port->lock);
+	spin_unlock_bh(&port->lock);
 
 	packet_length = port->bulk_out_size;	// get max packetsize
 

--

From: Peter Zijlstra <pzijlstr@redhat.com>
Subject: [RHEL5 PATCH 4/6] lockdep: lockdep_set_class_and_subclass
To: rhkernel-list@redhat.com
Date: Wed, 27 Sep 2006 15:33:45 +0200

Add lockdep_set_class_and_subclass() to the lockdep annotations.

This annotation makes it possible to assign a subclass on lock init. This
annotation is meant to reduce the _nested() annotations by assigning a
default subclass.

One could do without this annotation and rely on lockdep_set_class()
exclusively, but that would require a manual stack of struct lock_class_key
objects.
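
As a rough illustration of the effect (a userspace toy model, not the
kernel implementation): every toy_* name here is hypothetical and only
mirrors the shape of lockdep_init_map(), register_lock_class() and the
class_cache lookup touched by this patch. A lock initialised with a
non-zero subclass gets that class cached up front, so a later plain
acquire resolves to it without a _nested() annotation.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_SUBCLASSES 8

/* Hypothetical stand-in for struct lock_class_key: one slot per subclass. */
struct toy_class_key {
	int registered[TOY_MAX_SUBCLASSES];
};

/* Hypothetical stand-in for struct lockdep_map. */
struct toy_map {
	struct toy_class_key *key;
	int class_cache;	/* cached subclass index, -1 while empty */
};

/*
 * Register the (key, subclass) class.  As in the patch, the cache is
 * filled for subclass 0 or when the caller forces it.
 */
static int toy_register_class(struct toy_map *map, unsigned int subclass,
			      int force)
{
	if (subclass >= TOY_MAX_SUBCLASSES)
		return -1;
	map->key->registered[subclass] = 1;
	if (!subclass || force)
		map->class_cache = (int)subclass;
	return (int)subclass;
}

/* Models lockdep_init_map() with the new subclass argument. */
static void toy_init_map(struct toy_map *map, struct toy_class_key *key,
			 unsigned int subclass)
{
	map->key = key;
	map->class_cache = -1;
	if (subclass)
		toy_register_class(map, subclass, 1);
}

/*
 * Models the class lookup on acquire: a plain (subclass 0) acquire
 * consults the cache first, a nested acquire always looks up its own
 * subclass.  Returns the subclass index of the class that was used.
 */
static int toy_acquire(struct toy_map *map, unsigned int subclass)
{
	if (!subclass && map->class_cache >= 0)
		return map->class_cache;
	return toy_register_class(map, subclass, 0);
}
```

In this model a map initialised with subclass 2 reports class 2 on a
plain toy_acquire(map, 0), while a map initialised with subclass 0
still needs an explicit non-zero subclass on each nested acquire.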

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@elte.hu>
---
 include/linux/lockdep.h |   12 ++++++++----
 kernel/lockdep.c        |   10 ++++++----
 kernel/mutex-debug.c    |    2 +-
 lib/rwsem-spinlock.c    |    2 +-
 lib/rwsem.c             |    2 +-
 lib/spinlock_debug.c    |    4 ++--
 net/core/sock.c         |    2 +-
 7 files changed, 20 insertions(+), 14 deletions(-)

Index: linux-2.6.18.noarch/include/linux/lockdep.h
===================================================================
--- linux-2.6.18.noarch.orig/include/linux/lockdep.h
+++ linux-2.6.18.noarch/include/linux/lockdep.h
@@ -202,7 +202,7 @@ extern int lockdep_internal(void);
  */
 
 extern void lockdep_init_map(struct lockdep_map *lock, const char *name,
-			     struct lock_class_key *key);
+			     struct lock_class_key *key, int subclass);
 
 /*
  * Reinitialize a lock key - for cases where there is special locking or
@@ -211,9 +211,11 @@ extern void lockdep_init_map(struct lock
  * or they are too narrow (they suffer from a false class-split):
  */
 #define lockdep_set_class(lock, key) \
-		lockdep_init_map(&(lock)->dep_map, #key, key)
+		lockdep_init_map(&(lock)->dep_map, #key, key, 0)
 #define lockdep_set_class_and_name(lock, key, name) \
-		lockdep_init_map(&(lock)->dep_map, name, key)
+		lockdep_init_map(&(lock)->dep_map, name, key, 0)
+#define lockdep_set_class_and_subclass(lock, key, sub) \
+		lockdep_init_map(&(lock)->dep_map, #key, key, sub)
 
 /*
  * Acquire a lock.
@@ -257,10 +259,12 @@ static inline int lockdep_internal(void)
 # define lock_release(l, n, i)			do { } while (0)
 # define lockdep_init()				do { } while (0)
 # define lockdep_info()				do { } while (0)
-# define lockdep_init_map(lock, name, key)	do { (void)(key); } while (0)
+# define lockdep_init_map(lock, name, key, sub)	do { (void)(key); } while (0)
 # define lockdep_set_class(lock, key)		do { (void)(key); } while (0)
 # define lockdep_set_class_and_name(lock, key, name) \
 		do { (void)(key); } while (0)
+#define lockdep_set_class_and_subclass(lock, key, sub) \
+		do { (void)(key); } while (0)
 # define INIT_LOCKDEP
 # define lockdep_reset()		do { debug_locks = 1; } while (0)
 # define lockdep_free_key_range(start, size)	do { } while (0)
Index: linux-2.6.18.noarch/kernel/lockdep.c
===================================================================
--- linux-2.6.18.noarch.orig/kernel/lockdep.c
+++ linux-2.6.18.noarch/kernel/lockdep.c
@@ -1170,7 +1170,7 @@ look_up_lock_class(struct lockdep_map *l
  * itself, so actual lookup of the hash should be once per lock object.
  */
 static inline struct lock_class *
-register_lock_class(struct lockdep_map *lock, unsigned int subclass)
+register_lock_class(struct lockdep_map *lock, unsigned int subclass, int force)
 {
 	struct lockdep_subclass_key *key;
 	struct list_head *hash_head;
@@ -1242,7 +1242,7 @@ register_lock_class(struct lockdep_map *
 out_unlock_set:
 	__raw_spin_unlock(&hash_lock);
 
-	if (!subclass)
+	if (!subclass || force)
 		lock->class_cache = class;
 
 	DEBUG_LOCKS_WARN_ON(class->subclass != subclass);
@@ -1930,7 +1930,7 @@ void trace_softirqs_off(unsigned long ip
  * Initialize a lock instance's lock-class mapping info:
  */
 void lockdep_init_map(struct lockdep_map *lock, const char *name,
-		      struct lock_class_key *key)
+		      struct lock_class_key *key, int subclass)
 {
 	if (unlikely(!debug_locks))
 		return;
@@ -1950,6 +1950,8 @@ void lockdep_init_map(struct lockdep_map
 	lock->name = name;
 	lock->key = key;
 	lock->class_cache = NULL;
+	if (subclass)
+		register_lock_class(lock, subclass, 1);
 }
 
 EXPORT_SYMBOL_GPL(lockdep_init_map);
@@ -1988,7 +1990,7 @@ static int __lock_acquire(struct lockdep
 	 * Not cached yet or subclass?
 	 */
 	if (unlikely(!class)) {
-		class = register_lock_class(lock, subclass);
+		class = register_lock_class(lock, subclass, 0);
 		if (!class)
 			return 0;
 	}
Index: linux-2.6.18.noarch/kernel/mutex-debug.c
===================================================================
--- linux-2.6.18.noarch.orig/kernel/mutex-debug.c
+++ linux-2.6.18.noarch/kernel/mutex-debug.c
@@ -91,7 +91,7 @@ void debug_mutex_init(struct mutex *lock
 	 * Make sure we are not reinitializing a held lock:
 	 */
 	debug_check_no_locks_freed((void *)lock, sizeof(*lock));
-	lockdep_init_map(&lock->dep_map, name, key);
+	lockdep_init_map(&lock->dep_map, name, key, 0);
 #endif
 	lock->owner = NULL;
 	lock->magic = lock;
Index: linux-2.6.18.noarch/lib/rwsem-spinlock.c
===================================================================
--- linux-2.6.18.noarch.orig/lib/rwsem-spinlock.c
+++ linux-2.6.18.noarch/lib/rwsem-spinlock.c
@@ -28,7 +28,7 @@ void __init_rwsem(struct rw_semaphore *s
 	 * Make sure we are not reinitializing a held semaphore:
 	 */
 	debug_check_no_locks_freed((void *)sem, sizeof(*sem));
-	lockdep_init_map(&sem->dep_map, name, key);
+	lockdep_init_map(&sem->dep_map, name, key, 0);
 #endif
 	sem->activity = 0;
 	spin_lock_init(&sem->wait_lock);
Index: linux-2.6.18.noarch/lib/rwsem.c
===================================================================
--- linux-2.6.18.noarch.orig/lib/rwsem.c
+++ linux-2.6.18.noarch/lib/rwsem.c
@@ -19,7 +19,7 @@ void __init_rwsem(struct rw_semaphore *s
 	 * Make sure we are not reinitializing a held semaphore:
 	 */
 	debug_check_no_locks_freed((void *)sem, sizeof(*sem));
-	lockdep_init_map(&sem->dep_map, name, key);
+	lockdep_init_map(&sem->dep_map, name, key, 0);
 #endif
 	sem->count = RWSEM_UNLOCKED_VALUE;
 	spin_lock_init(&sem->wait_lock);
Index: linux-2.6.18.noarch/lib/spinlock_debug.c
===================================================================
--- linux-2.6.18.noarch.orig/lib/spinlock_debug.c
+++ linux-2.6.18.noarch/lib/spinlock_debug.c
@@ -20,7 +20,7 @@ void __spin_lock_init(spinlock_t *lock, 
 	 * Make sure we are not reinitializing a held lock:
 	 */
 	debug_check_no_locks_freed((void *)lock, sizeof(*lock));
-	lockdep_init_map(&lock->dep_map, name, key);
+	lockdep_init_map(&lock->dep_map, name, key, 0);
 #endif
 	lock->raw_lock = (raw_spinlock_t)__RAW_SPIN_LOCK_UNLOCKED;
 	lock->magic = SPINLOCK_MAGIC;
@@ -38,7 +38,7 @@ void __rwlock_init(rwlock_t *lock, const
 	 * Make sure we are not reinitializing a held lock:
 	 */
 	debug_check_no_locks_freed((void *)lock, sizeof(*lock));
-	lockdep_init_map(&lock->dep_map, name, key);
+	lockdep_init_map(&lock->dep_map, name, key, 0);
 #endif
 	lock->raw_lock = (raw_rwlock_t) __RAW_RW_LOCK_UNLOCKED;
 	lock->magic = RWLOCK_MAGIC;
Index: linux-2.6.18.noarch/net/core/sock.c
===================================================================
--- linux-2.6.18.noarch.orig/net/core/sock.c
+++ linux-2.6.18.noarch/net/core/sock.c
@@ -827,7 +827,7 @@ static void inline sock_lock_init(struct
 				   af_family_slock_key_strings[sk->sk_family]);
 	lockdep_init_map(&sk->sk_lock.dep_map,
 			 af_family_key_strings[sk->sk_family],
-			 af_family_keys + sk->sk_family);
+			 af_family_keys + sk->sk_family, 0);
 }
 
 /**

--

From: Peter Zijlstra <pzijlstr@redhat.com>
Subject: [RHEL5 PATCH 5/6] serio: lockdep annotation for ps2dev->cmd_mutex and serio->lock
To: rhkernel-list@redhat.com
Date: Wed, 27 Sep 2006 15:33:46 +0200

Based on ideas from Jiri Kosina, this patch tracks the nesting depth
and uses the new lockdep_set_class_and_subclass() annotation to store
this information in the lock objects.
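
The depth bookkeeping can be sketched in isolation as follows. This is
a userspace toy under stated assumptions: struct toy_port and
toy_init_port() are hypothetical stand-ins for struct serio and
serio_init_port(), reduced to the parent link and the depth field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for struct serio. */
struct toy_port {
	struct toy_port *parent;
	unsigned int depth;	/* level of nesting in the port hierarchy */
};

/*
 * A root port sits at depth 0; a child is one level below its parent.
 * In the patch this value is what gets handed to
 * lockdep_set_class_and_subclass() as the subclass.
 */
static void toy_init_port(struct toy_port *port, struct toy_port *parent)
{
	port->parent = parent;
	port->depth = parent ? parent->depth + 1 : 0;
}
```

Because a parent has to be initialised before its child for the depth
to be meaningful, the sketch assumes ports are set up top-down.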

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@elte.hu>
---
 drivers/input/serio/libps2.c |    4 ++++
 drivers/input/serio/serio.c  |    9 ++++++++-
 include/linux/serio.h        |    1 +
 3 files changed, 13 insertions(+), 1 deletion(-)

Index: linux-2.6.18.noarch/drivers/input/serio/libps2.c
===================================================================
--- linux-2.6.18.noarch.orig/drivers/input/serio/libps2.c
+++ linux-2.6.18.noarch/drivers/input/serio/libps2.c
@@ -280,6 +280,8 @@ int ps2_schedule_command(struct ps2dev *
 	return 0;
 }
 
+static struct lock_class_key ps2_mutex_key;
+
 /*
  * ps2_init() initializes ps2dev structure
  */
@@ -287,6 +289,8 @@ int ps2_schedule_command(struct ps2dev *
 void __ps2_init(struct ps2dev *ps2dev, struct serio *serio)
 {
 	mutex_init(&ps2dev->cmd_mutex);
+	lockdep_set_class_and_subclass(&ps2dev->cmd_mutex, &ps2_mutex_key,
+				       serio->depth);
 	init_waitqueue_head(&ps2dev->wait);
 	ps2dev->serio = serio;
 }
Index: linux-2.6.18.noarch/drivers/input/serio/serio.c
===================================================================
--- linux-2.6.18.noarch.orig/drivers/input/serio/serio.c
+++ linux-2.6.18.noarch/drivers/input/serio/serio.c
@@ -521,6 +521,8 @@ static void serio_release_port(struct de
 	module_put(THIS_MODULE);
 }
 
+static struct lock_class_key serio_lock_key;
+
 /*
  * Prepare serio port for registration.
  */
@@ -538,8 +540,13 @@ static void serio_init_port(struct serio
 		 "serio%ld", (long)atomic_inc_return(&serio_no) - 1);
 	serio->dev.bus = &serio_bus;
 	serio->dev.release = serio_release_port;
-	if (serio->parent)
+	if (serio->parent) {
 		serio->dev.parent = &serio->parent->dev;
+		serio->depth = serio->parent->depth + 1;
+	} else
+		serio->depth = 0;
+	lockdep_set_class_and_subclass(&serio->lock, &serio_lock_key,
+				       serio->depth);
 }
 
 /*
Index: linux-2.6.18.noarch/include/linux/serio.h
===================================================================
--- linux-2.6.18.noarch.orig/include/linux/serio.h
+++ linux-2.6.18.noarch/include/linux/serio.h
@@ -41,6 +41,7 @@ struct serio {
 	void (*stop)(struct serio *);
 
 	struct serio *parent, *child;
+	unsigned int depth;		/* level of nesting in serio hierarchy */
 
 	struct serio_driver *drv;	/* accessed from interrupt, must be protected by serio->lock and serio->sem */
 	struct mutex drv_mutex;		/* protects serio->drv so attributes can pin driver */

--

From: Peter Zijlstra <pzijlstr@redhat.com>
Subject: [RHEL5 PATCH 6/6] sysrq: disable lockdep on reboot
To: rhkernel-list@redhat.com
Date: Wed, 27 Sep 2006 15:33:47 +0200

SysRq : Emergency Sync
Emergency Sync complete
SysRq : Emergency Remount R/O
Emergency Remount complete
SysRq : Resetting
BUG: warning at kernel/lockdep.c:1816/trace_hardirqs_on() (Not tainted)

Call Trace:
 [<ffffffff8026d56d>] show_trace+0xae/0x319
 [<ffffffff8026d7ed>] dump_stack+0x15/0x17
 [<ffffffff802a68d1>] trace_hardirqs_on+0xbc/0x13d
 [<ffffffff803a8eec>] sysrq_handle_reboot+0x9/0x11
 [<ffffffff803a8f8d>] __handle_sysrq+0x99/0x130
 [<ffffffff803a903b>] handle_sysrq+0x17/0x19
 [<ffffffff803a36ee>] kbd_event+0x32e/0x57d
 [<ffffffff80401e35>] input_event+0x42d/0x45b
 [<ffffffff804063eb>] atkbd_interrupt+0x44d/0x53d
 [<ffffffff803fe5c5>] serio_interrupt+0x49/0x86
 [<ffffffff803ff2a4>] i8042_interrupt+0x202/0x21a
 [<ffffffff80210cf0>] handle_IRQ_event+0x2c/0x64
 [<ffffffff802bfd8b>] __do_IRQ+0xaf/0x114
 [<ffffffff8026ea24>] do_IRQ+0xf8/0x107
 [<ffffffff8025f886>] ret_from_intr+0x0/0xf
DWARF2 unwinder stuck at ret_from_intr+0x0/0xf
Leftover inexact backtrace:
 <IRQ> <EOI> [<ffffffff80258e36>] mwait_idle+0x3f/0x54
 [<ffffffff8024a33a>] cpu_idle+0xa2/0xc5
 [<ffffffff8026c34e>] rest_init+0x2b/0x2d
 [<ffffffff809708bc>] start_kernel+0x24a/0x24c
 [<ffffffff8097028b>] _sinittext+0x28b/0x292

Since we're shutting down anyway, don't bother being smart,
just turn the thing off.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 drivers/char/sysrq.c |    1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6.18.noarch/drivers/char/sysrq.c
===================================================================
--- linux-2.6.18.noarch.orig/drivers/char/sysrq.c
+++ linux-2.6.18.noarch/drivers/char/sysrq.c
@@ -115,6 +115,7 @@ static struct sysrq_key_op sysrq_crashdu
 static void sysrq_handle_reboot(int key, struct pt_regs *pt_regs,
 				struct tty_struct *tty)
 {
+	lockdep_off();
 	local_irq_enable();
 	emergency_restart();
 }

--

Subject: [RHEL5 PATCH] lockdep: annotate bonding driver
From: Peter Zijlstra <pzijlstr@redhat.com>
To: rhkernel-list@redhat.com
Cc: Dave Jones <davej@redhat.com>, Don Zickus <dzickus@redhat.com>,
        "John W. Linville" <linville@redhat.com>
Date: Thu, 28 Sep 2006 20:19:03 +0200

BZ204795

=============================================
[ INFO: possible recursive locking detected ]
2.6.17-1.2600.fc6 #1
---------------------------------------------
ifconfig/2411 is trying to acquire lock:
 (&dev->_xmit_lock){-...}, at: [<ffffffff80429b9f>] dev_mc_add+0x45/0x15f

but task is already holding lock:
 (&dev->_xmit_lock){-...}, at: [<ffffffff80429b9f>] dev_mc_add+0x45/0x15f

other info that might help us debug this:
3 locks held by ifconfig/2411:
 #0:  (rtnl_mutex){--..}, at: [<ffffffff802664af>] mutex_lock+0x2a/0x2e
 #1:  (&dev->_xmit_lock){-...}, at: [<ffffffff80429b9f>] dev_mc_add+0x45/0x15f
 #2:  (&bond->lock){-.-+}, at: [<ffffffff8831b7f7>] bond_set_multicast_list+0x2c/0x26a [bonding]

stack backtrace:

Call Trace:
 [<ffffffff8026e97d>] show_trace+0xae/0x319
 [<ffffffff8026ebfd>] dump_stack+0x15/0x17
 [<ffffffff802a839b>] __lock_acquire+0x135/0xa64
 [<ffffffff802a926d>] lock_acquire+0x4b/0x69
 [<ffffffff80267981>] _spin_lock_bh+0x2a/0x36
 [<ffffffff80429b9f>] dev_mc_add+0x45/0x15f
 [<ffffffff8831b903>] :bonding:bond_set_multicast_list+0x138/0x26a
 [<ffffffff80429901>] __dev_mc_upload+0x22/0x24
 [<ffffffff80429c74>] dev_mc_add+0x11a/0x15f
 [<ffffffff8045d154>] igmp_group_added+0x55/0x10f
 [<ffffffff8045d4ab>] ip_mc_inc_group+0x1d6/0x21a
 [<ffffffff8045d535>] ip_mc_up+0x46/0x61
 [<ffffffff804594b8>] inetdev_init+0x11c/0x136
 [<ffffffff8045a0b7>] devinet_ioctl+0x3eb/0x5e9
 [<ffffffff8045a56c>] inet_ioctl+0x71/0x8f
 [<ffffffff8041ed74>] sock_ioctl+0x1e8/0x20a
 [<ffffffff80243ae0>] do_ioctl+0x2a/0x77
 [<ffffffff802325cc>] vfs_ioctl+0x25a/0x277
 [<ffffffff8024ea4b>] sys_ioctl+0x5f/0x82
 [<ffffffff8026060e>] system_call+0x7e/0x83

The bonding driver nests other drivers, so give the bonding device's
_xmit_lock its own lock class.
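
A minimal userspace sketch of the idea, not lockdep itself: every name
below (toy_class_key, toy_acquire, ...) is hypothetical. It assumes a
validator that flags an acquisition whenever a lock of the same class is
already held, and shows that moving the bond device's lock into its own
class makes the bond -> slave nesting look legitimate.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for lockdep concepts; not the kernel's data structures. */
struct toy_class_key { char dummy; };

struct toy_lock {
	const struct toy_class_key *key;	/* class this lock belongs to */
};

#define TOY_MAX_HELD 8

struct toy_task {
	const struct toy_lock *held[TOY_MAX_HELD];
	size_t nr_held;
};

/* Analogue of lockdep_set_class(): move a lock into another class. */
static void toy_set_class(struct toy_lock *lock,
			  const struct toy_class_key *key)
{
	lock->key = key;
}

/*
 * Record an acquisition. Returns 1 if a lock of the same class is already
 * held (the "possible recursive locking" case), 0 if not, and -1 if the
 * held-lock table is full.
 */
static int toy_acquire(struct toy_task *t, const struct toy_lock *lock)
{
	int recursive = 0;
	size_t i;

	if (t->nr_held == TOY_MAX_HELD)
		return -1;
	for (i = 0; i < t->nr_held; i++)
		if (t->held[i]->key == lock->key)
			recursive = 1;
	t->held[t->nr_held++] = lock;
	return recursive;
}

/* Take the bond's lock, then a slave's lock; report whether that warns. */
static int toy_nested_xmit_warns(int annotate_bond)
{
	static const struct toy_class_key netdev_xmit_key = { 0 };
	static const struct toy_class_key bonding_xmit_key = { 0 };
	struct toy_lock bond = { &netdev_xmit_key };
	struct toy_lock slave = { &netdev_xmit_key };
	struct toy_task task = { { NULL }, 0 };

	if (annotate_bond)
		toy_set_class(&bond, &bonding_xmit_key);

	assert(toy_acquire(&task, &bond) == 0);
	return toy_acquire(&task, &slave);
}
```

With annotate_bond == 0 both locks share one class and the second
acquisition is flagged; with annotate_bond != 0 it is not.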

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 0fb5f65..ebbf002 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4692,6 +4692,8 @@ static int bond_check_params(struct bond
 	return 0;
 }
 
+static struct lock_class_key bonding_netdev_xmit_lock_key;
+
 /* Create a new bond based on the specified name and bonding parameters.
  * Caller must NOT hold rtnl_lock; we need to release it here before we
  * set up our sysfs entries.
@@ -4727,6 +4729,9 @@ int bond_create(char *name, struct bond_
 	if (res < 0) {
 		goto out_bond;
 	}
+
+	lockdep_set_class(&bond_dev->_xmit_lock, &bonding_netdev_xmit_lock_key);
+
 	if (newbond)
 		*newbond = bond_dev->priv;
 


Subject: [RHEL5 PATCH] lockdep: more DECLARE_COMPLETION_ONSTACK annotations
From: Peter Zijlstra <pzijlstr@redhat.com>
To: rhkernel-list@redhat.com
Cc: Dave Jones <davej@redhat.com>, Don Zickus <dzickus@redhat.com>
Content-Type: text/plain
Date: Thu, 28 Sep 2006 20:22:35 +0200

BZ208304

All on-stack DECLARE_COMPLETIONs should be replaced by:
  DECLARE_COMPLETION_ONSTACK
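
The patch does not spell out the reason. A rough userspace sketch of one
plausible reading, with every name hypothetical: a static-style
initializer leaves the object's own address as its class key, which for
an on-stack object is a stack address, while a run-time init binds the
object to a static key owned by the call site.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these types and macros are not kernel API. */
struct toy_completion {
	unsigned int done;
	const void *class_key;	/* what a validator would key the class on */
};

/* Run-time init: the key is a static object private to the call site. */
#define toy_init_completion(x)				\
	do {						\
		static const char toy_site_key = 0;	\
		(x)->done = 0;				\
		(x)->class_key = &toy_site_key;		\
	} while (0)

/*
 * Declare a completion on the stack and report whether its class key is
 * the stack object itself (1) or a static call-site key (0).
 */
static int toy_key_is_stack_object(int use_onstack_variant)
{
	/* Static-style initializer: the key is the object's own address. */
	struct toy_completion work = { 0, &work };

	if (use_onstack_variant)
		toy_init_completion(&work);

	return work.class_key == (const void *)&work;
}
```

In this model only the run-time variant gives all on-stack instances
declared at one site a single, stable key.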

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---
 arch/arm/kernel/ecard.c                      |    2 +-
 arch/i386/kernel/smpboot.c                   |    2 +-
 arch/powerpc/platforms/powermac/cpufreq_64.c |    2 +-
 arch/powerpc/platforms/powermac/nvram.c      |    4 ++--
 block/as-iosched.c                           |    2 +-
 block/cfq-iosched.c                          |    2 +-
 drivers/block/DAC960.c                       |    2 +-
 drivers/block/cciss.c                        |    6 +++---
 drivers/block/cciss_scsi.c                   |    2 +-
 drivers/block/paride/pd.c                    |    2 +-
 drivers/block/pktcdvd.c                      |    2 +-
 drivers/ide/ide-tape.c                       |    2 +-
 drivers/macintosh/smu.c                      |    4 ++--
 drivers/macintosh/windfarm_smu_controls.c    |    2 +-
 drivers/macintosh/windfarm_smu_sensors.c     |    2 +-
 drivers/s390/scsi/zfcp_scsi.c                |    2 +-
 drivers/scsi/53c700.c                        |    2 +-
 drivers/scsi/aic7xxx/aic79xx_osm.c           |    4 ++--
 drivers/scsi/aic7xxx/aic7xxx_osm.c           |    2 +-
 drivers/scsi/gdth.c                          |    4 ++--
 drivers/scsi/qla1280.c                       |    4 ++--
 drivers/usb/gadget/inode.c                   |    2 +-
 drivers/usb/gadget/omap_udc.c                |    2 +-
 net/ipv4/ipvs/ip_vs_sync.c                   |    2 +-
 24 files changed, 31 insertions(+), 31 deletions(-)

Index: linux-2.6/arch/arm/kernel/ecard.c
===================================================================
--- linux-2.6.orig/arch/arm/kernel/ecard.c
+++ linux-2.6/arch/arm/kernel/ecard.c
@@ -295,7 +295,7 @@ ecard_task(void * unused)
  */
 static void ecard_call(struct ecard_request *req)
 {
-	DECLARE_COMPLETION(completion);
+	DECLARE_COMPLETION_ONSTACK(completion);
 
 	req->complete = &completion;
 
Index: linux-2.6/arch/i386/kernel/smpboot.c
===================================================================
--- linux-2.6.orig/arch/i386/kernel/smpboot.c
+++ linux-2.6/arch/i386/kernel/smpboot.c
@@ -1058,7 +1058,7 @@ static void __cpuinit do_warm_boot_cpu(v
 
 static int __cpuinit __smp_prepare_cpu(int cpu)
 {
-	DECLARE_COMPLETION(done);
+	DECLARE_COMPLETION_ONSTACK(done);
 	struct warm_boot_cpu_info info;
 	struct work_struct task;
 	int	apicid, ret;
Index: linux-2.6/arch/powerpc/platforms/powermac/cpufreq_64.c
===================================================================
--- linux-2.6.orig/arch/powerpc/platforms/powermac/cpufreq_64.c
+++ linux-2.6/arch/powerpc/platforms/powermac/cpufreq_64.c
@@ -104,7 +104,7 @@ static void g5_smu_switch_volt(int speed
 {
 	struct smu_simple_cmd	cmd;
 
-	DECLARE_COMPLETION(comp);
+	DECLARE_COMPLETION_ONSTACK(comp);
 	smu_queue_simple(&cmd, SMU_CMD_POWER_COMMAND, 8, smu_done_complete,
 			 &comp, 'V', 'S', 'L', 'E', 'W',
 			 0xff, g5_fvt_cur+1, speed_mode);
Index: linux-2.6/arch/powerpc/platforms/powermac/nvram.c
===================================================================
--- linux-2.6.orig/arch/powerpc/platforms/powermac/nvram.c
+++ linux-2.6/arch/powerpc/platforms/powermac/nvram.c
@@ -195,7 +195,7 @@ static void pmu_nvram_complete(struct ad
 static unsigned char pmu_nvram_read_byte(int addr)
 {
 	struct adb_request req;
-	DECLARE_COMPLETION(req_complete); 
+	DECLARE_COMPLETION_ONSTACK(req_complete);
 	
 	req.arg = system_state == SYSTEM_RUNNING ? &req_complete : NULL;
 	if (pmu_request(&req, pmu_nvram_complete, 3, PMU_READ_NVRAM,
@@ -211,7 +211,7 @@ static unsigned char pmu_nvram_read_byte
 static void pmu_nvram_write_byte(int addr, unsigned char val)
 {
 	struct adb_request req;
-	DECLARE_COMPLETION(req_complete); 
+	DECLARE_COMPLETION_ONSTACK(req_complete);
 	
 	req.arg = system_state == SYSTEM_RUNNING ? &req_complete : NULL;
 	if (pmu_request(&req, pmu_nvram_complete, 4, PMU_WRITE_NVRAM,
Index: linux-2.6/block/as-iosched.c
===================================================================
--- linux-2.6.orig/block/as-iosched.c
+++ linux-2.6/block/as-iosched.c
@@ -1828,7 +1828,7 @@ static int __init as_init(void)
 
 static void __exit as_exit(void)
 {
-	DECLARE_COMPLETION(all_gone);
+	DECLARE_COMPLETION_ONSTACK(all_gone);
 	elv_unregister(&iosched_as);
 	ioc_gone = &all_gone;
 	/* ioc_gone's update must be visible before reading ioc_count */
Index: linux-2.6/block/cfq-iosched.c
===================================================================
--- linux-2.6.orig/block/cfq-iosched.c
+++ linux-2.6/block/cfq-iosched.c
@@ -2463,7 +2463,7 @@ static int __init cfq_init(void)
 
 static void __exit cfq_exit(void)
 {
-	DECLARE_COMPLETION(all_gone);
+	DECLARE_COMPLETION_ONSTACK(all_gone);
 	elv_unregister(&iosched_cfq);
 	ioc_gone = &all_gone;
 	/* ioc_gone's update must be visible before reading ioc_count */
Index: linux-2.6/drivers/block/DAC960.c
===================================================================
--- linux-2.6.orig/drivers/block/DAC960.c
+++ linux-2.6/drivers/block/DAC960.c
@@ -770,7 +770,7 @@ static void DAC960_P_QueueCommand(DAC960
 static void DAC960_ExecuteCommand(DAC960_Command_T *Command)
 {
   DAC960_Controller_T *Controller = Command->Controller;
-  DECLARE_COMPLETION(Completion);
+  DECLARE_COMPLETION_ONSTACK(Completion);
   unsigned long flags;
   Command->Completion = &Completion;
 
Index: linux-2.6/drivers/block/cciss.c
===================================================================
--- linux-2.6.orig/drivers/block/cciss.c
+++ linux-2.6/drivers/block/cciss.c
@@ -879,7 +879,7 @@ static int cciss_ioctl(struct inode *ino
 			char *buff = NULL;
 			u64bit temp64;
 			unsigned long flags;
-			DECLARE_COMPLETION(wait);
+			DECLARE_COMPLETION_ONSTACK(wait);
 
 			if (!arg)
 				return -EINVAL;
@@ -997,7 +997,7 @@ static int cciss_ioctl(struct inode *ino
 			BYTE sg_used = 0;
 			int status = 0;
 			int i;
-			DECLARE_COMPLETION(wait);
+			DECLARE_COMPLETION_ONSTACK(wait);
 			__u32 left;
 			__u32 sz;
 			BYTE __user *data_ptr;
@@ -1792,7 +1792,7 @@ static int sendcmd_withirq(__u8 cmd,
 	u64bit buff_dma_handle;
 	unsigned long flags;
 	int return_status;
-	DECLARE_COMPLETION(wait);
+	DECLARE_COMPLETION_ONSTACK(wait);
 
 	if ((c = cmd_alloc(h, 0)) == NULL)
 		return -ENOMEM;
Index: linux-2.6/drivers/block/cciss_scsi.c
===================================================================
--- linux-2.6.orig/drivers/block/cciss_scsi.c
+++ linux-2.6/drivers/block/cciss_scsi.c
@@ -766,7 +766,7 @@ cciss_scsi_do_simple_cmd(ctlr_info_t *c,
 			int direction)
 {
 	unsigned long flags;
-	DECLARE_COMPLETION(wait);
+	DECLARE_COMPLETION_ONSTACK(wait);
 
 	cp->cmd_type = CMD_IOCTL_PEND;		// treat this like an ioctl 
 	cp->scsi_cmd = NULL;
Index: linux-2.6/drivers/block/paride/pd.c
===================================================================
--- linux-2.6.orig/drivers/block/paride/pd.c
+++ linux-2.6/drivers/block/paride/pd.c
@@ -713,7 +713,7 @@ static void do_pd_request(request_queue_
 static int pd_special_command(struct pd_unit *disk,
 		      enum action (*func)(struct pd_unit *disk))
 {
-	DECLARE_COMPLETION(wait);
+	DECLARE_COMPLETION_ONSTACK(wait);
 	struct request rq;
 	int err = 0;
 
Index: linux-2.6/drivers/block/pktcdvd.c
===================================================================
--- linux-2.6.orig/drivers/block/pktcdvd.c
+++ linux-2.6/drivers/block/pktcdvd.c
@@ -348,7 +348,7 @@ static int pkt_generic_packet(struct pkt
 	char sense[SCSI_SENSE_BUFFERSIZE];
 	request_queue_t *q;
 	struct request *rq;
-	DECLARE_COMPLETION(wait);
+	DECLARE_COMPLETION_ONSTACK(wait);
 	int err = 0;
 
 	q = bdev_get_queue(pd->bdev);
Index: linux-2.6/drivers/ide/ide-tape.c
===================================================================
--- linux-2.6.orig/drivers/ide/ide-tape.c
+++ linux-2.6/drivers/ide/ide-tape.c
@@ -2764,7 +2764,7 @@ static void idetape_add_stage_tail (ide_
  */
 static void idetape_wait_for_request (ide_drive_t *drive, struct request *rq)
 {
-	DECLARE_COMPLETION(wait);
+	DECLARE_COMPLETION_ONSTACK(wait);
 	idetape_tape_t *tape = drive->driver_data;
 
 #if IDETAPE_DEBUG_BUGS
Index: linux-2.6/drivers/macintosh/smu.c
===================================================================
--- linux-2.6.orig/drivers/macintosh/smu.c
+++ linux-2.6/drivers/macintosh/smu.c
@@ -870,7 +870,7 @@ int smu_queue_i2c(struct smu_i2c_cmd *cm
 
 static int smu_read_datablock(u8 *dest, unsigned int addr, unsigned int len)
 {
-	DECLARE_COMPLETION(comp);
+	DECLARE_COMPLETION_ONSTACK(comp);
 	unsigned int chunk;
 	struct smu_cmd cmd;
 	int rc;
@@ -917,7 +917,7 @@ static int smu_read_datablock(u8 *dest, 
 
 static struct smu_sdbp_header *smu_create_sdb_partition(int id)
 {
-	DECLARE_COMPLETION(comp);
+	DECLARE_COMPLETION_ONSTACK(comp);
 	struct smu_simple_cmd cmd;
 	unsigned int addr, len, tlen;
 	struct smu_sdbp_header *hdr;
Index: linux-2.6/drivers/macintosh/windfarm_smu_controls.c
===================================================================
--- linux-2.6.orig/drivers/macintosh/windfarm_smu_controls.c
+++ linux-2.6/drivers/macintosh/windfarm_smu_controls.c
@@ -56,7 +56,7 @@ static int smu_set_fan(int pwm, u8 id, u
 {
 	struct smu_cmd cmd;
 	u8 buffer[16];
-	DECLARE_COMPLETION(comp);
+	DECLARE_COMPLETION_ONSTACK(comp);
 	int rc;
 
 	/* Fill SMU command structure */
Index: linux-2.6/drivers/macintosh/windfarm_smu_sensors.c
===================================================================
--- linux-2.6.orig/drivers/macintosh/windfarm_smu_sensors.c
+++ linux-2.6/drivers/macintosh/windfarm_smu_sensors.c
@@ -67,7 +67,7 @@ static void smu_ads_release(struct wf_se
 static int smu_read_adc(u8 id, s32 *value)
 {
 	struct smu_simple_cmd	cmd;
-	DECLARE_COMPLETION(comp);
+	DECLARE_COMPLETION_ONSTACK(comp);
 	int rc;
 
 	rc = smu_queue_simple(&cmd, SMU_CMD_READ_ADC, 1,
Index: linux-2.6/drivers/s390/scsi/zfcp_scsi.c
===================================================================
--- linux-2.6.orig/drivers/s390/scsi/zfcp_scsi.c
+++ linux-2.6/drivers/s390/scsi/zfcp_scsi.c
@@ -301,7 +301,7 @@ zfcp_scsi_command_sync(struct zfcp_unit 
 		       int use_timer)
 {
 	int ret;
-	DECLARE_COMPLETION(wait);
+	DECLARE_COMPLETION_ONSTACK(wait);
 
 	scpnt->SCp.ptr = (void *) &wait;  /* silent re-use */
 	scpnt->scsi_done = zfcp_scsi_command_sync_handler;
Index: linux-2.6/drivers/scsi/53c700.c
===================================================================
--- linux-2.6.orig/drivers/scsi/53c700.c
+++ linux-2.6/drivers/scsi/53c700.c
@@ -1939,7 +1939,7 @@ NCR_700_abort(struct scsi_cmnd * SCp)
 STATIC int
 NCR_700_bus_reset(struct scsi_cmnd * SCp)
 {
-	DECLARE_COMPLETION(complete);
+	DECLARE_COMPLETION_ONSTACK(complete);
 	struct NCR_700_Host_Parameters *hostdata = 
 		(struct NCR_700_Host_Parameters *)SCp->device->host->hostdata[0];
 
Index: linux-2.6/drivers/scsi/aic7xxx/aic79xx_osm.c
===================================================================
--- linux-2.6.orig/drivers/scsi/aic7xxx/aic79xx_osm.c
+++ linux-2.6/drivers/scsi/aic7xxx/aic79xx_osm.c
@@ -646,7 +646,7 @@ ahd_linux_dev_reset(struct scsi_cmnd *cm
 	struct	ahd_initiator_tinfo *tinfo;
 	struct	ahd_tmode_tstate *tstate;
 	unsigned long flags;
-	DECLARE_COMPLETION(done);
+	DECLARE_COMPLETION_ONSTACK(done);
 
 	reset_scb = NULL;
 	paused = FALSE;
@@ -2251,7 +2251,7 @@ done:
 	if (paused)
 		ahd_unpause(ahd);
 	if (wait) {
-		DECLARE_COMPLETION(done);
+		DECLARE_COMPLETION_ONSTACK(done);
 
 		ahd->platform_data->eh_done = &done;
 		ahd_unlock(ahd, &flags);
Index: linux-2.6/drivers/scsi/aic7xxx/aic7xxx_osm.c
===================================================================
--- linux-2.6.orig/drivers/scsi/aic7xxx/aic7xxx_osm.c
+++ linux-2.6/drivers/scsi/aic7xxx/aic7xxx_osm.c
@@ -2335,7 +2335,7 @@ done:
 	if (paused)
 		ahc_unpause(ahc);
 	if (wait) {
-		DECLARE_COMPLETION(done);
+		DECLARE_COMPLETION_ONSTACK(done);
 
 		ahc->platform_data->eh_done = &done;
 		ahc_unlock(ahc, &flags);
Index: linux-2.6/drivers/scsi/gdth.c
===================================================================
--- linux-2.6.orig/drivers/scsi/gdth.c
+++ linux-2.6/drivers/scsi/gdth.c
@@ -724,7 +724,7 @@ int __gdth_execute(struct scsi_device *s
                    int timeout, u32 *info)
 {
     Scsi_Cmnd *scp;
-    DECLARE_COMPLETION(wait);
+    DECLARE_COMPLETION_ONSTACK(wait);
     int rval;
 
     scp = kmalloc(sizeof(*scp), GFP_KERNEL);
@@ -764,7 +764,7 @@ int __gdth_execute(struct scsi_device *s
 {
     Scsi_Cmnd *scp = scsi_allocate_device(sdev, 1, FALSE);
     unsigned bufflen = gdtcmd ? sizeof(gdth_cmd_str) : 0;
-    DECLARE_COMPLETION(wait);
+    DECLARE_COMPLETION_ONSTACK(wait);
     int rval;
 
     if (!scp)
Index: linux-2.6/drivers/scsi/qla1280.c
===================================================================
--- linux-2.6.orig/drivers/scsi/qla1280.c
+++ linux-2.6/drivers/scsi/qla1280.c
@@ -813,7 +813,7 @@ qla1280_error_action(struct scsi_cmnd *c
 	uint16_t data;
 	unsigned char *handle;
 	int result, i;
-	DECLARE_COMPLETION(wait);
+	DECLARE_COMPLETION_ONSTACK(wait);
 	struct timer_list timer;
 
 	ha = (struct scsi_qla_host *)(CMD_HOST(cmd)->hostdata);
@@ -2406,7 +2406,7 @@ qla1280_mailbox_command(struct scsi_qla_
 	uint16_t *optr, *iptr;
 	uint16_t __iomem *mptr;
 	uint16_t data;
-	DECLARE_COMPLETION(wait);
+	DECLARE_COMPLETION_ONSTACK(wait);
 	struct timer_list timer;
 
 	ENTER("qla1280_mailbox_command");
Index: linux-2.6/drivers/usb/gadget/inode.c
===================================================================
--- linux-2.6.orig/drivers/usb/gadget/inode.c
+++ linux-2.6/drivers/usb/gadget/inode.c
@@ -342,7 +342,7 @@ fail:
 static ssize_t
 ep_io (struct ep_data *epdata, void *buf, unsigned len)
 {
-	DECLARE_COMPLETION (done);
+	DECLARE_COMPLETION_ONSTACK (done);
 	int value;
 
 	spin_lock_irq (&epdata->dev->lock);
Index: linux-2.6/drivers/usb/gadget/omap_udc.c
===================================================================
--- linux-2.6.orig/drivers/usb/gadget/omap_udc.c
+++ linux-2.6/drivers/usb/gadget/omap_udc.c
@@ -2869,7 +2869,7 @@ cleanup0:
 
 static int __exit omap_udc_remove(struct platform_device *pdev)
 {
-	DECLARE_COMPLETION(done);
+	DECLARE_COMPLETION_ONSTACK(done);
 
 	if (!udc)
 		return -ENODEV;
Index: linux-2.6/net/ipv4/ipvs/ip_vs_sync.c
===================================================================
--- linux-2.6.orig/net/ipv4/ipvs/ip_vs_sync.c
+++ linux-2.6/net/ipv4/ipvs/ip_vs_sync.c
@@ -836,7 +836,7 @@ static int fork_sync_thread(void *startu
 
 int start_sync_thread(int state, char *mcast_ifn, __u8 syncid)
 {
-	DECLARE_COMPLETION(startup);
+	DECLARE_COMPLETION_ONSTACK(startup);
 	pid_t pid;
 
 	if ((state == IP_VS_STATE_MASTER && sync_master_pid) ||


Subject: [RHEL5 PATCH 7/6] revert earlier ps2 patch
From: Peter Zijlstra <pzijlstr@redhat.com>
To: rhkernel-list@redhat.com
Cc: Don Zickus <dzickus@redhat.com>, davej@redhat.com
Content-Type: text/plain
Date: Thu, 28 Sep 2006 15:10:52 +0200


Revert an earlier ps2dev->cmd_mutex fixup.
The extra mutex_init() destroys the just-set lock class.
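
A small userspace sketch of the ordering problem, assuming (as the
message above says) that initializing a mutex resets its class. All
toy_* names are hypothetical; the "inner" helper mirrors the
init-then-annotate order of ps2_init(), and the optional second init
mirrors the inline wrapper being reverted.

```c
#include <assert.h>
#include <stddef.h>

struct toy_key { char dummy; };

struct toy_mutex {
	int locked;
	const struct toy_key *key;	/* lock class */
	unsigned int subclass;
};

static const struct toy_key toy_default_mutex_key = { 0 };
static const struct toy_key toy_ps2_mutex_key = { 0 };

/* Initializing a mutex also puts it back into the default class. */
static void toy_mutex_init(struct toy_mutex *m)
{
	m->locked = 0;
	m->key = &toy_default_mutex_key;
	m->subclass = 0;
}

static void toy_set_class_and_subclass(struct toy_mutex *m,
				       const struct toy_key *key,
				       unsigned int subclass)
{
	m->key = key;
	m->subclass = subclass;
}

/* Init first, annotate second: the order used inside ps2_init(). */
static void toy_ps2_init_inner(struct toy_mutex *m, unsigned int depth)
{
	toy_mutex_init(m);
	toy_set_class_and_subclass(m, &toy_ps2_mutex_key, depth);
}

/* Returns 1 if the annotation is still in place after initialization. */
static int toy_ps2_class_survives(int extra_mutex_init, unsigned int depth)
{
	struct toy_mutex m;

	toy_ps2_init_inner(&m, depth);
	if (extra_mutex_init)
		toy_mutex_init(&m);	/* the reverted wrapper's second init */

	return m.key == &toy_ps2_mutex_key && m.subclass == depth;
}
```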

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 drivers/input/serio/libps2.c |    4 ++--
 include/linux/libps2.h       |    7 +------
 2 files changed, 3 insertions(+), 8 deletions(-)

Index: linux-2.6.18.noarch/drivers/input/serio/libps2.c
===================================================================
--- linux-2.6.18.noarch.orig/drivers/input/serio/libps2.c
+++ linux-2.6.18.noarch/drivers/input/serio/libps2.c
@@ -27,7 +27,7 @@ MODULE_AUTHOR("Dmitry Torokhov <dtor@mai
 MODULE_DESCRIPTION("PS/2 driver library");
 MODULE_LICENSE("GPL");
 
-EXPORT_SYMBOL(__ps2_init);
+EXPORT_SYMBOL(ps2_init);
 EXPORT_SYMBOL(ps2_sendbyte);
 EXPORT_SYMBOL(ps2_drain);
 EXPORT_SYMBOL(ps2_command);
@@ -286,7 +286,7 @@ static struct lock_class_key ps2_mutex_k
  * ps2_init() initializes ps2dev structure
  */
 
-void __ps2_init(struct ps2dev *ps2dev, struct serio *serio)
+void ps2_init(struct ps2dev *ps2dev, struct serio *serio)
 {
 	mutex_init(&ps2dev->cmd_mutex);
 	lockdep_set_class_and_subclass(&ps2dev->cmd_mutex, &ps2_mutex_key,
Index: linux-2.6.18.noarch/include/linux/libps2.h
===================================================================
--- linux-2.6.18.noarch.orig/include/linux/libps2.h
+++ linux-2.6.18.noarch/include/linux/libps2.h
@@ -39,12 +39,7 @@ struct ps2dev {
 	unsigned char nak;
 };
 
-void __ps2_init(struct ps2dev *ps2dev, struct serio *serio);
-static inline void ps2_init(struct ps2dev *ps2dev, struct serio *serio)
-{
-	__ps2_init(ps2dev, serio);
-	mutex_init(&ps2dev->cmd_mutex);
-}
+void ps2_init(struct ps2dev *ps2dev, struct serio *serio);
 int ps2_sendbyte(struct ps2dev *ps2dev, unsigned char byte, int timeout);
 void ps2_drain(struct ps2dev *ps2dev, int maxbytes, int timeout);
 int ps2_command(struct ps2dev *ps2dev, unsigned char *param, int command);


Subject: [RHEL5 PATCH] lockdep annotate nfs/nfsd in-kernel sockets
From: Peter Zijlstra <pzijlstr@redhat.com>
To: rhkernel-list@redhat.com
Cc: Steve Dickson <SteveD@redhat.com>
Date: Fri, 06 Oct 2006 16:09:41 +0200

BZ208439

SteveD helped catch this and verified the fix.
---

Stick NFS sockets in their own class to avoid some lockdep warnings.
NFS sockets are never exposed to user-space, and will hence not trigger
certain code paths that would otherwise pose deadlock scenarios.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Steven Dickson <SteveD@redhat.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
---
 include/net/sock.h    |   19 +++++++++++++++++++
 kernel/lockdep.c      |    1 +
 net/core/sock.c       |   23 +++++------------------
 net/sunrpc/svcsock.c  |   33 +++++++++++++++++++++++++++++++++
 net/sunrpc/xprtsock.c |   33 +++++++++++++++++++++++++++++++++
 5 files changed, 91 insertions(+), 18 deletions(-)

Index: linux-2.6.18.noarch/include/net/sock.h
===================================================================
--- linux-2.6.18.noarch.orig/include/net/sock.h
+++ linux-2.6.18.noarch/include/net/sock.h
@@ -748,6 +748,25 @@ static inline int sk_stream_wmem_schedul
  */
 #define sock_owned_by_user(sk)	((sk)->sk_lock.owner)
 
+/*
+ * Macro so as to not evaluate some arguments when
+ * lockdep is not enabled.
+ *
+ * Mark both the sk_lock and the sk_lock.slock as a
+ * per-address-family lock class.
+ */
+#define sock_lock_init_class_and_name(sk, sname, skey, name, key) 	\
+do {									\
+	sk->sk_lock.owner = NULL;					\
+	init_waitqueue_head(&sk->sk_lock.wq);				\
+	spin_lock_init(&(sk)->sk_lock.slock);				\
+	debug_check_no_locks_freed((void *)&(sk)->sk_lock,		\
+			sizeof((sk)->sk_lock));				\
+	lockdep_set_class_and_name(&(sk)->sk_lock.slock,		\
+		       	(skey), (sname));				\
+	lockdep_init_map(&(sk)->sk_lock.dep_map, (name), (key), 0);	\
+} while (0)
+
 extern void FASTCALL(lock_sock(struct sock *sk));
 extern void FASTCALL(release_sock(struct sock *sk));
 
Index: linux-2.6.18.noarch/kernel/lockdep.c
===================================================================
--- linux-2.6.18.noarch.orig/kernel/lockdep.c
+++ linux-2.6.18.noarch/kernel/lockdep.c
@@ -2638,6 +2638,7 @@ void debug_check_no_locks_freed(const vo
 	}
 	local_irq_restore(flags);
 }
+EXPORT_SYMBOL_GPL(debug_check_no_locks_freed);
 
 static void print_held_locks_bug(struct task_struct *curr)
 {
Index: linux-2.6.18.noarch/net/core/sock.c
===================================================================
--- linux-2.6.18.noarch.orig/net/core/sock.c
+++ linux-2.6.18.noarch/net/core/sock.c
@@ -810,24 +810,11 @@ lenout:
  */
 static void inline sock_lock_init(struct sock *sk)
 {
-	spin_lock_init(&sk->sk_lock.slock);
-	sk->sk_lock.owner = NULL;
-	init_waitqueue_head(&sk->sk_lock.wq);
-	/*
-	 * Make sure we are not reinitializing a held lock:
-	 */
-	debug_check_no_locks_freed((void *)&sk->sk_lock, sizeof(sk->sk_lock));
-
-	/*
-	 * Mark both the sk_lock and the sk_lock.slock as a
-	 * per-address-family lock class:
-	 */
-	lockdep_set_class_and_name(&sk->sk_lock.slock,
-				   af_family_slock_keys + sk->sk_family,
-				   af_family_slock_key_strings[sk->sk_family]);
-	lockdep_init_map(&sk->sk_lock.dep_map,
-			 af_family_key_strings[sk->sk_family],
-			 af_family_keys + sk->sk_family, 0);
+	sock_lock_init_class_and_name(sk,
+			af_family_slock_key_strings[sk->sk_family],
+			af_family_slock_keys + sk->sk_family,
+			af_family_key_strings[sk->sk_family],
+			af_family_keys + sk->sk_family);
 }
 
 /**
Index: linux-2.6.18.noarch/net/sunrpc/xprtsock.c
===================================================================
--- linux-2.6.18.noarch.orig/net/sunrpc/xprtsock.c
+++ linux-2.6.18.noarch/net/sunrpc/xprtsock.c
@@ -1004,6 +1004,37 @@ static int xs_bindresvport(struct rpc_xp
 	return err;
 }
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+static struct lock_class_key xs_key[2];
+static struct lock_class_key xs_slock_key[2];
+
+static inline void xs_reclassify_socket(struct socket *sock)
+{
+	struct sock *sk = sock->sk;
+	BUG_ON(sk->sk_lock.owner != NULL);
+	switch (sk->sk_family) {
+		case AF_INET:
+			sock_lock_init_class_and_name(sk,
+				"slock-AF_INET-NFS", &xs_slock_key[0],
+				"sk_lock-AF_INET-NFS", &xs_key[0]);
+			break;
+
+		case AF_INET6:
+			sock_lock_init_class_and_name(sk,
+				"slock-AF_INET6-NFS", &xs_slock_key[1],
+				"sk_lock-AF_INET6-NFS", &xs_key[1]);
+			break;
+
+		default:
+			BUG();
+	}
+}
+#else
+static inline void xs_reclassify_socket(struct socket *sock)
+{
+}
+#endif
+
 /**
  * xs_udp_connect_worker - set up a UDP socket
  * @args: RPC transport to connect
@@ -1028,6 +1059,7 @@ static void xs_udp_connect_worker(void *
 		dprintk("RPC:      can't create UDP transport socket (%d).\n", -err);
 		goto out;
 	}
+	xs_reclassify_socket(sock);
 
 	if (xprt->resvport && xs_bindresvport(xprt, sock) < 0) {
 		sock_release(sock);
@@ -1110,6 +1142,7 @@ static void xs_tcp_connect_worker(void *
 			dprintk("RPC:      can't create TCP transport socket (%d).\n", -err);
 			goto out;
 		}
+		xs_reclassify_socket(sock);
 
 		if (xprt->resvport && xs_bindresvport(xprt, sock) < 0) {
 			sock_release(sock);
Index: linux-2.6.18.noarch/net/sunrpc/svcsock.c
===================================================================
--- linux-2.6.18.noarch.orig/net/sunrpc/svcsock.c
+++ linux-2.6.18.noarch/net/sunrpc/svcsock.c
@@ -73,6 +73,37 @@ static struct svc_deferred_req *svc_defe
 static int svc_deferred_recv(struct svc_rqst *rqstp);
 static struct cache_deferred_req *svc_defer(struct cache_req *req);
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+static struct lock_class_key svc_key[2];
+static struct lock_class_key svc_slock_key[2];
+
+static inline void svc_reclassify_socket(struct socket *sock)
+{
+	struct sock *sk = sock->sk;
+	BUG_ON(sk->sk_lock.owner != NULL);
+	switch (sk->sk_family) {
+		case AF_INET:
+			sock_lock_init_class_and_name(sk,
+				"slock-AF_INET-NFSD", &svc_slock_key[0],
+				"sk_lock-AF_INET-NFSD", &svc_key[0]);
+			break;
+
+		case AF_INET6:
+			sock_lock_init_class_and_name(sk,
+				"slock-AF_INET6-NFSD", &svc_slock_key[1],
+				"sk_lock-AF_INET6-NFSD", &svc_key[1]);
+			break;
+
+		default:
+			BUG();
+	}
+}
+#else
+static inline void svc_reclassify_socket(struct socket *sock)
+{
+}
+#endif
+
 /*
  * Queue up an idle server thread.  Must have serv->sv_lock held.
  * Note: this is really a stack rather than a queue, so that we only
@@ -1403,6 +1434,8 @@ svc_create_socket(struct svc_serv *serv,
 	if ((error = sock_create_kern(PF_INET, type, protocol, &sock)) < 0)
 		return error;
 
+	svc_reclassify_socket(sock);
+
 	if (sin != NULL) {
 		if (type == SOCK_STREAM)
 			sock->sk->sk_reuse = 1; /* allow address reuse */


Subject: [RHEL5 PATCH] rt-mutex: fixup rt-mutex debug code
From: Peter Zijlstra <pzijlstr@redhat.com>
To: rhkernel-list@redhat.com
Cc: Don Zickus <dzickus@redhat.com>, Dave Jones <davej@redhat.com>
Content-Type: text/plain
Date: Thu, 12 Oct 2006 17:04:07 +0200

BZ208165

BUG: warning at kernel/rtmutex-debug.c:125/rt_mutex_debug_task_free() (Not tainted)
 [<c04051e3>] show_trace_log_lvl+0x58/0x16a
 [<c04057f0>] show_trace+0xd/0x10
 [<c0405900>] dump_stack+0x19/0x1b
 [<c043f03d>] rt_mutex_debug_task_free+0x35/0x6a
 [<c04224c0>] free_task+0x15/0x24
 [<c042378c>] copy_process+0x12bd/0x1324
 [<c0423835>] do_fork+0x42/0x113
 [<c04021dd>] sys_fork+0x19/0x1b
 [<c0403fb7>] syscall_call+0x7/0xb

In copy_process(), dup_task_struct() also duplicates the ->pi_lock,
->pi_waiters and ->pi_blocked_on members. rt_mutex_debug_task_free() 
called from free_task() validates these members. However free_task()
can be invoked before these members are reset for the new task.

Move the initialization code before the first bail that can hit free_task().
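
The ordering issue can be sketched in userspace as follows. This is a
toy under stated assumptions, not the fork code: toy_task has only two
of the fields involved, and the single early-failure path stands in for
whichever bail in copy_process() reaches free_task() first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down task: just the state the free-time check reads. */
struct toy_task {
	int nr_pi_waiters;
	void *pi_blocked_on;
};

/* Analogue of rt_mutex_init_task(): reset the inherited PI state. */
static void toy_rt_mutex_init_task(struct toy_task *p)
{
	p->nr_pi_waiters = 0;
	p->pi_blocked_on = NULL;
}

/* Analogue of the debug check done when a task is freed. */
static int toy_debug_task_free_ok(const struct toy_task *p)
{
	return p->nr_pi_waiters == 0 && p->pi_blocked_on == NULL;
}

/*
 * Copy the parent, then fail early. Returns 1 if the free-time check
 * passes on that failure path, 0 if it would warn.
 */
static int toy_copy_process_early_fail(const struct toy_task *parent,
				       int init_before_first_bail)
{
	struct toy_task child = *parent;	/* dup_task_struct() analogue */

	if (init_before_first_bail)
		toy_rt_mutex_init_task(&child);

	/* Simulated early failure: the late init site is never reached. */
	return toy_debug_task_free_ok(&child);
}

/* Convenience wrapper: a parent that currently has one PI waiter. */
static int toy_early_fail_with_busy_parent(int init_before_first_bail)
{
	struct toy_task parent = { 1, NULL };

	return toy_copy_process_early_fail(&parent, init_before_first_bail);
}
```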

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/fork.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6.18.noarch/kernel/fork.c
===================================================================
--- linux-2.6.18.noarch.orig/kernel/fork.c
+++ linux-2.6.18.noarch/kernel/fork.c
@@ -979,6 +979,8 @@ static struct task_struct *copy_process(
 	if (!p)
 		goto fork_out;
 
+	rt_mutex_init_task(p);
+
 	p->tux_info = NULL;
 
 #ifdef CONFIG_TRACE_IRQFLAGS
@@ -1084,8 +1086,6 @@ static struct task_struct *copy_process(
 	p->lockdep_recursion = 0;
 #endif
 
-	rt_mutex_init_task(p);
-
 #ifdef CONFIG_DEBUG_MUTEXES
 	p->blocked_on = NULL; /* not blocked yet */
 #endif


Date: Mon, 09 Oct 2006 20:14:38 +0200
From: Peter Zijlstra <pzijlstr@redhat.com>
Subject: [RHEL5 PATCH] lockdep: annotate i386-apm irq usage

BZ209480

---

Lockdep doesn't like to enable interrupts when they are enabled already.

BUG: warning at kernel/lockdep.c:1814/trace_hardirqs_on() (Not tainted)
 [<c04051ed>] show_trace_log_lvl+0x58/0x16a
 [<c04057fa>] show_trace+0xd/0x10
 [<c0405913>] dump_stack+0x19/0x1b
 [<c043abfb>] trace_hardirqs_on+0xa2/0x11e
 [<c041463c>] apm_bios_call_simple+0xcd/0xfd
 [<c0415242>] apm+0x92/0x5b1
 [<c0402005>] kernel_thread_helper+0x5/0xb
DWARF2 unwinder stuck at kernel_thread_helper+0x5/0xb
Leftover inexact backtrace:
 [<c04057fa>] show_trace+0xd/0x10
 [<c0405913>] dump_stack+0x19/0x1b
 [<c043abfb>] trace_hardirqs_on+0xa2/0x11e
 [<c041463c>] apm_bios_call_simple+0xcd/0xfd
 [<c0415242>] apm+0x92/0x5b1
 [<c0402005>] kernel_thread_helper+0x5/0xb
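
A toy userspace model of the guard logic in the reworked macros. It only
counts enable requests issued while the modeled state is already
enabled; it does not claim to reproduce the exact condition lockdep
checks. All toy_* names are hypothetical, and saved_on is assumed to
hold the interrupt state on entry, which is what `flags` is taken to
mean in the macros.

```c
#include <assert.h>

static int toy_irqs_on;			/* modeled interrupt state */
static int toy_enables_while_on;	/* enable requests made while on */

static void toy_irq_enable(void)
{
	if (toy_irqs_on)
		toy_enables_while_on++;
	toy_irqs_on = 1;
}

static void toy_irq_disable(void)
{
	toy_irqs_on = 0;
}

/*
 * Model one BIOS call. Returns how many enable requests were made while
 * interrupts were already enabled.
 */
static int toy_bios_call(int entry_on, int allow_ints, int patched)
{
	int saved_on = entry_on;	/* state saved on entry */

	toy_irqs_on = entry_on;
	toy_enables_while_on = 0;

	/* APM_DO_CLI analogue: the patched form skips a needless enable. */
	if (allow_ints) {
		if (!patched || !saved_on)
			toy_irq_enable();
	} else {
		toy_irq_disable();
	}

	/* The BIOS call itself would run here. */

	if (patched) {
		/* APM_DO_STI analogue: act only if the state must change. */
		if (!saved_on)
			toy_irq_disable();
		else if (!toy_irqs_on)
			toy_irq_enable();
	} else {
		/* Unpatched restore, modeled as a plain state assignment. */
		toy_irqs_on = saved_on;
	}

	return toy_enables_while_on;
}
```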

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/i386/kernel/apm.c |   25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

Index: linux-2.6.18.noarch/arch/i386/kernel/apm.c
===================================================================
--- linux-2.6.18.noarch.orig/arch/i386/kernel/apm.c
+++ linux-2.6.18.noarch/arch/i386/kernel/apm.c
@@ -539,11 +539,22 @@ static inline void apm_restore_cpus(cpum
  * Also, we KNOW that for the non error case of apm_bios_call, there
  * is no useful data returned in the low order 8 bits of eax.
  */
-#define APM_DO_CLI	\
-	if (apm_info.allow_ints) \
-		local_irq_enable(); \
-	else \
-		local_irq_disable();
+#define APM_DO_CLI \
+	do { \
+		if (apm_info.allow_ints) { \
+			if (irqs_disabled_flags(flags)) \
+				local_irq_enable(); \
+		} else \
+			local_irq_disable(); \
+	} while (0)
+
+#define APM_DO_STI \
+	do { \
+		if (irqs_disabled_flags(flags)) \
+			local_irq_disable(); \
+		else if (irqs_disabled()) \
+			local_irq_enable(); \
+	} while (0)
 
 #ifdef APM_ZERO_SEGS
 #	define APM_DECL_SEGS \
@@ -600,7 +611,7 @@ static u8 apm_bios_call(u32 func, u32 eb
 	APM_DO_SAVE_SEGS;
 	apm_bios_call_asm(func, ebx_in, ecx_in, eax, ebx, ecx, edx, esi);
 	APM_DO_RESTORE_SEGS;
-	local_irq_restore(flags);
+	APM_DO_STI;
 	gdt[0x40 / 8] = save_desc_40;
 	put_cpu();
 	apm_restore_cpus(cpus);
@@ -644,7 +655,7 @@ static u8 apm_bios_call_simple(u32 func,
 	APM_DO_SAVE_SEGS;
 	error = apm_bios_call_simple_asm(func, ebx_in, ecx_in, eax);
 	APM_DO_RESTORE_SEGS;
-	local_irq_restore(flags);
+	APM_DO_STI;
 	gdt[0x40 / 8] = save_desc_40;
 	put_cpu();
 	apm_restore_cpus(cpus);

Date: Wed, 11 Oct 2006 13:08:33 +0200
From: Peter Zijlstra <pzijlstr@redhat.com>
Subject: [RHEL5 PATCH] lockdep: increase max allowed recursion depth

Ingo pointed me to a patch he posted to lkml in response to a
print_infinite_recursion() warning.

BZ204767
BZ209135

and probably some others

---

hm, does the patch below solve it? In general, lockdep warnings are 
intended to be non-fatal, so i have put in various practical limits on 
internal data structure failure modes. We haven't had a /single/ 
lockdep-internal crash ever since lockdep went upstream [the unwinder 
crashes are outside of lockdep], and that's largely due to the good 
internal checks it does.

Recursion within the dependency graph is currently limited to 20, that's 
probably not enough on your box - this patch doubles it to 40. I have 
written the lockdep functions to have as small stackframes as possible, 
so 40 should be OK too. (The practical recursion limit should be 
somewhere between 100 and 200 entries. If we hit that then i'll change 
the algorithm to be iteration-based. Graph walking logic is so easy to 
program via recursion, so i'd like to keep recursion as long as 
possible.)

	Ingo

---
Subject: lockdep: increase max allowed recursion depth
From: Ingo Molnar <mingo@elte.hu>

With lots of CPUs there can be lots of deep dependencies. Will change 
the algorithm to iteration-based if it gets too deep.
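
As a rough illustration of the depth cap (a userspace sketch, not the
lockdep code itself): the adjacency matrix, MAX_NODES and the helper names
below are assumptions made for the example. Only RECURSION_LIMIT and its
value of 40 come from this patch.

```c
#include <string.h>

#define RECURSION_LIMIT 40	/* the value this patch introduces */
#define MAX_NODES 64		/* hypothetical graph size for the sketch */

/* Hypothetical adjacency matrix: dep[a][b] != 0 means "a is held while taking b". */
static int dep[MAX_NODES][MAX_NODES];

/*
 * Walk the dependencies of 'source' looking for 'target'.
 * Returns 1 if target is unreachable, 0 if it is reachable (so adding
 * target -> source would close a cycle), -1 if the depth limit was hit.
 */
static int check_noncircular_sketch(int source, int target, int depth)
{
	int next, ret;

	if (depth >= RECURSION_LIMIT)
		return -1;	/* give up rather than keep recursing */

	for (next = 0; next < MAX_NODES; next++) {
		if (!dep[source][next])
			continue;
		if (next == target)
			return 0;
		ret = check_noncircular_sketch(next, target, depth + 1);
		if (ret != 1)
			return ret;
	}
	return 1;
}

/* Build the chain 0 -> 1 -> ... -> len-1 and search it for 'target'. */
static int demo_chain(int len, int target)
{
	int i;

	memset(dep, 0, sizeof(dep));
	for (i = 0; i + 1 < len; i++)
		dep[i][i + 1] = 1;
	return check_noncircular_sketch(0, target, 0);
}
```

In this model a chain of 10 locks is walked to completion, while a chain of
50 locks trips the limit before the walk can finish, which is the kind of
situation the larger limit is meant to make less likely.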

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/lockdep.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Index: linux/kernel/lockdep.c
===================================================================
--- linux.orig/kernel/lockdep.c
+++ linux/kernel/lockdep.c
@@ -575,6 +575,8 @@ static noinline int print_circular_bug_t
 	return 0;
 }
 
+#define RECURSION_LIMIT 40
+
 static int noinline print_infinite_recursion_bug(void)
 {
 	__raw_spin_unlock(&hash_lock);
@@ -595,7 +597,7 @@ check_noncircular(struct lock_class *sou
 	debug_atomic_inc(&nr_cyclic_check_recursions);
 	if (depth > max_recursion_depth)
 		max_recursion_depth = depth;
-	if (depth >= 20)
+	if (depth >= RECURSION_LIMIT)
 		return print_infinite_recursion_bug();
 	/*
 	 * Check this lock's dependency list:
@@ -645,7 +647,7 @@ find_usage_forwards(struct lock_class *s
 
 	if (depth > max_recursion_depth)
 		max_recursion_depth = depth;
-	if (depth >= 20)
+	if (depth >= RECURSION_LIMIT)
 		return print_infinite_recursion_bug();
 
 	debug_atomic_inc(&nr_find_usage_forwards_checks);
@@ -684,7 +686,7 @@ find_usage_backwards(struct lock_class *
 
 	if (depth > max_recursion_depth)
 		max_recursion_depth = depth;
-	if (depth >= 20)
+	if (depth >= RECURSION_LIMIT)
 		return print_infinite_recursion_bug();
 
 	debug_atomic_inc(&nr_find_usage_backwards_checks);

--

When we open (actually blkdev_get) a partition we need to also open (get) the
whole device that holds the partition.  This involves some limited recursion.
This patch tries to simplify some aspects of this.

As well as opening the whole device, we need to increment ->bd_part_count when
a partition is opened (this is used by rescan_partitions to avoid a rescan if
any partition is active, as that would be confusing).

The main change this patch makes is to move the inc/dec of bd_part_count into
blkdev_{get,put} for the whole device rather than doing it in blkdev_{get,put}
for the partition.

More specifically, we introduce __blkdev_get and __blkdev_put which do exactly
what blkdev_{get,put} did, only with an extra "for_part" argument
(blkdev_{get,put} then call the __ version with a '0' for the extra argument).

If for_part is 1, then the blkdev is being get(put) because a partition is
being opened(closed) for the first(last) time, and so bd_part_count should be
updated (on success).  The particular advantage of pushing this function down
is that the bd_mutex lock (which is needed to update bd_part_count) is already
held at the lower level.

Note that this slightly changes the semantics of bd_part_count.  Instead of
updating it whenever a partition is opened or released, it is now only updated
on the first open or last release.  This is an adequate semantic as it is only
ever tested for "== 0".
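
The changed semantics can be modelled in userspace roughly as follows. This
is a sketch under simplifying assumptions: struct bdev_model, model_get,
model_put and demo_part_count are hypothetical names, there is no locking,
and the "contains" pointer is preset rather than assigned on first open.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for struct block_device. */
struct bdev_model {
	struct bdev_model *contains;	/* whole device; points to self for a whole device */
	int openers;
	int part_count;
};

static void model_get(struct bdev_model *b, int for_part)
{
	/* The first open of a partition also opens the whole device on its behalf. */
	if (b->openers == 0 && b->contains != b)
		model_get(b->contains, 1);
	b->openers++;
	if (for_part)
		b->part_count++;
}

static void model_put(struct bdev_model *b, int for_part)
{
	struct bdev_model *victim = NULL;

	if (for_part)
		b->part_count--;
	/* Last release of a partition: remember the whole device ... */
	if (--b->openers == 0 && b->contains != b)
		victim = b->contains;
	/* ... and drop it only after this device's own bookkeeping is done. */
	if (victim)
		model_put(victim, 1);
}

/* Open one partition 'opens' times, close it 'closes' times, report the count. */
static int demo_part_count(int opens, int closes)
{
	struct bdev_model whole = { NULL, 0, 0 };
	struct bdev_model part = { NULL, 0, 0 };
	int i;

	whole.contains = &whole;
	part.contains = &whole;
	for (i = 0; i < opens; i++)
		model_get(&part, 0);
	for (i = 0; i < closes; i++)
		model_put(&part, 0);
	return whole.part_count;
}
```

Opening the same partition twice leaves the whole device's count at 1 in
this model, and it returns to 0 only after the second close, which is all a
"== 0" test needs.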

Having introduced these functions we remove the current bd_part_count updates
from do_open (which is really the body of blkdev_get) and call
__blkdev_get(..., 1).  Similarly in blkdev_put we remove the old bd_part_count
updates and call __blkdev_put(..., 1).  This call is moved to the end of
__blkdev_put to avoid nested locks of bd_mutex.

Finally the mutex_lock on whole->bd_mutex in do_open can be removed.  It was
only really needed to protect bd_part_count, and that is now managed (and
protected) within the recursive call.

The observation that bd_part_count is central to the locking issues, and the
modifications to create __blkdev_put are from Peter Zijlstra.

Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---
 fs/block_dev.c |   51 +++++++++++++++++++++++++++++----------------------
 1 file changed, 29 insertions(+), 22 deletions(-)

diff .prev/fs/block_dev.c ./fs/block_dev.c
Index: linux-2.6.18.noarch/fs/block_dev.c
===================================================================
--- linux-2.6.18.noarch.orig/fs/block_dev.c
+++ linux-2.6.18.noarch/fs/block_dev.c
@@ -868,7 +868,10 @@ void bd_set_size(struct block_device *bd
 }
 EXPORT_SYMBOL(bd_set_size);
 
-static int do_open(struct block_device *bdev, struct file *file)
+static int __blkdev_get(struct block_device *bdev, mode_t mode, unsigned flags,
+			int for_part);
+
+static int do_open(struct block_device *bdev, struct file *file, int for_part)
 {
 	struct module *owner = NULL;
 	struct gendisk *disk;
@@ -912,25 +915,21 @@ static int do_open(struct block_device *
 			ret = -ENOMEM;
 			if (!whole)
 				goto out_first;
-			ret = blkdev_get(whole, file->f_mode, file->f_flags);
+			BUG_ON(for_part);
+			ret = __blkdev_get(whole, file->f_mode, file->f_flags, 1);
 			if (ret)
 				goto out_first;
 			bdev->bd_contains = whole;
-			mutex_lock(&whole->bd_mutex);
-			whole->bd_part_count++;
 			p = disk->part[part - 1];
 			bdev->bd_inode->i_data.backing_dev_info =
 			   whole->bd_inode->i_data.backing_dev_info;
 			if (!(disk->flags & GENHD_FL_UP) || !p || !p->nr_sects) {
-				whole->bd_part_count--;
-				mutex_unlock(&whole->bd_mutex);
 				ret = -ENXIO;
 				goto out_first;
 			}
 			kobject_get(&p->kobj);
 			bdev->bd_part = p;
 			bd_set_size(bdev, (loff_t) p->nr_sects << 9);
-			mutex_unlock(&whole->bd_mutex);
 		}
 	} else {
 		put_disk(disk);
@@ -943,13 +942,11 @@ static int do_open(struct block_device *
 			}
 			if (bdev->bd_invalidated)
 				rescan_partitions(bdev->bd_disk, bdev);
-		} else {
-			mutex_lock(&bdev->bd_contains->bd_mutex);
-			bdev->bd_contains->bd_part_count++;
-			mutex_unlock(&bdev->bd_contains->bd_mutex);
 		}
 	}
 	bdev->bd_openers++;
+	if (for_part)
+		bdev->bd_part_count++;
 	mutex_unlock(&bdev->bd_mutex);
 	unlock_kernel();
 	return 0;
@@ -970,7 +967,8 @@ out:
 	return ret;
 }
 
-int blkdev_get(struct block_device *bdev, mode_t mode, unsigned flags)
+static int __blkdev_get(struct block_device *bdev, mode_t mode, unsigned flags,
+			int for_part)
 {
 	/*
 	 * This crockload is due to bad choice of ->open() type.
@@ -985,9 +983,13 @@ int blkdev_get(struct block_device *bdev
 	fake_file.f_dentry = &fake_dentry;
 	fake_dentry.d_inode = bdev->bd_inode;
 
-	return do_open(bdev, &fake_file);
+	return do_open(bdev, &fake_file, for_part);
 }
 
+int blkdev_get(struct block_device *bdev, mode_t mode, unsigned flags)
+{
+	return __blkdev_get(bdev, mode, flags, 0);
+}
 EXPORT_SYMBOL(blkdev_get);
 
 static int blkdev_open(struct inode * inode, struct file * filp)
@@ -1005,7 +1007,7 @@ static int blkdev_open(struct inode * in
 
 	bdev = bd_acquire(inode);
 
-	res = do_open(bdev, filp);
+	res = do_open(bdev, filp, 0);
 	if (res)
 		return res;
 
@@ -1019,14 +1021,18 @@ static int blkdev_open(struct inode * in
 	return res;
 }
 
-int blkdev_put(struct block_device *bdev)
+static int __blkdev_put(struct block_device *bdev, int for_part)
 {
 	int ret = 0;
 	struct inode *bd_inode = bdev->bd_inode;
 	struct gendisk *disk = bdev->bd_disk;
+	struct block_device *victim = NULL;
 
 	mutex_lock(&bdev->bd_mutex);
 	lock_kernel();
+	if (for_part)
+		bdev->bd_part_count--;
+
 	if (!--bdev->bd_openers) {
 		sync_blockdev(bdev);
 		kill_bdev(bdev);
@@ -1034,10 +1040,6 @@ int blkdev_put(struct block_device *bdev
 	if (bdev->bd_contains == bdev) {
 		if (disk->fops->release)
 			ret = disk->fops->release(bd_inode, NULL);
-	} else {
-		mutex_lock(&bdev->bd_contains->bd_mutex);
-		bdev->bd_contains->bd_part_count--;
-		mutex_unlock(&bdev->bd_contains->bd_mutex);
 	}
 	if (!bdev->bd_openers) {
 		struct module *owner = disk->fops->owner;
@@ -1051,17 +1053,22 @@ int blkdev_put(struct block_device *bdev
 		}
 		bdev->bd_disk = NULL;
 		bdev->bd_inode->i_data.backing_dev_info = &default_backing_dev_info;
-		if (bdev != bdev->bd_contains) {
-			blkdev_put(bdev->bd_contains);
-		}
+		if (bdev != bdev->bd_contains)
+			victim = bdev->bd_contains;
 		bdev->bd_contains = NULL;
 	}
 	unlock_kernel();
 	mutex_unlock(&bdev->bd_mutex);
 	bdput(bdev);
+	if (victim)
+		__blkdev_put(victim, 1);
 	return ret;
 }
 
+int blkdev_put(struct block_device *bdev)
+{
+	return __blkdev_put(bdev, 0);
+}
 EXPORT_SYMBOL(blkdev_put);
 
 static int blkdev_close(struct inode * inode, struct file * filp)

--

Now that the nesting in blkdev_{get,put} is simpler, adding mutex_lock_nested
is trivial.
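
Why passing for_part as the subclass is enough can be sketched with a much
simplified model of a lock validator. Everything below is an assumption made
for illustration (held_lock_model, model_acquire, demo_nested, MAX_HELD);
the real lockdep tracks far more state than a flat list of held locks.

```c
#define MAX_HELD 8	/* hypothetical limit for the sketch */

struct held_lock_model {
	const void *class_key;	/* identifies the lock class */
	unsigned int subclass;	/* nesting level passed by the caller */
};

static struct held_lock_model held[MAX_HELD];
static int nr_held;

/* Returns 0 on success, -1 if the same class and subclass is already held. */
static int model_acquire(const void *class_key, unsigned int subclass)
{
	int i;

	for (i = 0; i < nr_held; i++)
		if (held[i].class_key == class_key &&
		    held[i].subclass == subclass)
			return -1;	/* looks like recursive locking */
	if (nr_held == MAX_HELD)
		return -1;
	held[nr_held].class_key = class_key;
	held[nr_held].subclass = subclass;
	nr_held++;
	return 0;
}

/* Take a partition's lock at subclass 'outer', then the whole device's at 'inner'. */
static int demo_nested(unsigned int outer, unsigned int inner)
{
	static const char bd_mutex_class;	/* one class shared by every bd_mutex */

	nr_held = 0;
	if (model_acquire(&bd_mutex_class, outer))
		return -1;
	return model_acquire(&bd_mutex_class, inner);
}
```

In this model, taking two bd_mutex locks at subclass 0 is flagged, while
taking the partition at subclass 0 (for_part == 0) and the whole device at
subclass 1 (for_part == 1) is accepted.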

Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---
 fs/block_dev.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff .prev/fs/block_dev.c ./fs/block_dev.c
Index: linux-2.6.18.noarch/fs/block_dev.c
===================================================================
--- linux-2.6.18.noarch.orig/fs/block_dev.c
+++ linux-2.6.18.noarch/fs/block_dev.c
@@ -888,7 +888,7 @@ static int do_open(struct block_device *
 	}
 	owner = disk->fops->owner;
 
-	mutex_lock(&bdev->bd_mutex);
+	mutex_lock_nested(&bdev->bd_mutex, for_part);
 	if (!bdev->bd_openers) {
 		bdev->bd_disk = disk;
 		bdev->bd_contains = bdev;
@@ -1028,7 +1028,7 @@ static int __blkdev_put(struct block_dev
 	struct gendisk *disk = bdev->bd_disk;
 	struct block_device *victim = NULL;
 
-	mutex_lock(&bdev->bd_mutex);
+	mutex_lock_nested(&bdev->bd_mutex, for_part);
 	lock_kernel();
 	if (for_part)
 		bdev->bd_part_count--;

--

md_open takes ->reconfig_mutex which causes lockdep to complain.  This
(normally) doesn't have deadlock potential as the possible conflict is with a
reconfig_mutex in a different device.

I say "normally" because if a loop were created in the array->member hierarchy
a deadlock could happen.  However that causes bigger problems than a deadlock
and should be fixed independently.

So we flag the lock in md_open as a nested lock.  This requires defining
mutex_lock_interruptible_nested.

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---
 drivers/md/md.c       |    2 +-
 include/linux/mutex.h |    3 ++-
 kernel/mutex.c        |    8 ++++++++
 3 files changed, 11 insertions(+), 2 deletions(-)

diff .prev/drivers/md/md.c ./drivers/md/md.c
Index: linux-2.6.18.noarch/drivers/md/md.c
===================================================================
--- linux-2.6.18.noarch.orig/drivers/md/md.c
+++ linux-2.6.18.noarch/drivers/md/md.c
@@ -4460,7 +4460,7 @@ static int md_open(struct inode *inode, 
 	mddev_t *mddev = inode->i_bdev->bd_disk->private_data;
 	int err;
 
-	if ((err = mddev_lock(mddev)))
+	if ((err = mutex_lock_interruptible_nested(&mddev->reconfig_mutex, 1)))
 		goto out;
 
 	err = 0;
Index: linux-2.6.18.noarch/include/linux/mutex.h
===================================================================
--- linux-2.6.18.noarch.orig/include/linux/mutex.h
+++ linux-2.6.18.noarch/include/linux/mutex.h
@@ -125,8 +125,10 @@ extern int fastcall mutex_lock_interrupt
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 extern void mutex_lock_nested(struct mutex *lock, unsigned int subclass);
+extern int mutex_lock_interruptible_nested(struct mutex *lock, unsigned int subclass);
 #else
 # define mutex_lock_nested(lock, subclass) mutex_lock(lock)
+# define mutex_lock_interruptible_nested(lock, subclass) mutex_lock_interruptible(lock)
 #endif
 
 /*
Index: linux-2.6.18.noarch/kernel/mutex.c
===================================================================
--- linux-2.6.18.noarch.orig/kernel/mutex.c
+++ linux-2.6.18.noarch/kernel/mutex.c
@@ -206,6 +206,15 @@ mutex_lock_nested(struct mutex *lock, un
 }
 
 EXPORT_SYMBOL_GPL(mutex_lock_nested);
+
+int __sched
+mutex_lock_interruptible_nested(struct mutex *lock, unsigned int subclass)
+{
+	might_sleep();
+	return __mutex_lock_common(lock, TASK_INTERRUPTIBLE, subclass);
+}
+
+EXPORT_SYMBOL_GPL(mutex_lock_interruptible_nested);
 #endif
 
 /*

--

Date: Fri, 03 Nov 2006 08:30:35 +0100
From: Peter Zijlstra <pzijlstr@redhat.com>
Subject: [RHEL5 PATCH] bdev: fix ->bd_part_count leak

BZ212191 - kernel unable to read partition (device busy)
---

Don't leak a ->bd_part_count when the partition open fails with -ENXIO.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---
 fs/block_dev.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6.18.noarch/fs/block_dev.c
===================================================================
--- linux-2.6.18.noarch.orig/fs/block_dev.c
+++ linux-2.6.18.noarch/fs/block_dev.c
@@ -870,6 +870,7 @@ EXPORT_SYMBOL(bd_set_size);
 
 static int __blkdev_get(struct block_device *bdev, mode_t mode, unsigned flags,
 			int for_part);
+static int __blkdev_put(struct block_device *bdev, int for_part);
 
 static int do_open(struct block_device *bdev, struct file *file, int for_part)
 {
@@ -955,7 +956,7 @@ out_first:
 	bdev->bd_disk = NULL;
 	bdev->bd_inode->i_data.backing_dev_info = &default_backing_dev_info;
 	if (bdev != bdev->bd_contains)
-		blkdev_put(bdev->bd_contains);
+		__blkdev_put(bdev->bd_contains, 1);
 	bdev->bd_contains = NULL;
 	put_disk(disk);
 	module_put(owner);