kernel-2.6.18-238.el5.src.rpm

From: Thomas Graf <tgraf@redhat.com>
Date: Thu, 26 Aug 2010 09:42:05 -0400
Subject: [net] tcp: prevent sending past receiver window with TSO
Message-id: <20100826094205.GA14834@lsx.localdomain>
Patchwork-id: 27824
O-Subject: [RHEL5.6 PATCH] tcp: Prevent sending past receiver window with TSO
	(BZ494400)
Bugzilla: 494400
RH-Acked-by: David S. Miller <davem@redhat.com>
RH-Acked-by: Jiri Olsa <jolsa@redhat.com>
RH-Acked-by: Jiri Pirko <jpirko@redhat.com>

Various sources, some of them internal, reported TCP connections
hanging while "TCP: Treason uncloaked! [...]" messages flood the console.

To fix the issue, multiple upstream commits have been backported.
The problem itself is fixed in commit:

commit 5ea3a7480606cef06321cd85bc5113c72d2c7c68
Author: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
    [TCP]: Prevent sending past receiver window with TSO (at last skb)

    With TSO it was possible to send past the receiver window when the skb
    to be sent was the last in the write queue while the receiver window
    is the limiting factor. One can notice that there's a loophole in the
    tcp_mss_split_point that lacked a receiver window check for the
    tcp_write_queue_tail() if also cwnd was smaller than the full skb.

    [...]

To make backporting easier, I have applied the following commit beforehand:

commit 0e3a4803aa06cd7bc2cfc1d04289df4f6027640a
Author: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
    [TCP]: Force TSO splits to MSS boundaries

    If snd_wnd - snd_nxt wasn't multiple of MSS, skb was split on
    odd boundary by the callers of tcp_window_allows.

    [...]
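
For illustration, the pre-patch limit computation (tcp_window_allows() plus
the trim in its callers, as removed by the diff below) can be modelled in
user space. This is only a sketch: old_tso_limit() is a hypothetical name,
and plain integers stand in for the tcp_sock and sk_buff fields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model of the pre-patch logic:
 * tcp_window_allows() followed by the trim done in its callers.
 * "window" stands for snd_una + snd_wnd - seq of the skb. */
static uint32_t old_tso_limit(uint32_t skb_len, uint32_t window,
                              uint32_t mss_now, uint32_t cwnd)
{
    uint32_t cwnd_len, limit;

    assert(mss_now != 0); /* assumption: a valid MSS is never zero */

    cwnd_len = mss_now * cwnd;
    limit = window < cwnd_len ? window : cwnd_len;

    /* The callers rounded down only when the whole skb fit in the limit. */
    if (skb_len < limit) {
        uint32_t trim = skb_len % mss_now;

        if (trim)
            limit = skb_len - trim;
    }
    return limit;
}
```

With mss_now = 1000, cwnd = 4, a 2500-byte window and a 5000-byte skb, this
model returns 2500, a split point that is not a multiple of the MSS.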

Additionally, to bring tcp_mss_split_point() in line with upstream, the
following commit and some coding-style cleanups were also applied:

commit 17515408a15fa51c553e67c415502e785145cd7f
Author: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
    [TCP]: Remove superflushious skb == write_queue_tail() check

    Needed can only be more strict than what was checked by the
    earlier common case check for non-tail skbs, thus
    cwnd_len <= needed will never match in that case anyway.
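
The resulting split-point logic can be sketched in user space as follows.
This is an illustrative model, not the kernel code: mss_split_point() and
its is_tail flag are hypothetical stand-ins for tcp_mss_split_point() and
the write-queue-tail comparison, and "window" is passed in directly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model of tcp_mss_split_point() after the patch.
 * is_tail is nonzero when the skb is the last one in the write queue. */
static uint32_t mss_split_point(uint32_t skb_len, int is_tail,
                                uint32_t window, uint32_t mss_now,
                                uint32_t cwnd)
{
    uint32_t cwnd_len, needed;

    assert(mss_now != 0); /* assumption: a valid MSS is never zero */

    cwnd_len = mss_now * cwnd;

    /* Common case: cwnd is the limit and the skb is not the tail. */
    if (cwnd_len <= window && !is_tail)
        return cwnd_len;

    /* Never ask for more than the skb holds or the receiver allows. */
    needed = skb_len < window ? skb_len : window;

    if (cwnd_len <= needed)
        return cwnd_len;

    /* Round down to an MSS boundary. */
    return needed - needed % mss_now;
}
```

With mss_now = 1000, cwnd = 4, a 2500-byte window and a 5000-byte tail skb,
the model returns 2000: within the receiver window and on an MSS boundary.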

Brew:
https://brewweb.devel.redhat.com/taskinfo?taskID=2704803

Tested by the reporter and me.

Resolves BZ494400

Signed-off-by: Jarod Wilson <jarod@redhat.com>

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 9c6fbbf..2c9546b 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -882,13 +882,36 @@ static void tcp_cwnd_validate(struct sock *sk, struct tcp_sock *tp)
 	}
 }
 
-static unsigned int tcp_window_allows(struct tcp_sock *tp, struct sk_buff *skb, unsigned int mss_now, unsigned int cwnd)
+/* Returns the portion of skb which can be sent right away without
+ * introducing MSS oddities to segment boundaries. In rare cases where
+ * mss_now != mss_cache, we will request caller to create a small skb
+ * per input skb which could be mostly avoided here (if desired).
+ *
+ * We explicitly want to create a request for splitting write queue tail
+ * to a small skb for Nagle purposes while avoiding unnecessary modulos,
+ * thus all the complexity (cwnd_len is always MSS multiple which we
+ * return whenever allowed by the other factors). Basically we need the
+ * modulo only when the receiver window alone is the limiting factor or
+ * when we would be allowed to send the split-due-to-Nagle skb fully.
+ */
+static unsigned int tcp_mss_split_point(struct sock *sk, struct sk_buff *skb,
+					unsigned int mss_now, unsigned int cwnd)
 {
-	u32 window, cwnd_len;
+	struct tcp_sock *tp = tcp_sk(sk);
+	u32 needed, window, cwnd_len;
 
-	window = (tp->snd_una + tp->snd_wnd - TCP_SKB_CB(skb)->seq);
+	window = tp->snd_una + tp->snd_wnd - TCP_SKB_CB(skb)->seq;
 	cwnd_len = mss_now * cwnd;
-	return min(window, cwnd_len);
+
+	if (likely(cwnd_len <= window && skb != skb_peek_tail(&sk->sk_write_queue)))
+		return cwnd_len;
+
+	needed = min(skb->len, window);
+
+	if (cwnd_len <= needed)
+		return cwnd_len;
+
+	return needed - needed % mss_now;
 }
 
 /* Can at least one segment of SKB be sent right now, according to the
@@ -1330,17 +1353,9 @@ static int tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle)
 		}
 
 		limit = mss_now;
-		if (tso_segs > 1 && !tp->urg_mode) {
-			limit = tcp_window_allows(tp, skb,
-						  mss_now, cwnd_quota);
-
-			if (skb->len < limit) {
-				unsigned int trim = skb->len % mss_now;
-
-				if (trim)
-					limit = skb->len - trim;
-			}
-		}
+		if (tso_segs > 1 && !tp->urg_mode)
+			limit = tcp_mss_split_point(sk, skb, mss_now,
+						    cwnd_quota);
 
 		if (skb->len > limit &&
 		    unlikely(tso_fragment(sk, skb, limit, mss_now)))
@@ -1403,17 +1418,9 @@ void tcp_push_one(struct sock *sk, unsigned int mss_now)
 		BUG_ON(!tso_segs);
 
 		limit = mss_now;
-		if (tso_segs > 1 && !tp->urg_mode) {
-			limit = tcp_window_allows(tp, skb,
-						  mss_now, cwnd_quota);
-
-			if (skb->len < limit) {
-				unsigned int trim = skb->len % mss_now;
-
-				if (trim)
-					limit = skb->len - trim;
-			}
-		}
+		if (tso_segs > 1 && !tp->urg_mode)
+			limit = tcp_mss_split_point(sk, skb, mss_now,
+						    cwnd_quota);
 
 		if (skb->len > limit &&
 		    unlikely(tso_fragment(sk, skb, limit, mss_now)))