Sophie

Sophie

distrib > Scientific%20Linux > 5x > x86_64 > by-pkgid > 27922b4260f65d317aabda37e42bbbff > files > 2479

kernel-2.6.18-238.el5.src.rpm

From: Andy Gospodarek <gospo@redhat.com>
Date: Wed, 19 Dec 2007 10:37:32 -0500
Subject: [net] bonding: documentation update
Message-id: 20071219153732.GM28834@gospo.usersys.redhat.com
O-Subject: [RHEL5.2 PATCH] bonding: documentation update
Bugzilla: 235711

This is an update to the bonding documentation based on changes in the
way we create bonds using initscripts and also adds a description for a
new feature that is included in my recent patch-set.  This is all
straight from upstream.

This satisfies the request in BZ 235711.

Acked-by: Neil Horman <nhorman@redhat.com>
Acked-by: Prarit Bhargava <prarit@redhat.com>

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index afac780..a0cda06 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -1,7 +1,7 @@
 
 		Linux Ethernet Bonding Driver HOWTO
 
-		Latest update: 24 April 2006
+		Latest update: 12 November 2007
 
 Initial release : Thomas Davis <tadavis at lbl.gov>
 Corrections, HA extensions : 2000/10/03-15 :
@@ -166,12 +166,17 @@ to use ifenslave.
 2. Bonding Driver Options
 =========================
 
-	Options for the bonding driver are supplied as parameters to
-the bonding module at load time.  They may be given as command line
-arguments to the insmod or modprobe command, but are usually specified
-in either the /etc/modules.conf or /etc/modprobe.conf configuration
-file, or in a distro-specific configuration file (some of which are
-detailed in the next section).
+	Options for the bonding driver are supplied as parameters to the
+bonding module at load time, or are specified via sysfs.
+
+	Module options may be given as command line arguments to the
+insmod or modprobe command, but are usually specified in either the
+/etc/modules.conf or /etc/modprobe.conf configuration file, or in a
+distro-specific configuration file (some of which are detailed in the next
+section).
+
+	Details on bonding support for sysfs is provided in the
+"Configuring Bonding Manually via Sysfs" section, below.
 
 	The available bonding driver parameters are listed below. If a
 parameter is not specified the default value is used.  When initially
@@ -192,6 +197,17 @@ or, for backwards compatibility, the option value.  E.g.,
 arp_interval
 
 	Specifies the ARP link monitoring frequency in milliseconds.
+
+	The ARP monitor works by periodically checking the slave
+	devices to determine whether they have sent or received
+	traffic recently (the precise criteria depends upon the
+	bonding mode, and the state of the slave).  Regular traffic is
+	generated via ARP probes issued for the addresses specified by
+	the arp_ip_target option.
+
+	This behavior can be modified by the arp_validate option,
+	below.
+
 	If ARP monitoring is used in an etherchannel compatible mode
 	(modes 0 and 2), the switch should be configured in a mode
 	that evenly distributes packets across all links. If the
@@ -213,6 +229,54 @@ arp_ip_target
 	maximum number of targets that can be specified is 16.  The
 	default value is no IP addresses.
 
+arp_validate
+
+	Specifies whether or not ARP probes and replies should be
+	validated in the active-backup mode.  This causes the ARP
+	monitor to examine the incoming ARP requests and replies, and
+	only consider a slave to be up if it is receiving the
+	appropriate ARP traffic.
+
+	Possible values are:
+
+	none or 0
+
+		No validation is performed.  This is the default.
+
+	active or 1
+
+		Validation is performed only for the active slave.
+
+	backup or 2
+
+		Validation is performed only for backup slaves.
+
+	all or 3
+
+		Validation is performed for all slaves.
+
+	For the active slave, the validation checks ARP replies to
+	confirm that they were generated by an arp_ip_target.  Since
+	backup slaves do not typically receive these replies, the
+	validation performed for backup slaves is on the ARP request
+	sent out via the active slave.  It is possible that some
+	switch or network configurations may result in situations
+	wherein the backup slaves do not receive the ARP requests; in
+	such a situation, validation of backup slaves must be
+	disabled.
+
+	This option is useful in network configurations in which
+	multiple bonding hosts are concurrently issuing ARPs to one or
+	more targets beyond a common switch.  Should the link between
+	the switch and target fail (but not the switch itself), the
+	probe traffic generated by the multiple bonding instances will
+	fool the standard ARP monitor into considering the links as
+	still up.  Use of the arp_validate option can resolve this, as
+	the ARP monitor will only consider ARP requests and replies
+	associated with its own instance of bonding.
+
+	This option was added in bonding version 3.1.0.
+
 downdelay
 
 	Specifies the time, in milliseconds, to wait before disabling
@@ -222,6 +286,39 @@ downdelay
 	will be rounded down to the nearest multiple.  The default
 	value is 0.
 
+fail_over_mac
+
+	Specifies whether active-backup mode should set all slaves to
+	the same MAC address (the traditional behavior), or, when
+	enabled, change the bond's MAC address when changing the
+	active interface (i.e., fail over the MAC address itself).
+
+	Fail over MAC is useful for devices that cannot ever alter
+	their MAC address, or for devices that refuse incoming
+	broadcasts with their own source MAC (which interferes with
+	the ARP monitor).
+
+	The down side of fail over MAC is that every device on the
+	network must be updated via gratuitous ARP, vs. just updating
+	a switch or set of switches (which often takes place for any
+	traffic, not just ARP traffic, if the switch snoops incoming
+	traffic to update its tables) for the traditional method.  If
+	the gratuitous ARP is lost, communication may be disrupted.
+
+	When fail over MAC is used in conjuction with the mii monitor,
+	devices which assert link up prior to being able to actually
+	transmit and receive are particularly susecptible to loss of
+	the gratuitous ARP, and an appropriate updelay setting may be
+	required.
+
+	A value of 0 disables fail over MAC, and is the default.  A
+	value of 1 enables fail over MAC.  This option is enabled
+	automatically if the first slave added cannot change its MAC
+	address.  This option may be modified via sysfs only when no
+	slaves are present in the bond.
+
+	This option was added in bonding version 3.2.0.
+
 lacp_rate
 
 	Option specifying the rate in which we'll ask our link partner
@@ -462,6 +559,30 @@ xmit_hash_policy
 
 		This algorithm is 802.3ad compliant.
 
+	layer2+3
+
+		This policy uses a combination of layer2 and layer3
+		protocol information to generate the hash.
+
+		Uses XOR of hardware MAC addresses and IP addresses to
+		generate the hash.  The formula is
+
+		(((source IP XOR dest IP) AND 0xffff) XOR
+			( source MAC XOR destination MAC ))
+				modulo slave count
+
+		This algorithm will place all traffic to a particular
+		network peer on the same slave.  For non-IP traffic,
+		the formula is the same as for the layer2 transmit
+		hash policy.
+
+		This policy is intended to provide a more balanced
+		distribution of traffic than layer2 alone, especially
+		in environments where a layer3 gateway device is
+		required to reach most destinations.
+
+		This algorithm is 802.3ad complient.
+
 	layer3+4
 
 		This policy uses upper layer protocol information,
@@ -497,8 +618,9 @@ xmit_hash_policy
 		or may not tolerate this noncompliance.
 
 	The default value is layer2.  This option was added in bonding
-version 2.6.3.  In earlier versions of bonding, this parameter does
-not exist, and the layer2 policy is the only policy.
+	version 2.6.3.  In earlier versions of bonding, this parameter
+	does not exist, and the layer2 policy is the only policy.  The
+	layer2+3 value was added for bonding version 3.2.2.
 
 
 3. Configuring Bonding Devices
@@ -695,11 +817,13 @@ the system /etc/modules.conf or /etc/modprobe.conf configuration file.
 3.2 Configuration with Initscripts Support
 ------------------------------------------
 
-	This section applies to distros using a version of initscripts
-with bonding support, for example, Red Hat Linux 9 or Red Hat
-Enterprise Linux version 3 or 4.  On these systems, the network
-initialization scripts have some knowledge of bonding, and can be
-configured to control bonding devices.
+	This section applies to distros using a recent version of
+initscripts with bonding support, for example, Red Hat Enterprise Linux
+version 3 or later, Fedora, etc.  On these systems, the network
+initialization scripts have knowledge of bonding, and can be configured to
+control bonding devices.  Note that older versions of the initscripts
+package have lower levels of support for bonding; this will be noted where
+applicable.
 
 	These distros will not automatically load the network adapter
 driver unless the ethX device is configured with an IP address.
@@ -747,11 +871,31 @@ USERCTL=no
 	Be sure to change the networking specific lines (IPADDR,
 NETMASK, NETWORK and BROADCAST) to match your network configuration.
 
-	Finally, it is necessary to edit /etc/modules.conf (or
-/etc/modprobe.conf, depending upon your distro) to load the bonding
-module with your desired options when the bond0 interface is brought
-up.  The following lines in /etc/modules.conf (or modprobe.conf) will
-load the bonding module, and select its options:
+	For later versions of initscripts, such as that found with Fedora
+7 and Red Hat Enterprise Linux version 5 (or later), it is possible, and,
+indeed, preferable, to specify the bonding options in the ifcfg-bond0
+file, e.g. a line of the format:
+
+BONDING_OPTS="mode=active-backup arp_interval=60 arp_ip_target=+192.168.1.254"
+
+	will configure the bond with the specified options.  The options
+specified in BONDING_OPTS are identical to the bonding module parameters
+except for the arp_ip_target field.  Each target should be included as a
+separate option and should be preceded by a '+' to indicate it should be
+added to the list of queried targets, e.g.,
+
+	arp_ip_target=+192.168.1.1 arp_ip_target=+192.168.1.2
+
+	is the proper syntax to specify multiple targets.  When specifying
+options via BONDING_OPTS, it is not necessary to edit /etc/modules.conf or
+/etc/modprobe.conf.
+
+	For older versions of initscripts that do not support
+BONDING_OPTS, it is necessary to edit /etc/modules.conf (or
+/etc/modprobe.conf, depending upon your distro) to load the bonding module
+with your desired options when the bond0 interface is brought up.  The
+following lines in /etc/modules.conf (or modprobe.conf) will load the
+bonding module, and select its options:
 
 alias bond0 bonding
 options bond0 mode=balance-alb miimon=100
@@ -766,9 +910,10 @@ up and running.
 3.2.1 Using DHCP with Initscripts
 ---------------------------------
 
-	Recent versions of initscripts (the version supplied with
-Fedora Core 3 and Red Hat Enterprise Linux 4 is reported to work) do
-have support for assigning IP information to bonding devices via DHCP.
+	Recent versions of initscripts (the versions supplied with Fedora
+Core 3 and Red Hat Enterprise Linux 4, or later versions, are reported to
+work) have support for assigning IP information to bonding devices via
+DHCP.
 
 	To configure bonding for DHCP, configure it as described
 above, except replace the line "BOOTPROTO=none" with "BOOTPROTO=dhcp"
@@ -778,18 +923,14 @@ is case sensitive.
 3.2.2 Configuring Multiple Bonds with Initscripts
 -------------------------------------------------
 
-	At this writing, the initscripts package does not directly
-support loading the bonding driver multiple times, so the process for
-doing so is the same as described in the "Configuring Multiple Bonds
-Manually" section, below.
-
-	NOTE: It has been observed that some Red Hat supplied kernels
-are apparently unable to rename modules at load time (the "-o bond1"
-part).  Attempts to pass that option to modprobe will produce an
-"Operation not permitted" error.  This has been reported on some
-Fedora Core kernels, and has been seen on RHEL 4 as well.  On kernels
-exhibiting this problem, it will be impossible to configure multiple
-bonds with differing parameters.
+	Initscripts packages that are included with Fedora 7 and Red Hat
+Enterprise Linux 5 support multiple bonding interfaces by simply
+specifying the appropriate BONDING_OPTS= in ifcfg-bondX where X is the
+number of the bond.  This support requires sysfs support in the kernel,
+and a bonding driver of version 3.0.0 or later.  Other configurations may
+not support this method for specifying multiple bonding interfaces; for
+those instances, see the "Configuring Multiple Bonds Manually" section,
+below.
 
 3.3 Configuring Bonding Manually with Ifenslave
 -----------------------------------------------
@@ -860,20 +1001,24 @@ initialization scripts lack support for configuring multiple bonds.
 options, you may wish to use the "max_bonds" module parameter,
 documented above.
 
-	To create multiple bonding devices with differing options, it
-is necessary to load the bonding driver multiple times.  Note that
-current versions of the sysconfig network initialization scripts
-handle this automatically; if your distro uses these scripts, no
-special action is needed.  See the section Configuring Bonding
-Devices, above, if you're not sure about your network initialization
-scripts.
+	To create multiple bonding devices with differing options, it is
+preferrable to use bonding parameters exported by sysfs, documented in the
+section below.
+
+	For versions of bonding without sysfs support, the only means to
+provide multiple instances of bonding with differing options is to load
+the bonding driver multiple times.  Note that current versions of the
+sysconfig network initialization scripts handle this automatically; if
+your distro uses these scripts, no special action is needed.  See the
+section Configuring Bonding Devices, above, if you're not sure about your
+network initialization scripts.
 
 	To load multiple instances of the module, it is necessary to
 specify a different name for each instance (the module loading system
 requires that every loaded module, even multiple instances of the same
-module, have a unique name).  This is accomplished by supplying
-multiple sets of bonding options in /etc/modprobe.conf, for example:
-	
+module, have a unique name).  This is accomplished by supplying multiple
+sets of bonding options in /etc/modprobe.conf, for example:
+
 alias bond0 bonding
 options bond0 -o bond0 mode=balance-rr miimon=100
 
@@ -896,10 +1041,18 @@ install bond1 /sbin/modprobe --ignore-install bonding -o bond1 \
 	This may be repeated any number of times, specifying a new and
 unique name in place of bond1 for each subsequent instance.
 
+	It has been observed that some Red Hat supplied kernels are unable
+to rename modules at load time (the "-o bond1" part).  Attempts to pass
+that option to modprobe will produce an "Operation not permitted" error.
+This has been reported on some Fedora Core kernels, and has been seen on
+RHEL 4 as well.  On kernels exhibiting this problem, it will be impossible
+to configure multiple bonds with differing parameters (as they are older
+kernels, and also lack sysfs support).
+
 3.4 Configuring Bonding Manually via Sysfs
 ------------------------------------------
 
-	Starting with version 3.0, Channel Bonding may be configured
+	Starting with version 3.0.0, Channel Bonding may be configured
 via the sysfs interface.  This interface allows dynamic configuration
 of all bonds in the system without unloading the module.  It also
 allows for adding and removing bonds at runtime.  Ifenslave is no
@@ -944,9 +1097,6 @@ To enslave interface eth0 to bond bond0:
 To free slave eth0 from bond bond0:
 # echo -eth0 > /sys/class/net/bond0/bonding/slaves
 
-	NOTE: The bond must be up before slaves can be added.  All
-slaves are freed when the interface is brought down.
-
 	When an interface is enslaved to a bond, symlinks between the
 two are created in the sysfs filesystem.  In this case, you would get
 /sys/class/net/bond0/slave_eth0 pointing to /sys/class/net/eth0, and
@@ -964,7 +1114,7 @@ Changing a Bond's Configuration
 files located in /sys/class/net/<bond name>/bonding
 
 	The names of these files correspond directly with the command-
-line parameters described elsewhere in in this file, and, with the
+line parameters described elsewhere in this file, and, with the
 exception of arp_ip_target, they accept the same values.  To see the
 current setting, simply cat the appropriate file.
 
@@ -1536,6 +1686,15 @@ one for each switch in the network).  This will insure that,
 regardless of which switch is active, the ARP monitor has a suitable
 target to query.
 
+	Note, also, that of late many switches now support a functionality
+generally referred to as "trunk failover."  This is a feature of the
+switch that causes the link state of a particular switch port to be set
+down (or up) when the state of another switch port goes down (or up).
+It's purpose is to propogate link failures from logically "exterior" ports
+to the logically "interior" ports that bonding is able to monitor via
+miimon.  Availability and configuration for trunk failover varies by
+switch, but this can be a viable alternative to the ARP monitor when using
+suitable switches.
 
 12. Configuring Bonding for Maximum Throughput
 ==============================================
@@ -1623,7 +1782,7 @@ balance-rr: This mode is the only mode that will permit a single
 	interfaces. It is therefore the only mode that will allow a
 	single TCP/IP stream to utilize more than one interface's
 	worth of throughput.  This comes at a cost, however: the
-	striping often results in peer systems receiving packets out
+	striping generally results in peer systems receiving packets out
 	of order, causing TCP/IP's congestion control system to kick
 	in, often by retransmitting segments.
 
@@ -1635,22 +1794,20 @@ balance-rr: This mode is the only mode that will permit a single
 	interface's worth of throughput, even after adjusting
 	tcp_reordering.
 
-	Note that this out of order delivery occurs when both the
-	sending and receiving systems are utilizing a multiple
-	interface bond.  Consider a configuration in which a
-	balance-rr bond feeds into a single higher capacity network
-	channel (e.g., multiple 100Mb/sec ethernets feeding a single
-	gigabit ethernet via an etherchannel capable switch).  In this
-	configuration, traffic sent from the multiple 100Mb devices to
-	a destination connected to the gigabit device will not see
-	packets out of order.  However, traffic sent from the gigabit
-	device to the multiple 100Mb devices may or may not see
-	traffic out of order, depending upon the balance policy of the
-	switch.  Many switches do not support any modes that stripe
-	traffic (instead choosing a port based upon IP or MAC level
-	addresses); for those devices, traffic flowing from the
-	gigabit device to the many 100Mb devices will only utilize one
-	interface.
+	Note that the fraction of packets that will be delivered out of
+	order is highly variable, and is unlikely to be zero.  The level
+	of reordering depends upon a variety of factors, including the
+	networking interfaces, the switch, and the topology of the
+	configuration.  Speaking in general terms, higher speed network
+	cards produce more reordering (due to factors such as packet
+	coalescing), and a "many to many" topology will reorder at a
+	higher rate than a "many slow to one fast" configuration.
+
+	Many switches do not support any modes that stripe traffic
+	(instead choosing a port based upon IP or MAC level addresses);
+	for those devices, traffic for a particular connection flowing
+	through the switch to a balance-rr bond will not utilize greater
+	than one interface's worth of bandwidth.
 
 	If you are utilizing protocols other than TCP/IP, UDP for
 	example, and your application can tolerate out of order
@@ -1850,6 +2007,10 @@ Failover may be delayed via the downdelay bonding module option.
 13.2 Duplicated Incoming Packets
 --------------------------------
 
+	NOTE: Starting with version 3.0.2, the bonding driver has logic to
+suppress duplicate packets, which should largely eliminate this problem.
+The following description is kept for reference.
+
 	It is not uncommon to observe a short burst of duplicated
 traffic when the bonding device is first used, or after it has been
 idle for some period of time.  This is most easily observed by issuing
@@ -2010,6 +2171,9 @@ The new driver was designed to be SMP safe from the start.
 EtherExpress PRO/100 and a 3com 3c905b, for example).  For most modes,
 devices need not be of the same speed.
 
+	Starting with version 3.2.1, bonding also supports Infiniband
+slaves in active-backup mode.
+
 3.  How many bonding devices can I have?
 
 	There is no limit.
@@ -2068,11 +2232,15 @@ switches currently available support 802.3ad.
 
 8.  Where does a bonding device get its MAC address from?
 
-	If not explicitly configured (with ifconfig or ip link), the
-MAC address of the bonding device is taken from its first slave
-device.  This MAC address is then passed to all following slaves and
-remains persistent (even if the first slave is removed) until the
-bonding device is brought down or reconfigured.
+	When using slave devices that have fixed MAC addresses, or when
+the fail_over_mac option is enabled, the bonding device's MAC address is
+the MAC address of the active slave.
+
+	For other configurations, if not explicitly configured (with
+ifconfig or ip link), the MAC address of the bonding device is taken from
+its first slave device.  This MAC address is then passed to all following
+slaves and remains persistent (even if the first slave is removed) until
+the bonding device is brought down or reconfigured.
 
 	If you wish to change the MAC address, you can set it with
 ifconfig or ip link: