Diffstat (limited to 'Documentation/networking/bonding.txt')
| -rw-r--r-- | Documentation/networking/bonding.txt | 289 |
1 files changed, 197 insertions, 92 deletions
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index bfea8a33890..9c723ecd002 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -104,8 +104,7 @@ Table of Contents
 ==============================
 
 	Most popular distro kernels ship with the bonding driver
-already available as a module and the ifenslave user level control
-program installed and ready for use. If your distro does not, or you
+already available as a module. If your distro does not, or you
 have need to compile bonding from source (e.g., configuring and
 installing a mainline kernel from kernel.org), you'll need to perform
 the following steps:
@@ -124,46 +123,13 @@ device support" section. It is recommended that you configure the
 driver as module since it is currently the only way to pass parameters
 to the driver or configure more than one bonding device.
 
-	Build and install the new kernel and modules, then continue
-below to install ifenslave.
+	Build and install the new kernel and modules.
 
-1.2 Install ifenslave Control Utility
+1.2 Bonding Control Utility
 -------------------------------------
 
-	The ifenslave user level control program is included in the
-kernel source tree, in the file Documentation/networking/ifenslave.c.
-It is generally recommended that you use the ifenslave that
-corresponds to the kernel that you are using (either from the same
-source tree or supplied with the distro), however, ifenslave
-executables from older kernels should function (but features newer
-than the ifenslave release are not supported). Running an ifenslave
-that is newer than the kernel is not supported, and may or may not
-work.
-
-	To install ifenslave, do the following:
-
-# gcc -Wall -O -I/usr/src/linux/include ifenslave.c -o ifenslave
-# cp ifenslave /sbin/ifenslave
-
-	If your kernel source is not in "/usr/src/linux," then replace
-"/usr/src/linux/include" in the above with the location of your kernel
-source include directory.
-
-	You may wish to back up any existing /sbin/ifenslave, or, for
-testing or informal use, tag the ifenslave to the kernel version
-(e.g., name the ifenslave executable /sbin/ifenslave-2.6.10).
-
-IMPORTANT NOTE:
-
-	If you omit the "-I" or specify an incorrect directory, you
-may end up with an ifenslave that is incompatible with the kernel
-you're trying to build it for. Some distros (e.g., Red Hat from 7.1
-onwards) do not have /usr/include/linux symbolically linked to the
-default kernel source include directory.
-
-SECOND IMPORTANT NOTE:
-	If you plan to configure bonding using sysfs or using the
-/etc/network/interfaces file, you do not need to use ifenslave.
+	It is recommended to configure bonding via iproute2 (netlink)
+or sysfs, the old ifenslave control utility is obsolete.
 
 2. Bonding Driver Options
 =========================
@@ -304,16 +270,15 @@ arp_ip_target
 arp_validate
 
 	Specifies whether or not ARP probes and replies should be
-	validated in the active-backup mode. This causes the ARP
-	monitor to examine the incoming ARP requests and replies, and
-	only consider a slave to be up if it is receiving the
-	appropriate ARP traffic.
+	validated in any mode that supports arp monitoring, or whether
+	non-ARP traffic should be filtered (disregarded) for link
+	monitoring purposes.
 
 	Possible values are:
 
 	none or 0
 
-		No validation is performed. This is the default.
+		No validation or filtering is performed.
 
 	active or 1
 
@@ -327,28 +292,90 @@ arp_validate
 
 		Validation is performed for all slaves.
 
-	For the active slave, the validation checks ARP replies to
-	confirm that they were generated by an arp_ip_target. Since
-	backup slaves do not typically receive these replies, the
-	validation performed for backup slaves is on the ARP request
-	sent out via the active slave. It is possible that some
-	switch or network configurations may result in situations
-	wherein the backup slaves do not receive the ARP requests; in
-	such a situation, validation of backup slaves must be
-	disabled.
-
-	This option is useful in network configurations in which
-	multiple bonding hosts are concurrently issuing ARPs to one or
-	more targets beyond a common switch. Should the link between
-	the switch and target fail (but not the switch itself), the
-	probe traffic generated by the multiple bonding instances will
-	fool the standard ARP monitor into considering the links as
-	still up. Use of the arp_validate option can resolve this, as
-	the ARP monitor will only consider ARP requests and replies
-	associated with its own instance of bonding.
+	filter or 4
+
+		Filtering is applied to all slaves. No validation is
+		performed.
+
+	filter_active or 5
+
+		Filtering is applied to all slaves, validation is performed
+		only for the active slave.
+
+	filter_backup or 6
+
+		Filtering is applied to all slaves, validation is performed
+		only for backup slaves.
+
+	Validation:
+
+	Enabling validation causes the ARP monitor to examine the incoming
+	ARP requests and replies, and only consider a slave to be up if it
+	is receiving the appropriate ARP traffic.
+
+	For an active slave, the validation checks ARP replies to confirm
+	that they were generated by an arp_ip_target. Since backup slaves
+	do not typically receive these replies, the validation performed
+	for backup slaves is on the broadcast ARP request sent out via the
+	active slave. It is possible that some switch or network
+	configurations may result in situations wherein the backup slaves
+	do not receive the ARP requests; in such a situation, validation
+	of backup slaves must be disabled.
+
+	The validation of ARP requests on backup slaves is mainly helping
+	bonding to decide which slaves are more likely to work in case of
+	the active slave failure, it doesn't really guarantee that the
+	backup slave will work if it's selected as the next active slave.
+
+	Validation is useful in network configurations in which multiple
+	bonding hosts are concurrently issuing ARPs to one or more targets
+	beyond a common switch. Should the link between the switch and
+	target fail (but not the switch itself), the probe traffic
+	generated by the multiple bonding instances will fool the standard
+	ARP monitor into considering the links as still up. Use of
+	validation can resolve this, as the ARP monitor will only consider
+	ARP requests and replies associated with its own instance of
+	bonding.
+
+	Filtering:
+
+	Enabling filtering causes the ARP monitor to only use incoming ARP
+	packets for link availability purposes. Arriving packets that are
+	not ARPs are delivered normally, but do not count when determining
+	if a slave is available.
+
+	Filtering operates by only considering the reception of ARP
+	packets (any ARP packet, regardless of source or destination) when
+	determining if a slave has received traffic for link availability
+	purposes.
+
+	Filtering is useful in network configurations in which significant
+	levels of third party broadcast traffic would fool the standard
+	ARP monitor into considering the links as still up. Use of
+	filtering can resolve this, as only ARP traffic is considered for
+	link availability purposes.
 
 	This option was added in bonding version 3.1.0.
 
+arp_all_targets
+
+	Specifies the quantity of arp_ip_targets that must be reachable
+	in order for the ARP monitor to consider a slave as being up.
+	This option affects only active-backup mode for slaves with
+	arp_validation enabled.
+
+	Possible values are:
+
+	any or 0
+
+		consider the slave up only when any of the arp_ip_targets
+		is reachable
+
+	all or 1
+
+		consider the slave up only when all of the arp_ip_targets
+		are reachable
+
 downdelay
 
 	Specifies the time, in milliseconds, to wait before disabling
@@ -558,13 +585,19 @@ mode
 	balance-tlb or 5
 
 		Adaptive transmit load balancing: channel bonding that
-		does not require any special switch support. The
-		outgoing traffic is distributed according to the
-		current load (computed relative to the speed) on each
-		slave. Incoming traffic is received by the current
-		slave. If the receiving slave fails, another slave
-		takes over the MAC address of the failed receiving
-		slave.
+		does not require any special switch support.
+
+		In tlb_dynamic_lb=1 mode; the outgoing traffic is
+		distributed according to the current load (computed
+		relative to the speed) on each slave.
+
+		In tlb_dynamic_lb=0 mode; the load balancing based on
+		current load is disabled and the load is distributed
+		only using the hash distribution.
+
+		Incoming traffic is received by the current slave.
+		If the receiving slave fails, another slave takes over
+		the MAC address of the failed receiving slave.
 
 		Prerequisite:
 
@@ -648,6 +681,15 @@ num_unsol_na
 	are generated by the ipv4 and ipv6 code and the numbers of
 	repetitions cannot be set independently.
 
+packets_per_slave
+
+	Specify the number of packets to transmit through a slave before
+	moving to the next one. When set to 0 then a slave is chosen at
+	random.
+
+	The valid range is 0 - 65535; the default value is 1. This option
+	has effect only in balance-rr mode.
+
 primary
 
 	A string (eth0, eth2, etc) specifying which slave is the
@@ -657,7 +699,8 @@ primary
 	one slave is preferred over another, e.g., when one slave has
 	higher throughput than another.
 
-	The primary option is only valid for active-backup mode.
+	The primary option is only valid for active-backup(1),
+	balance-tlb (5) and balance-alb (6) mode.
 
 primary_reselect
 
@@ -699,6 +742,28 @@ primary_reselect
 
 	This option was added for bonding version 3.6.0.
 
+tlb_dynamic_lb
+
+	Specifies if dynamic shuffling of flows is enabled in tlb
+	mode. The value has no effect on any other modes.
+
+	The default behavior of tlb mode is to shuffle active flows across
+	slaves based on the load in that interval. This gives nice lb
+	characteristics but can cause packet reordering. If re-ordering is
+	a concern use this variable to disable flow shuffling and rely on
+	load balancing provided solely by the hash distribution.
+	xmit-hash-policy can be used to select the appropriate hashing for
+	the setup.
+
+	The sysfs entry can be used to change the setting per bond device
+	and the initial value is derived from the module parameter. The
+	sysfs entry is allowed to be changed only if the bond device is
+	down.
+
+	The default value is "1" that enables flow shuffling while value "0"
+	disables it. This option was added in bonding driver 3.7.1
+
+
 updelay
 
 	Specifies the time, in milliseconds, to wait before enabling a
@@ -732,7 +797,7 @@ use_carrier
 xmit_hash_policy
 
 	Selects the transmit hash policy to use for slave selection in
-	balance-xor and 802.3ad modes. Possible values are:
+	balance-xor, 802.3ad, and tlb modes. Possible values are:
 
 	layer2
 
@@ -754,9 +819,14 @@ xmit_hash_policy
 		Uses XOR of hardware MAC addresses and IP addresses to
 		generate the hash. The formula is
 
-		(((source IP XOR dest IP) AND 0xffff) XOR
-			( source MAC XOR destination MAC ))
-				modulo slave count
+		hash = source MAC XOR destination MAC
+		hash = hash XOR source IP XOR destination IP
+		hash = hash XOR (hash RSHIFT 16)
+		hash = hash XOR (hash RSHIFT 8)
+		And then hash is reduced modulo slave count.
+
+		If the protocol is IPv6 then the source and destination
+		addresses are first hashed using ipv6_addr_hash.
 
 		This algorithm will place all traffic to a particular
 		network peer on the same slave. For non-IP traffic,
@@ -780,20 +850,21 @@ xmit_hash_policy
 
 		The formula for unfragmented TCP and UDP packets is
 
-		((source port XOR dest port) XOR
-			 ((source IP XOR dest IP) AND 0xffff)
-				modulo slave count
+		hash = source port, destination port (as in the header)
+		hash = hash XOR source IP XOR destination IP
+		hash = hash XOR (hash RSHIFT 16)
+		hash = hash XOR (hash RSHIFT 8)
+		And then hash is reduced modulo slave count.
+
+		If the protocol is IPv6 then the source and destination
+		addresses are first hashed using ipv6_addr_hash.
 
-		For fragmented TCP or UDP packets and all other IP
-		protocol traffic, the source and destination port
+		For fragmented TCP or UDP packets and all other IPv4 and
+		IPv6 protocol traffic, the source and destination port
 		information is omitted. For non-IP traffic, the
 		formula is the same as for the layer2 transmit hash
 		policy.
 
-		This policy is intended to mimic the behavior of
-		certain switches, notably Cisco switches with PFC2 as
-		well as some Foundry and IBM products.
-
 		This algorithm is not fully 802.3ad compliant. A
 		single TCP or UDP conversation containing both
 		fragmented and unfragmented packets will see packets
@@ -804,6 +875,26 @@ xmit_hash_policy
 		conversations. Other implementations of 802.3ad may
 		or may not tolerate this noncompliance.
 
+	encap2+3
+
+		This policy uses the same formula as layer2+3 but it
+		relies on skb_flow_dissect to obtain the header fields
+		which might result in the use of inner headers if an
+		encapsulation protocol is used. For example this will
+		improve the performance for tunnel users because the
+		packets will be distributed according to the encapsulated
+		flows.
+
+	encap3+4
+
+		This policy uses the same formula as layer3+4 but it
+		relies on skb_flow_dissect to obtain the header fields
+		which might result in the use of inner headers if an
+		encapsulation protocol is used. For example this will
+		improve the performance for tunnel users because the
+		packets will be distributed according to the encapsulated
+		flows.
+
 	The default value is layer2. This option was added in bonding
 	version 2.6.3. In earlier versions of bonding, this parameter
 	does not exist, and the layer2 policy is the only policy. The
@@ -827,11 +918,19 @@ resend_igmp
 
 	This option was added for bonding version 3.7.0.
 
+lp_interval
+
+	Specifies the number of seconds between instances where the bonding
+	driver sends learning packets to each slaves peer switch.
+
+	The valid range is 1 - 0x7fffffff; the default value is 1. This Option
+	has effect only in balance-tlb and balance-alb modes.
+
 3. Configuring Bonding Devices
 ==============================
 
 	You can configure bonding using either your distro's network
-initialization scripts, or manually using either ifenslave or the
+initialization scripts, or manually using either iproute2 or the
 sysfs interface. Distros generally use one of three packages for the
 network initialization scripts: initscripts, sysconfig or interfaces.
 Recent versions of these packages have support for bonding, while older
@@ -1140,7 +1239,7 @@ not support this method for specifying multiple bonding interfaces; for
 those instances, see the "Configuring Multiple Bonds Manually" section,
 below.
 
-3.3 Configuring Bonding Manually with Ifenslave
+3.3 Configuring Bonding Manually with iproute2
 -----------------------------------------------
 
 	This section applies to distros whose network initialization
@@ -1151,7 +1250,7 @@ version 8.
 	The general method for these systems is to place the bonding
 module parameters into a config file in /etc/modprobe.d/ (as
 appropriate for the installed distro), then add modprobe and/or
-ifenslave commands to the system's global init script. The name of
+`ip link` commands to the system's global init script. The name of
 the global init script differs; for sysconfig, it is
 /etc/init.d/boot.local and for initscripts it is /etc/rc.d/rc.local.
 
@@ -1163,8 +1262,8 @@ reboots, edit the appropriate file (/etc/init.d/boot.local or
 modprobe bonding mode=balance-alb miimon=100
 modprobe e100
 ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up
-ifenslave bond0 eth0
-ifenslave bond0 eth1
+ip link set eth0 master bond0
+ip link set eth1 master bond0
 
 	Replace the example bonding module parameters and bond0
 network configuration (IP address, netmask, etc) with the appropriate
@@ -1210,7 +1309,7 @@ options, you may wish to use the "max_bonds" module parameter,
 documented above.
 
 	To create multiple bonding devices with differing options, it is
-preferrable to use bonding parameters exported by sysfs, documented in the
+preferable to use bonding parameters exported by sysfs, documented in the
 section below.
 
 	For versions of bonding without sysfs support, the only means to
@@ -1351,6 +1450,12 @@ To add ARP targets:
 To remove an ARP target:
 # echo -192.168.0.100 > /sys/class/net/bond0/bonding/arp_ip_target
 
+To configure the interval between learning packet transmits:
+# echo 12 > /sys/class/net/bond0/bonding/lp_interval
+	NOTE: the lp_inteval is the number of seconds between instances where
+the bonding driver sends learning packets to each slaves peer switch. The
+default interval is 1 second.
+
 Example Configuration
 ---------------------
 We begin with the same example that is shown in section 3.3,
@@ -1950,7 +2055,7 @@ access to fail over to. Additionally, the bonding load balance modes
 support link monitoring of their members, so if individual links fail,
 the load will be rebalanced across the remaining devices.
 
-	See Section 13, "Configuring Bonding for Maximum Throughput"
+	See Section 12, "Configuring Bonding for Maximum Throughput"
 for information on configuring bonding with one peer device.
 
 11.2 High Availability in a Multiple Switch Topology
@@ -2620,7 +2725,7 @@ be found at:
 
 https://lists.sourceforge.net/lists/listinfo/bonding-devel
 
-	Discussions regarding the developpement of the bonding driver take place
+	Discussions regarding the development of the bonding driver take place
 on the main Linux network mailing list, hosted at vger.kernel.org. The list
 address is:
 
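
Reviewer's sketch (not part of the patch): the layer2+3 steps that this
change adds under xmit_hash_policy can be tried out in a few lines of
Python. The names layer23_hash and mac_to_int are hypothetical. Folding
the MAC XOR down to 32 bits is an assumption made only so the shifts act
on a fixed-width value; the patch text does not state the width of the
MAC term, and the driver source remains the reference. Only IPv4 is
covered; the ipv6_addr_hash step is not modelled.

```python
import ipaddress

MASK32 = 0xFFFFFFFF  # assumption: the hash is treated as a 32-bit value


def mac_to_int(mac: str) -> int:
    """Parse 'aa:bb:cc:dd:ee:ff' into an integer (hypothetical helper)."""
    return int(mac.replace(":", ""), 16)


def layer23_hash(src_mac: str, dst_mac: str,
                 src_ip: str, dst_ip: str, slave_count: int) -> int:
    """Follow the four documented steps, then reduce modulo slave count."""
    # hash = source MAC XOR destination MAC (folded to 32 bits, see above)
    h = (mac_to_int(src_mac) ^ mac_to_int(dst_mac)) & MASK32
    # hash = hash XOR source IP XOR destination IP
    h ^= int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))
    # hash = hash XOR (hash RSHIFT 16)
    h ^= h >> 16
    # hash = hash XOR (hash RSHIFT 8)
    h ^= h >> 8
    # And then hash is reduced modulo slave count.
    return h % slave_count
```

Because every step is an XOR or a shift of the combined value, swapping
source and destination gives the same slave index, which is consistent
with the documented property that all traffic to a particular network
peer is placed on the same slave.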
