Diffstat (limited to 'Documentation/networking/bonding.txt')
| -rw-r--r-- | Documentation/networking/bonding.txt | 959 |
1 files changed, 788 insertions, 171 deletions
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index afac780445c..9c723ecd002 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -1,7 +1,7 @@ Linux Ethernet Bonding Driver HOWTO - Latest update: 24 April 2006 + Latest update: 27 April 2011 Initial release : Thomas Davis <tadavis at lbl.gov> Corrections, HA extensions : 2000/10/03-15 : @@ -49,6 +49,8 @@ Table of Contents 3.3 Configuring Bonding Manually with Ifenslave 3.3.1 Configuring Multiple Bonds Manually 3.4 Configuring Bonding Manually via Sysfs +3.5 Configuration with Interfaces Support +3.6 Overriding Configuration for Special Cases 4. Querying Bonding Configuration 4.1 Bonding Configuration @@ -102,8 +104,7 @@ Table of Contents ============================== Most popular distro kernels ship with the bonding driver -already available as a module and the ifenslave user level control -program installed and ready for use. If your distro does not, or you +already available as a module. If your distro does not, or you have need to compile bonding from source (e.g., configuring and installing a mainline kernel from kernel.org), you'll need to perform the following steps: @@ -122,56 +123,27 @@ device support" section. It is recommended that you configure the driver as module since it is currently the only way to pass parameters to the driver or configure more than one bonding device. - Build and install the new kernel and modules, then continue -below to install ifenslave. + Build and install the new kernel and modules. -1.2 Install ifenslave Control Utility +1.2 Bonding Control Utility ------------------------------------- - The ifenslave user level control program is included in the -kernel source tree, in the file Documentation/networking/ifenslave.c. 
-It is generally recommended that you use the ifenslave that -corresponds to the kernel that you are using (either from the same -source tree or supplied with the distro), however, ifenslave -executables from older kernels should function (but features newer -than the ifenslave release are not supported). Running an ifenslave -that is newer than the kernel is not supported, and may or may not -work. - - To install ifenslave, do the following: - -# gcc -Wall -O -I/usr/src/linux/include ifenslave.c -o ifenslave -# cp ifenslave /sbin/ifenslave - - If your kernel source is not in "/usr/src/linux," then replace -"/usr/src/linux/include" in the above with the location of your kernel -source include directory. - - You may wish to back up any existing /sbin/ifenslave, or, for -testing or informal use, tag the ifenslave to the kernel version -(e.g., name the ifenslave executable /sbin/ifenslave-2.6.10). - -IMPORTANT NOTE: - - If you omit the "-I" or specify an incorrect directory, you -may end up with an ifenslave that is incompatible with the kernel -you're trying to build it for. Some distros (e.g., Red Hat from 7.1 -onwards) do not have /usr/include/linux symbolically linked to the -default kernel source include directory. - -SECOND IMPORTANT NOTE: - If you plan to configure bonding using sysfs, you do not need -to use ifenslave. + It is recommended to configure bonding via iproute2 (netlink) +or sysfs, the old ifenslave control utility is obsolete. 2. Bonding Driver Options ========================= - Options for the bonding driver are supplied as parameters to -the bonding module at load time. They may be given as command line -arguments to the insmod or modprobe command, but are usually specified -in either the /etc/modules.conf or /etc/modprobe.conf configuration -file, or in a distro-specific configuration file (some of which are -detailed in the next section). 
+ Options for the bonding driver are supplied as parameters to the +bonding module at load time, or are specified via sysfs. + + Module options may be given as command line arguments to the +insmod or modprobe command, but are usually specified in either the +/etc/modprobe.d/*.conf configuration files, or in a distro-specific +configuration file (some of which are detailed in the next section). + + Details on bonding support for sysfs are provided in the +"Configuring Bonding Manually via Sysfs" section, below. The available bonding driver parameters are listed below. If a parameter is not specified the default value is used. When initially @@ -189,9 +161,91 @@ or, for backwards compatibility, the option value. E.g., The parameters are as follows: +active_slave + + Specifies the new active slave for modes that support it + (active-backup, balance-alb and balance-tlb). Possible values + are the name of any currently enslaved interface, or an empty + string. If a name is given, the slave and its link must be up in order + to be selected as the new active slave. If an empty string is + specified, the current active slave is cleared, and a new active + slave is selected automatically. + + Note that this is only available through the sysfs interface. No module + parameter by this name exists. + + The normal value of this option is the name of the currently + active slave, or the empty string if there is no active slave or + the current mode does not use an active slave. + +ad_select + + Specifies the 802.3ad aggregation selection logic to use. The + possible values and their effects are: + + stable or 0 + + The active aggregator is chosen by largest aggregate + bandwidth. + + Reselection of the active aggregator occurs only when all + slaves of the active aggregator are down or the active + aggregator has no slaves. + + This is the default value. + + bandwidth or 1 + + The active aggregator is chosen by largest aggregate + bandwidth.
Reselection occurs if: + + - A slave is added to or removed from the bond + + - Any slave's link state changes + + - Any slave's 802.3ad association state changes + + - The bond's administrative state changes to up + + count or 2 + + The active aggregator is chosen by the largest number of + ports (slaves). Reselection occurs as described under the + "bandwidth" setting, above. + + The bandwidth and count selection policies permit failover of + 802.3ad aggregations when partial failure of the active aggregator + occurs. This keeps the aggregator with the highest availability + (either in bandwidth or in number of ports) active at all times. + + This option was added in bonding version 3.4.0. + +all_slaves_active + + Specifies that duplicate frames (received on inactive ports) should be + dropped (0) or delivered (1). + + Normally, bonding will drop duplicate frames (received on inactive + ports), which is desirable for most users. But there are some times + it is nice to allow duplicate frames to be delivered. + + The default value is 0 (drop duplicate frames received on inactive + ports). + arp_interval Specifies the ARP link monitoring frequency in milliseconds. + + The ARP monitor works by periodically checking the slave + devices to determine whether they have sent or received + traffic recently (the precise criteria depends upon the + bonding mode, and the state of the slave). Regular traffic is + generated via ARP probes issued for the addresses specified by + the arp_ip_target option. + + This behavior can be modified by the arp_validate option, + below. + If ARP monitoring is used in an etherchannel compatible mode (modes 0 and 2), the switch should be configured in a mode that evenly distributes packets across all links. If the @@ -213,6 +267,115 @@ arp_ip_target maximum number of targets that can be specified is 16. The default value is no IP addresses. 
+arp_validate + + Specifies whether or not ARP probes and replies should be + validated in any mode that supports arp monitoring, or whether + non-ARP traffic should be filtered (disregarded) for link + monitoring purposes. + + Possible values are: + + none or 0 + + No validation or filtering is performed. + + active or 1 + + Validation is performed only for the active slave. + + backup or 2 + + Validation is performed only for backup slaves. + + all or 3 + + Validation is performed for all slaves. + + filter or 4 + + Filtering is applied to all slaves. No validation is + performed. + + filter_active or 5 + + Filtering is applied to all slaves, validation is performed + only for the active slave. + + filter_backup or 6 + + Filtering is applied to all slaves, validation is performed + only for backup slaves. + + Validation: + + Enabling validation causes the ARP monitor to examine the incoming + ARP requests and replies, and only consider a slave to be up if it + is receiving the appropriate ARP traffic. + + For an active slave, the validation checks ARP replies to confirm + that they were generated by an arp_ip_target. Since backup slaves + do not typically receive these replies, the validation performed + for backup slaves is on the broadcast ARP request sent out via the + active slave. It is possible that some switch or network + configurations may result in situations wherein the backup slaves + do not receive the ARP requests; in such a situation, validation + of backup slaves must be disabled. + + The validation of ARP requests on backup slaves is mainly helping + bonding to decide which slaves are more likely to work in case of + the active slave failure, it doesn't really guarantee that the + backup slave will work if it's selected as the next active slave. + + Validation is useful in network configurations in which multiple + bonding hosts are concurrently issuing ARPs to one or more targets + beyond a common switch. 
Should the link between the switch and + target fail (but not the switch itself), the probe traffic + generated by the multiple bonding instances will fool the standard + ARP monitor into considering the links as still up. Use of + validation can resolve this, as the ARP monitor will only consider + ARP requests and replies associated with its own instance of + bonding. + + Filtering: + + Enabling filtering causes the ARP monitor to only use incoming ARP + packets for link availability purposes. Arriving packets that are + not ARPs are delivered normally, but do not count when determining + if a slave is available. + + Filtering operates by only considering the reception of ARP + packets (any ARP packet, regardless of source or destination) when + determining if a slave has received traffic for link availability + purposes. + + Filtering is useful in network configurations in which significant + levels of third party broadcast traffic would fool the standard + ARP monitor into considering the links as still up. Use of + filtering can resolve this, as only ARP traffic is considered for + link availability purposes. + + This option was added in bonding version 3.1.0. + +arp_all_targets + + Specifies the quantity of arp_ip_targets that must be reachable + in order for the ARP monitor to consider a slave as being up. + This option affects only active-backup mode for slaves with + arp_validate enabled. + + Possible values are: + + any or 0 + + consider the slave up only when any of the arp_ip_targets + is reachable + + all or 1 + + consider the slave up only when all of the arp_ip_targets + are reachable + downdelay Specifies the time, in milliseconds, to wait before disabling @@ -222,6 +385,77 @@ downdelay will be rounded down to the nearest multiple. The default value is 0.
+fail_over_mac + + Specifies whether active-backup mode should set all slaves to + the same MAC address at enslavement (the traditional + behavior), or, when enabled, perform special handling of the + bond's MAC address in accordance with the selected policy. + + Possible values are: + + none or 0 + + This setting disables fail_over_mac, and causes + bonding to set all slaves of an active-backup bond to + the same MAC address at enslavement time. This is the + default. + + active or 1 + + The "active" fail_over_mac policy indicates that the + MAC address of the bond should always be the MAC + address of the currently active slave. The MAC + address of the slaves is not changed; instead, the MAC + address of the bond changes during a failover. + + This policy is useful for devices that cannot ever + alter their MAC address, or for devices that refuse + incoming broadcasts with their own source MAC (which + interferes with the ARP monitor). + + The down side of this policy is that every device on + the network must be updated via gratuitous ARP, + vs. just updating a switch or set of switches (which + often takes place for any traffic, not just ARP + traffic, if the switch snoops incoming traffic to + update its tables) for the traditional method. If the + gratuitous ARP is lost, communication may be + disrupted. + + When this policy is used in conjunction with the mii + monitor, devices which assert link up prior to being + able to actually transmit and receive are particularly + susceptible to loss of the gratuitous ARP, and an + appropriate updelay setting may be required. + + follow or 2 + + The "follow" fail_over_mac policy causes the MAC + address of the bond to be selected normally (normally + the MAC address of the first slave added to the bond). 
+ However, the second and subsequent slaves are not set + to this MAC address while they are in a backup role; a + slave is programmed with the bond's MAC address at + failover time (and the formerly active slave receives + the newly active slave's MAC address). + + This policy is useful for multiport devices that + either become confused or incur a performance penalty + when multiple ports are programmed with the same MAC + address. + + + The default policy is none, unless the first slave cannot + change its MAC address, in which case the active policy is + selected by default. + + This option may be modified via sysfs only when no slaves are + present in the bond. + + This option was added in bonding version 3.2.0. The "follow" + policy was added in bonding version 3.3.0. + lacp_rate Option specifying the rate in which we'll ask our link partner @@ -241,7 +475,8 @@ max_bonds Specifies the number of bonding devices to create for this instance of the bonding driver. E.g., if max_bonds is 3, and the bonding driver is not already loaded, then bond0, bond1 - and bond2 will be created. The default value is 1. + and bond2 will be created. The default value is 1. Specifying + a value of 0 will load bonding, but will not create any devices. miimon @@ -253,6 +488,23 @@ miimon determined. See the High Availability section for additional information. The default value is 0. +min_links + + Specifies the minimum number of links that must be active before + asserting carrier. It is similar to the Cisco EtherChannel min-links + feature. This allows setting the minimum number of member ports that + must be up (link-up state) before marking the bond device as up + (carrier on). This is useful for situations where higher level services + such as clustering want to ensure a minimum number of low bandwidth + links are active before switchover. This option only affects 802.3ad + mode. + + The default value is 0.
This will cause carrier to be asserted (for + 802.3ad mode) whenever there is an active aggregator, regardless of the + number of available links in that aggregator. Note that, because an + aggregator cannot be active without at least one available link, + setting this option to 0 or to 1 has the exact same effect. + mode Specifies one of the bonding policies. The default is @@ -333,13 +585,19 @@ mode balance-tlb or 5 Adaptive transmit load balancing: channel bonding that - does not require any special switch support. The - outgoing traffic is distributed according to the - current load (computed relative to the speed) on each - slave. Incoming traffic is received by the current - slave. If the receiving slave fails, another slave - takes over the MAC address of the failed receiving - slave. + does not require any special switch support. + + In tlb_dynamic_lb=1 mode; the outgoing traffic is + distributed according to the current load (computed + relative to the speed) on each slave. + + In tlb_dynamic_lb=0 mode; the load balancing based on + current load is disabled and the load is distributed + only using the hash distribution. + + Incoming traffic is received by the current slave. + If the receiving slave fails, another slave takes over + the MAC address of the failed receiving slave. Prerequisite: @@ -404,6 +662,34 @@ mode swapped with the new curr_active_slave that was chosen. +num_grat_arp +num_unsol_na + + Specify the number of peer notifications (gratuitous ARPs and + unsolicited IPv6 Neighbor Advertisements) to be issued after a + failover event. As soon as the link is up on the new slave + (possibly immediately) a peer notification is sent on the + bonding device and each VLAN sub-device. This is repeated at + each link monitor interval (arp_interval or miimon, whichever + is active) if the number is greater than 1. + + The valid range is 0 - 255; the default value is 1. These options + affect only the active-backup mode. 
These options were added for + bonding versions 3.3.0 and 3.4.0 respectively. + + From Linux 3.0 and bonding version 3.7.1, these notifications + are generated by the ipv4 and ipv6 code and the numbers of + repetitions cannot be set independently. + +packets_per_slave + + Specify the number of packets to transmit through a slave before + moving to the next one. When set to 0 then a slave is chosen at + random. + + The valid range is 0 - 65535; the default value is 1. This option + has effect only in balance-rr mode. + primary A string (eth0, eth2, etc) specifying which slave is the @@ -413,7 +699,70 @@ primary one slave is preferred over another, e.g., when one slave has higher throughput than another. - The primary option is only valid for active-backup mode. + The primary option is only valid for active-backup(1), + balance-tlb (5) and balance-alb (6) mode. + +primary_reselect + + Specifies the reselection policy for the primary slave. This + affects how the primary slave is chosen to become the active slave + when failure of the active slave or recovery of the primary slave + occurs. This option is designed to prevent flip-flopping between + the primary slave and other slaves. Possible values are: + + always or 0 (default) + + The primary slave becomes the active slave whenever it + comes back up. + + better or 1 + + The primary slave becomes the active slave when it comes + back up, if the speed and duplex of the primary slave is + better than the speed and duplex of the current active + slave. + + failure or 2 + + The primary slave becomes the active slave only if the + current active slave fails and the primary slave is up. + + The primary_reselect setting is ignored in two cases: + + If no slaves are active, the first slave to recover is + made the active slave. + + When initially enslaved, the primary slave is always made + the active slave. 
+ + Changing the primary_reselect policy via sysfs will cause an + immediate selection of the best active slave according to the new + policy. This may or may not result in a change of the active + slave, depending upon the circumstances. + + This option was added for bonding version 3.6.0. + +tlb_dynamic_lb + + Specifies if dynamic shuffling of flows is enabled in tlb + mode. The value has no effect on any other modes. + + The default behavior of tlb mode is to shuffle active flows across + slaves based on the load in that interval. This gives nice lb + characteristics but can cause packet reordering. If re-ordering is + a concern use this variable to disable flow shuffling and rely on + load balancing provided solely by the hash distribution. + xmit-hash-policy can be used to select the appropriate hashing for + the setup. + + The sysfs entry can be used to change the setting per bond device + and the initial value is derived from the module parameter. The + sysfs entry is allowed to be changed only if the bond device is + down. + + The default value is "1" that enables flow shuffling while value "0" + disables it. This option was added in bonding driver 3.7.1 + updelay @@ -448,7 +797,7 @@ use_carrier xmit_hash_policy Selects the transmit hash policy to use for slave selection in - balance-xor and 802.3ad modes. Possible values are: + balance-xor, 802.3ad, and tlb modes. Possible values are: layer2 @@ -462,6 +811,35 @@ xmit_hash_policy This algorithm is 802.3ad compliant. + layer2+3 + + This policy uses a combination of layer2 and layer3 + protocol information to generate the hash. + + Uses XOR of hardware MAC addresses and IP addresses to + generate the hash. The formula is + + hash = source MAC XOR destination MAC + hash = hash XOR source IP XOR destination IP + hash = hash XOR (hash RSHIFT 16) + hash = hash XOR (hash RSHIFT 8) + And then hash is reduced modulo slave count. 
+ + If the protocol is IPv6 then the source and destination + addresses are first hashed using ipv6_addr_hash. + + This algorithm will place all traffic to a particular + network peer on the same slave. For non-IP traffic, + the formula is the same as for the layer2 transmit + hash policy. + + This policy is intended to provide a more balanced + distribution of traffic than layer2 alone, especially + in environments where a layer3 gateway device is + required to reach most destinations. + + This algorithm is 802.3ad compliant. + layer3+4 This policy uses upper layer protocol information, @@ -472,20 +850,21 @@ xmit_hash_policy The formula for unfragmented TCP and UDP packets is - ((source port XOR dest port) XOR - ((source IP XOR dest IP) AND 0xffff) - modulo slave count + hash = source port, destination port (as in the header) + hash = hash XOR source IP XOR destination IP + hash = hash XOR (hash RSHIFT 16) + hash = hash XOR (hash RSHIFT 8) + And then hash is reduced modulo slave count. - For fragmented TCP or UDP packets and all other IP - protocol traffic, the source and destination port + If the protocol is IPv6 then the source and destination + addresses are first hashed using ipv6_addr_hash. + + For fragmented TCP or UDP packets and all other IPv4 and + IPv6 protocol traffic, the source and destination port information is omitted. For non-IP traffic, the formula is the same as for the layer2 transmit hash policy. - This policy is intended to mimic the behavior of - certain switches, notably Cisco switches with PFC2 as - well as some Foundry and IBM products. - This algorithm is not fully 802.3ad compliant. A single TCP or UDP conversation containing both fragmented and unfragmented packets will see packets @@ -496,32 +875,82 @@ xmit_hash_policy conversations. Other implementations of 802.3ad may or may not tolerate this noncompliance. 
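As an illustration only, the layer2+3 and layer3+4 formulas above can be sketched in Python. This is not the driver's code: the function names are hypothetical, IPv4 addresses are taken as 32-bit integers, the MAC inputs are assumed to be already reduced to integers that fit in 32 bits, and the two ports are assumed to be packed with the source port in the upper 16 bits. IPv6 (which is first hashed with ipv6_addr_hash) and the non-IP fallback are not modelled.

```python
import ipaddress

MASK32 = 0xFFFFFFFF  # assumption: the hash is kept as a 32-bit value


def _fold_and_reduce(h, slave_count):
    """Apply the two RSHIFT/XOR folding steps, then reduce modulo slave count."""
    h &= MASK32
    h ^= h >> 16
    h ^= h >> 8
    return h % slave_count


def _ip_xor(src_ip, dst_ip):
    """XOR of two IPv4 addresses given as dotted-quad strings."""
    return int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))


def layer2_3_hash(src_mac, dst_mac, src_ip, dst_ip, slave_count):
    """layer2+3: start from source MAC XOR destination MAC, then mix in the IPs.

    src_mac and dst_mac are integers (an assumption; how the driver reduces
    a 48-bit MAC to a hash seed is not stated in the text above).
    """
    h = (src_mac ^ dst_mac) & MASK32
    h ^= _ip_xor(src_ip, dst_ip)
    return _fold_and_reduce(h, slave_count)


def layer3_4_hash(src_port, dst_port, src_ip, dst_ip, slave_count):
    """layer3+4 for unfragmented TCP/UDP: start from the ports, then mix in the IPs.

    The ports are packed source-first into one 32-bit word (an assumption
    about what "as in the header" means numerically).
    """
    h = ((src_port & 0xFFFF) << 16) | (dst_port & 0xFFFF)
    h ^= _ip_xor(src_ip, dst_ip)
    return _fold_and_reduce(h, slave_count)
```

Under these assumptions, layer2_3_hash(0x01, 0x02, "192.168.1.1", "192.168.1.3", 2) selects slave index 1. Because layer2+3 ignores ports, every flow between the same pair of hosts maps to the same slave, whereas layer3+4 can spread flows between one pair of hosts across slaves.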
+ encap2+3 + + This policy uses the same formula as layer2+3 but it + relies on skb_flow_dissect to obtain the header fields + which might result in the use of inner headers if an + encapsulation protocol is used. For example this will + improve the performance for tunnel users because the + packets will be distributed according to the encapsulated + flows. + + encap3+4 + + This policy uses the same formula as layer3+4 but it + relies on skb_flow_dissect to obtain the header fields + which might result in the use of inner headers if an + encapsulation protocol is used. For example this will + improve the performance for tunnel users because the + packets will be distributed according to the encapsulated + flows. + The default value is layer2. This option was added in bonding -version 2.6.3. In earlier versions of bonding, this parameter does -not exist, and the layer2 policy is the only policy. + version 2.6.3. In earlier versions of bonding, this parameter + does not exist, and the layer2 policy is the only policy. The + layer2+3 value was added for bonding version 3.2.2. + +resend_igmp + + Specifies the number of IGMP membership reports to be issued after + a failover event. One membership report is issued immediately after + the failover, subsequent packets are sent in each 200ms interval. + + The valid range is 0 - 255; the default value is 1. A value of 0 + prevents the IGMP membership report from being issued in response + to the failover event. + + This option is useful for bonding modes balance-rr (0), active-backup + (1), balance-tlb (5) and balance-alb (6), in which a failover can + switch the IGMP traffic from one slave to another. Therefore a fresh + IGMP report must be issued to cause the switch to forward the incoming + IGMP traffic over the newly selected slave. + + This option was added for bonding version 3.7.0. 
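A small worked example of the resend_igmp timing described above, in Python. The function name is hypothetical, and the sketch assumes each subsequent report follows the previous one by exactly 200 ms, which is how the description reads but has not been checked against the driver.

```python
IGMP_RESEND_INTERVAL_MS = 200  # interval stated in the resend_igmp description


def igmp_report_offsets_ms(resend_igmp=1):
    """Return the offsets (ms after failover) at which reports would be sent.

    The first report is sent immediately (offset 0); a value of 0 means
    no report is sent at all.
    """
    if not 0 <= resend_igmp <= 255:
        raise ValueError("resend_igmp must be in the range 0 - 255")
    return [i * IGMP_RESEND_INTERVAL_MS for i in range(resend_igmp)]
```

For example, resend_igmp=3 gives reports at 0, 200 and 400 ms after the failover, and the default of 1 gives a single immediate report.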
+lp_interval + + Specifies the number of seconds between instances where the bonding + driver sends learning packets to each slave's peer switch. + + The valid range is 1 - 0x7fffffff; the default value is 1. This option + has effect only in balance-tlb and balance-alb modes. 3. Configuring Bonding Devices ============================== You can configure bonding using either your distro's network -initialization scripts, or manually using either ifenslave or the -sysfs interface. Distros generally use one of two packages for the -network initialization scripts: initscripts or sysconfig. Recent -versions of these packages have support for bonding, while older +initialization scripts, or manually using either iproute2 or the +sysfs interface. Distros generally use one of three packages for the +network initialization scripts: initscripts, sysconfig or interfaces. +Recent versions of these packages have support for bonding, while older versions do not. We will first describe the options for configuring bonding for -distros using versions of initscripts and sysconfig with full or -partial support for bonding, then provide information on enabling +distros using versions of initscripts, sysconfig and interfaces with full +or partial support for bonding, then provide information on enabling bonding without support from the network initialization scripts (i.e., older versions of initscripts or sysconfig). - If you're unsure whether your distro uses sysconfig or -initscripts, or don't know if it's new enough, have no fear. + If you're unsure whether your distro uses sysconfig, +initscripts or interfaces, or don't know if it's new enough, have no fear. Determining this is fairly straightforward. - First, issue the command: + First, look for a file called interfaces in the /etc/network directory. +If this file is present in your system, then your system uses interfaces. See +Configuration with Interfaces Support.
+ + Else, issue the command: $ rpm -qf /sbin/ifup @@ -690,16 +1119,18 @@ ifcfg-bondX files. Because the sysconfig scripts supply the bonding module options in the ifcfg-bondX file, it is not necessary to add them to -the system /etc/modules.conf or /etc/modprobe.conf configuration file. +the system /etc/modprobe.d/*.conf configuration files. 3.2 Configuration with Initscripts Support ------------------------------------------ - This section applies to distros using a version of initscripts -with bonding support, for example, Red Hat Linux 9 or Red Hat -Enterprise Linux version 3 or 4. On these systems, the network -initialization scripts have some knowledge of bonding, and can be -configured to control bonding devices. + This section applies to distros using a recent version of +initscripts with bonding support, for example, Red Hat Enterprise Linux +version 3 or later, Fedora, etc. On these systems, the network +initialization scripts have knowledge of bonding, and can be configured to +control bonding devices. Note that older versions of the initscripts +package have lower levels of support for bonding; this will be noted where +applicable. These distros will not automatically load the network adapter driver unless the ethX device is configured with an IP address. @@ -747,11 +1178,31 @@ USERCTL=no Be sure to change the networking specific lines (IPADDR, NETMASK, NETWORK and BROADCAST) to match your network configuration. - Finally, it is necessary to edit /etc/modules.conf (or -/etc/modprobe.conf, depending upon your distro) to load the bonding -module with your desired options when the bond0 interface is brought -up. The following lines in /etc/modules.conf (or modprobe.conf) will -load the bonding module, and select its options: + For later versions of initscripts, such as that found with Fedora +7 (or later) and Red Hat Enterprise Linux version 5 (or later), it is possible, +and, indeed, preferable, to specify the bonding options in the ifcfg-bond0 +file, e.g.
a line of the format: + +BONDING_OPTS="mode=active-backup arp_interval=60 arp_ip_target=192.168.1.254" + + will configure the bond with the specified options. The options +specified in BONDING_OPTS are identical to the bonding module parameters +except for the arp_ip_target field when using versions of initscripts older +than 8.57 (Fedora 8) and 8.45.19 (Red Hat Enterprise Linux 5.2). When +using older versions each target should be included as a separate option and +should be preceded by a '+' to indicate it should be added to the list of +queried targets, e.g., + + arp_ip_target=+192.168.1.1 arp_ip_target=+192.168.1.2 + + is the proper syntax to specify multiple targets. When specifying +options via BONDING_OPTS, it is not necessary to edit /etc/modprobe.d/*.conf. + + For even older versions of initscripts that do not support +BONDING_OPTS, it is necessary to edit /etc/modprobe.d/*.conf +to load the bonding module with your desired options when the +bond0 interface is brought up. The following lines in /etc/modprobe.d/*.conf +will load the bonding module, and select its options: alias bond0 bonding options bond0 mode=balance-alb miimon=100 @@ -766,9 +1217,10 @@ up and running. 3.2.1 Using DHCP with Initscripts --------------------------------- - Recent versions of initscripts (the version supplied with -Fedora Core 3 and Red Hat Enterprise Linux 4 is reported to work) do -have support for assigning IP information to bonding devices via DHCP. + Recent versions of initscripts (the versions supplied with Fedora +Core 3 and Red Hat Enterprise Linux 4, or later versions, are reported to +work) have support for assigning IP information to bonding devices via +DHCP. To configure bonding for DHCP, configure it as described above, except replace the line "BOOTPROTO=none" with "BOOTPROTO=dhcp" @@ -778,20 +1230,16 @@ is case sensitive.
3.2.2 Configuring Multiple Bonds with Initscripts ------------------------------------------------- - At this writing, the initscripts package does not directly -support loading the bonding driver multiple times, so the process for -doing so is the same as described in the "Configuring Multiple Bonds -Manually" section, below. - - NOTE: It has been observed that some Red Hat supplied kernels -are apparently unable to rename modules at load time (the "-o bond1" -part). Attempts to pass that option to modprobe will produce an -"Operation not permitted" error. This has been reported on some -Fedora Core kernels, and has been seen on RHEL 4 as well. On kernels -exhibiting this problem, it will be impossible to configure multiple -bonds with differing parameters. + Initscripts packages that are included with Fedora 7 and Red Hat +Enterprise Linux 5 support multiple bonding interfaces by simply +specifying the appropriate BONDING_OPTS= in ifcfg-bondX where X is the +number of the bond. This support requires sysfs support in the kernel, +and a bonding driver of version 3.0.0 or later. Other configurations may +not support this method for specifying multiple bonding interfaces; for +those instances, see the "Configuring Multiple Bonds Manually" section, +below. -3.3 Configuring Bonding Manually with Ifenslave +3.3 Configuring Bonding Manually with iproute2 ----------------------------------------------- This section applies to distros whose network initialization @@ -800,9 +1248,9 @@ knowledge of bonding. One such distro is SuSE Linux Enterprise Server version 8. The general method for these systems is to place the bonding -module parameters into /etc/modules.conf or /etc/modprobe.conf (as +module parameters into a config file in /etc/modprobe.d/ (as appropriate for the installed distro), then add modprobe and/or -ifenslave commands to the system's global init script. The name of +`ip link` commands to the system's global init script. 
The name of the global init script differs; for sysconfig, it is /etc/init.d/boot.local and for initscripts it is /etc/rc.d/rc.local. @@ -814,8 +1262,8 @@ reboots, edit the appropriate file (/etc/init.d/boot.local or modprobe bonding mode=balance-alb miimon=100 modprobe e100 ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up -ifenslave bond0 eth0 -ifenslave bond0 eth1 +ip link set eth0 master bond0 +ip link set eth1 master bond0 Replace the example bonding module parameters and bond0 network configuration (IP address, netmask, etc) with the appropriate @@ -860,20 +1308,24 @@ initialization scripts lack support for configuring multiple bonds. options, you may wish to use the "max_bonds" module parameter, documented above. - To create multiple bonding devices with differing options, it -is necessary to load the bonding driver multiple times. Note that -current versions of the sysconfig network initialization scripts -handle this automatically; if your distro uses these scripts, no -special action is needed. See the section Configuring Bonding -Devices, above, if you're not sure about your network initialization -scripts. + To create multiple bonding devices with differing options, it is +preferable to use bonding parameters exported by sysfs, documented in the +section below. + + For versions of bonding without sysfs support, the only means to +provide multiple instances of bonding with differing options is to load +the bonding driver multiple times. Note that current versions of the +sysconfig network initialization scripts handle this automatically; if +your distro uses these scripts, no special action is needed. See the +section Configuring Bonding Devices, above, if you're not sure about your +network initialization scripts. To load multiple instances of the module, it is necessary to specify a different name for each instance (the module loading system requires that every loaded module, even multiple instances of the same -module, have a unique name). 
This is accomplished by supplying -multiple sets of bonding options in /etc/modprobe.conf, for example: - +module, have a unique name). This is accomplished by supplying multiple +sets of bonding options in /etc/modprobe.d/*.conf, for example: + alias bond0 bonding options bond0 -o bond0 mode=balance-rr miimon=100 @@ -896,10 +1348,18 @@ install bond1 /sbin/modprobe --ignore-install bonding -o bond1 \ This may be repeated any number of times, specifying a new and unique name in place of bond1 for each subsequent instance. + It has been observed that some Red Hat supplied kernels are unable +to rename modules at load time (the "-o bond1" part). Attempts to pass +that option to modprobe will produce an "Operation not permitted" error. +This has been reported on some Fedora Core kernels, and has been seen on +RHEL 4 as well. On kernels exhibiting this problem, it will be impossible +to configure multiple bonds with differing parameters (as they are older +kernels, and also lack sysfs support). + 3.4 Configuring Bonding Manually via Sysfs ------------------------------------------ - Starting with version 3.0, Channel Bonding may be configured + Starting with version 3.0.0, Channel Bonding may be configured via the sysfs interface. This interface allows dynamic configuration of all bonds in the system without unloading the module. It also allows for adding and removing bonds at runtime. Ifenslave is no @@ -944,9 +1404,6 @@ To enslave interface eth0 to bond bond0: To free slave eth0 from bond bond0: # echo -eth0 > /sys/class/net/bond0/bonding/slaves - NOTE: The bond must be up before slaves can be added. All -slaves are freed when the interface is brought down. - When an interface is enslaved to a bond, symlinks between the two are created in the sysfs filesystem. 
In this case, you would get /sys/class/net/bond0/slave_eth0 pointing to /sys/class/net/eth0, and @@ -964,7 +1421,7 @@ Changing a Bond's Configuration files located in /sys/class/net/<bond name>/bonding The names of these files correspond directly with the command- -line parameters described elsewhere in in this file, and, with the +line parameters described elsewhere in this file, and, with the exception of arp_ip_target, they accept the same values. To see the current setting, simply cat the appropriate file. @@ -988,11 +1445,17 @@ monitoring is enabled, and vice-versa. To add ARP targets: # echo +192.168.0.100 > /sys/class/net/bond0/bonding/arp_ip_target # echo +192.168.0.101 > /sys/class/net/bond0/bonding/arp_ip_target - NOTE: up to 10 target addresses may be specified. + NOTE: up to 16 target addresses may be specified. To remove an ARP target: # echo -192.168.0.100 > /sys/class/net/bond0/bonding/arp_ip_target +To configure the interval between learning packet transmits: +# echo 12 > /sys/class/net/bond0/bonding/lp_interval + NOTE: the lp_interval is the number of seconds between instances where +the bonding driver sends learning packets to each slave's peer switch. The +default interval is 1 second. + Example Configuration --------------------- We begin with the same example that is shown in section 3.3, @@ -1024,8 +1487,141 @@ echo 2000 > /sys/class/net/bond1/bonding/arp_interval echo +eth2 > /sys/class/net/bond1/bonding/slaves echo +eth3 > /sys/class/net/bond1/bonding/slaves +3.5 Configuration with Interfaces Support +----------------------------------------- + + This section applies to distros which use the /etc/network/interfaces file +to describe network interface configuration, most notably Debian and its +derivatives. + + The ifup and ifdown commands on Debian don't support bonding out of +the box. The ifenslave-2.6 package should be installed to provide bonding +support. 
Once installed, this package will provide bond-* options to be used +in /etc/network/interfaces. + + Note that the ifenslave-2.6 package will load the bonding module and use +the ifenslave command when appropriate. + +Example Configurations +---------------------- -4. Querying Bonding Configuration +In /etc/network/interfaces, the following stanza will configure bond0, in +active-backup mode, with eth0 and eth1 as slaves. + +auto bond0 +iface bond0 inet dhcp + bond-slaves eth0 eth1 + bond-mode active-backup + bond-miimon 100 + bond-primary eth0 eth1 + +If the above configuration doesn't work, you might have a system using +upstart for system startup. This is most notably true for recent +Ubuntu versions. The following stanza in /etc/network/interfaces will +produce the same result on those systems. + +auto bond0 +iface bond0 inet dhcp + bond-slaves none + bond-mode active-backup + bond-miimon 100 + +auto eth0 +iface eth0 inet manual + bond-master bond0 + bond-primary eth0 eth1 + +auto eth1 +iface eth1 inet manual + bond-master bond0 + bond-primary eth0 eth1 + +For a full list of bond-* supported options in /etc/network/interfaces and some +more advanced examples tailored to your particular distro, see the files in +/usr/share/doc/ifenslave-2.6. + +3.6 Overriding Configuration for Special Cases +---------------------------------------------- + +When using the bonding driver, the physical port which transmits a frame is +typically selected by the bonding driver, and is not relevant to the user or +system administrator. The output port is simply selected using the policies of +the selected bonding mode. On occasion, however, it is helpful to direct certain +classes of traffic to certain physical interfaces on output to implement +slightly more complex policies. 
For example, to reach a web server over a +bonded interface in which eth0 connects to a private network, while eth1 +connects via a public network, it may be desirous to bias the bond to send said +traffic over eth0 first, using eth1 only as a fall back, while all other traffic +can safely be sent over either interface. Such configurations may be achieved +using the traffic control utilities inherent in linux. + +By default the bonding driver is multiqueue aware and 16 queues are created +when the driver initializes (see Documentation/networking/multiqueue.txt +for details). If more or less queues are desired the module parameter +tx_queues can be used to change this value. There is no sysfs parameter +available as the allocation is done at module init time. + +The output of the file /proc/net/bonding/bondX has changed so the output Queue +ID is now printed for each slave: + +Bonding Mode: fault-tolerance (active-backup) +Primary Slave: None +Currently Active Slave: eth0 +MII Status: up +MII Polling Interval (ms): 0 +Up Delay (ms): 0 +Down Delay (ms): 0 + +Slave Interface: eth0 +MII Status: up +Link Failure Count: 0 +Permanent HW addr: 00:1a:a0:12:8f:cb +Slave queue ID: 0 + +Slave Interface: eth1 +MII Status: up +Link Failure Count: 0 +Permanent HW addr: 00:1a:a0:12:8f:cc +Slave queue ID: 2 + +The queue_id for a slave can be set using the command: + +# echo "eth1:2" > /sys/class/net/bond0/bonding/queue_id + +Any interface that needs a queue_id set should set it with multiple calls +like the one above until proper priorities are set for all interfaces. On +distributions that allow configuration via initscripts, multiple 'queue_id' +arguments can be added to BONDING_OPTS to set all needed slave queues. + +These queue id's can be used in conjunction with the tc utility to configure +a multiqueue qdisc and filters to bias certain traffic to transmit on certain +slave devices. 
For instance, say we wanted, in the above configuration, to +force all traffic bound to 192.168.1.100 to use eth1 in the bond as its output +device. The following commands would accomplish this: + +# tc qdisc add dev bond0 handle 1 root multiq + +# tc filter add dev bond0 protocol ip parent 1: prio 1 u32 match ip dst \ + 192.168.1.100 action skbedit queue_mapping 2 + +These commands tell the kernel to attach a multiqueue queue discipline to the +bond0 interface and filter traffic enqueued to it, such that packets with a dst +ip of 192.168.1.100 have their output queue mapping value overwritten to 2. +This value is then passed into the driver, causing the normal output path +selection policy to be overridden, selecting instead qid 2, which maps to eth1. + +Note that qid values begin at 1. Qid 0 is reserved to indicate to the driver +that normal output policy selection should take place. One benefit to simply +leaving the qid for a slave at 0 is the multiqueue awareness in the bonding +driver that is now present. This awareness allows tc filters to be placed on +slave devices as well as bond devices and the bonding driver will simply act as +a pass-through for selecting output queues on the slave device rather than +output port selection. + +This feature first appeared in bonding driver version 3.7.0 and support for +output slave selection was limited to round-robin and active-backup modes. + +4 Querying Bonding Configuration ================================= 4.1 Bonding Configuration @@ -1299,8 +1895,8 @@ route additions may cause trouble. On systems with network configuration scripts that do not associate physical devices directly with network interface names (so that the same physical device always has the same "ethX" name), it may -be necessary to add some special logic to either /etc/modules.conf or -/etc/modprobe.conf (depending upon which is installed on the system). +be necessary to add some special logic to config files in +/etc/modprobe.d/. 
For example, given a modules.conf containing the following: @@ -1327,20 +1923,15 @@ add above bonding e1000 tg3 bonding is loaded. This command is fully documented in the modules.conf manual page. - On systems utilizing modprobe.conf (or modprobe.conf.local), -an equivalent problem can occur. In this case, the following can be -added to modprobe.conf (or modprobe.conf.local, as appropriate), as -follows (all on one line; it has been split here for clarity): + On systems utilizing modprobe an equivalent problem can occur. +In this case, the following can be added to config files in +/etc/modprobe.d/ as: -install bonding /sbin/modprobe tg3; /sbin/modprobe e1000; - /sbin/modprobe --ignore-install bonding +softdep bonding pre: tg3 e1000 - This will, when loading the bonding module, rather than -performing the normal action, instead execute the provided command. -This command loads the device drivers in the order needed, then calls -modprobe with --ignore-install to cause the normal action to then take -place. Full documentation on this can be found in the modprobe.conf -and modprobe manual pages. + This will load tg3 and e1000 modules before loading the bonding one. +Full documentation on this can be found in the modprobe.d and modprobe +manual pages. 8.3. Painfully Slow Or No Failed Link Detection By Miimon --------------------------------------------------------- @@ -1464,7 +2055,7 @@ access to fail over to. Additionally, the bonding load balance modes support link monitoring of their members, so if individual links fail, the load will be rebalanced across the remaining devices. - See Section 13, "Configuring Bonding for Maximum Throughput" + See Section 12, "Configuring Bonding for Maximum Throughput" for information on configuring bonding with one peer device. 11.2 High Availability in a Multiple Switch Topology @@ -1536,6 +2127,15 @@ one for each switch in the network). 
This will insure that, regardless of which switch is active, the ARP monitor has a suitable target to query. + Note, also, that of late many switches now support a functionality +generally referred to as "trunk failover." This is a feature of the +switch that causes the link state of a particular switch port to be set +down (or up) when the state of another switch port goes down (or up). +Its purpose is to propagate link failures from logically "exterior" ports +to the logically "interior" ports that bonding is able to monitor via +miimon. Availability and configuration for trunk failover varies by +switch, but this can be a viable alternative to the ARP monitor when using +suitable switches. 12. Configuring Bonding for Maximum Throughput ============================================== @@ -1623,7 +2223,7 @@ balance-rr: This mode is the only mode that will permit a single interfaces. It is therefore the only mode that will allow a single TCP/IP stream to utilize more than one interface's worth of throughput. This comes at a cost, however: the - striping often results in peer systems receiving packets out + striping generally results in peer systems receiving packets out of order, causing TCP/IP's congestion control system to kick in, often by retransmitting segments. @@ -1635,22 +2235,20 @@ balance-rr: This mode is the only mode that will permit a single interface's worth of throughput, even after adjusting tcp_reordering. - Note that this out of order delivery occurs when both the - sending and receiving systems are utilizing a multiple - interface bond. Consider a configuration in which a - balance-rr bond feeds into a single higher capacity network - channel (e.g., multiple 100Mb/sec ethernets feeding a single - gigabit ethernet via an etherchannel capable switch). In this - configuration, traffic sent from the multiple 100Mb devices to - a destination connected to the gigabit device will not see - packets out of order. 
However, traffic sent from the gigabit - device to the multiple 100Mb devices may or may not see - traffic out of order, depending upon the balance policy of the - switch. Many switches do not support any modes that stripe - traffic (instead choosing a port based upon IP or MAC level - addresses); for those devices, traffic flowing from the - gigabit device to the many 100Mb devices will only utilize one - interface. + Note that the fraction of packets that will be delivered out of + order is highly variable, and is unlikely to be zero. The level + of reordering depends upon a variety of factors, including the + networking interfaces, the switch, and the topology of the + configuration. Speaking in general terms, higher speed network + cards produce more reordering (due to factors such as packet + coalescing), and a "many to many" topology will reorder at a + higher rate than a "many slow to one fast" configuration. + + Many switches do not support any modes that stripe traffic + (instead choosing a port based upon IP or MAC level addresses); + for those devices, traffic for a particular connection flowing + through the switch to a balance-rr bond will not utilize greater + than one interface's worth of bandwidth. If you are utilizing protocols other than TCP/IP, UDP for example, and your application can tolerate out of order @@ -1850,6 +2448,10 @@ Failover may be delayed via the downdelay bonding module option. 13.2 Duplicated Incoming Packets -------------------------------- + NOTE: Starting with version 3.0.2, the bonding driver has logic to +suppress duplicate packets, which should largely eliminate this problem. +The following description is kept for reference. + It is not uncommon to observe a short burst of duplicated traffic when the bonding device is first used, or after it has been idle for some period of time. This is most easily observed by issuing @@ -2010,6 +2612,9 @@ The new driver was designed to be SMP safe from the start. 
EtherExpress PRO/100 and a 3com 3c905b, for example). For most modes, devices need not be of the same speed. + Starting with version 3.2.1, bonding also supports Infiniband +slaves in active-backup mode. + 3. How many bonding devices can I have? There is no limit. @@ -2068,11 +2673,15 @@ switches currently available support 802.3ad. 8. Where does a bonding device get its MAC address from? - If not explicitly configured (with ifconfig or ip link), the -MAC address of the bonding device is taken from its first slave -device. This MAC address is then passed to all following slaves and -remains persistent (even if the first slave is removed) until the -bonding device is brought down or reconfigured. + When using slave devices that have fixed MAC addresses, or when +the fail_over_mac option is enabled, the bonding device's MAC address is +the MAC address of the active slave. + + For other configurations, if not explicitly configured (with +ifconfig or ip link), the MAC address of the bonding device is taken from +its first slave device. This MAC address is then passed to all following +slaves and remains persistent (even if the first slave is removed) until +the bonding device is brought down or reconfigured. If you wish to change the MAC address, you can set it with ifconfig or ip link: @@ -2099,18 +2708,15 @@ enslaved. 16. Resources and Links ======================= -The latest version of the bonding driver can be found in the latest + The latest version of the bonding driver can be found in the latest version of the linux kernel, found on http://kernel.org -The latest version of this document can be found in either the latest -kernel source (named Documentation/networking/bonding.txt), or on the -bonding sourceforge site: - -http://www.sourceforge.net/projects/bonding + The latest version of this document can be found in the latest kernel +source (named Documentation/networking/bonding.txt). 
-Discussions regarding the bonding driver take place primarily on the -bonding-devel mailing list, hosted at sourceforge.net. If you have -questions or problems, post them to the list. The list address is: + Discussions regarding the usage of the bonding driver take place on the +bonding-devel mailing list, hosted at sourceforge.net. If you have questions or +problems, post them to the list. The list address is: bonding-devel@lists.sourceforge.net @@ -2119,8 +2725,19 @@ be found at: https://lists.sourceforge.net/lists/listinfo/bonding-devel + Discussions regarding the development of the bonding driver take place +on the main Linux network mailing list, hosted at vger.kernel.org. The list +address is: + +netdev@vger.kernel.org + + The administrative interface (to subscribe or unsubscribe) can +be found at: + +http://vger.kernel.org/vger-lists.html#netdev + Donald Becker's Ethernet Drivers and diag programs may be found at : - - http://www.scyld.com/network/ + - http://web.archive.org/web/*/http://www.scyld.com/network/ You will also find a lot of information regarding Ethernet, NWay, MII, etc. at www.scyld.com. |
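A sketch for readers of this patch, not part of the patch itself: the queue_id and tc steps that the new section 3.6 documents can be collected into one small script. It is a minimal illustration under stated assumptions. The variable names BOND, SLAVE, QID, DST and DRY_RUN and the helper functions run and pin_destination are hypothetical and chosen only for this example; the three commands are the ones quoted in section 3.6. The script defaults to a dry run that only prints the commands, since actually applying them needs root and an existing bond.

```shell
#!/bin/sh
# Sketch: print (dry run, the default) or execute the section 3.6 commands
# that steer traffic for one destination to one slave of a bond.
# All names below are illustrative; the values are the ones used in the text.

BOND=${BOND:-bond0}          # bonding device
SLAVE=${SLAVE:-eth1}         # slave that should carry the traffic
QID=${QID:-2}                # queue id assigned to that slave (0 means "normal policy")
DST=${DST:-192.168.1.100}    # destination address to steer
DRY_RUN=${DRY_RUN:-1}        # 1 = only print the commands

run() {
    # In dry-run mode show the command; otherwise hand it to a shell.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        sh -c "$*"
    fi
}

pin_destination() {
    # 1. Give the slave a non-zero queue id.
    run "echo $SLAVE:$QID > /sys/class/net/$BOND/bonding/queue_id"
    # 2. Attach a multiqueue qdisc to the bond.
    run "tc qdisc add dev $BOND handle 1 root multiq"
    # 3. Rewrite the queue mapping of packets sent to DST.
    run "tc filter add dev $BOND protocol ip parent 1: prio 1 u32 match ip dst $DST action skbedit queue_mapping $QID"
}

pin_destination
```

Run with the defaults, the script prints the three commands for review; setting DRY_RUN=0 as root would execute them instead. Keeping the dry run as the default is a deliberate choice here, because a wrong queue_id or filter changes the output path of live traffic.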
