diff options
author | Linus Torvalds <torvalds@ppc970.osdl.org> | 2005-04-16 15:20:36 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@ppc970.osdl.org> | 2005-04-16 15:20:36 -0700 |
commit | 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 (patch) | |
tree | 0bba044c4ce775e45a88a51686b5d9f90697ea9d /Documentation/networking/bonding.txt |
Linux-2.6.12-rc2v2.6.12-rc2
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!
Diffstat (limited to 'Documentation/networking/bonding.txt')
-rw-r--r-- | Documentation/networking/bonding.txt | 1618 |
1 files changed, 1618 insertions, 0 deletions
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt new file mode 100644 index 00000000000..0bc2ed136a3 --- /dev/null +++ b/Documentation/networking/bonding.txt @@ -0,0 +1,1618 @@ + + Linux Ethernet Bonding Driver HOWTO + +Initial release : Thomas Davis <tadavis at lbl.gov> +Corrections, HA extensions : 2000/10/03-15 : + - Willy Tarreau <willy at meta-x.org> + - Constantine Gavrilov <const-g at xpert.com> + - Chad N. Tindel <ctindel at ieee dot org> + - Janice Girouard <girouard at us dot ibm dot com> + - Jay Vosburgh <fubar at us dot ibm dot com> + +Reorganized and updated Feb 2005 by Jay Vosburgh + +Note : +------ + +The bonding driver originally came from Donald Becker's beowulf patches for +kernel 2.0. It has changed quite a bit since, and the original tools from +extreme-linux and beowulf sites will not work with this version of the driver. + +For new versions of the driver, patches for older kernels and the updated +userspace tools, please follow the links at the end of this file. + +Table of Contents +================= + +1. Bonding Driver Installation + +2. Bonding Driver Options + +3. Configuring Bonding Devices +3.1 Configuration with sysconfig support +3.2 Configuration with initscripts support +3.3 Configuring Bonding Manually +3.4 Configuring Multiple Bonds + +5. Querying Bonding Configuration +5.1 Bonding Configuration +5.2 Network Configuration + +6. Switch Configuration + +7. 802.1q VLAN Support + +8. Link Monitoring +8.1 ARP Monitor Operation +8.2 Configuring Multiple ARP Targets +8.3 MII Monitor Operation + +9. Potential Trouble Sources +9.1 Adventures in Routing +9.2 Ethernet Device Renaming +9.3 Painfully Slow Or No Failed Link Detection By Miimon + +10. SNMP agents + +11. Promiscuous mode + +12. High Availability Information +12.1 High Availability in a Single Switch Topology +12.1.1 Bonding Mode Selection for Single Switch Topology +12.1.2 Link Monitoring for Single Switch Topology +12.2 High Availability in a Multiple Switch Topology +12.2.1 Bonding Mode Selection for Multiple Switch Topology +12.2.2 Link Monitoring for Multiple Switch Topology +12.3 Switch Behavior Issues for High Availability + +13. Hardware Specific Considerations +13.1 IBM BladeCenter + +14. Frequently Asked Questions + +15. Resources and Links + + +1. Bonding Driver Installation +============================== + + Most popular distro kernels ship with the bonding driver +already available as a module and the ifenslave user level control +program installed and ready for use. If your distro does not, or you +have need to compile bonding from source (e.g., configuring and +installing a mainline kernel from kernel.org), you'll need to perform +the following steps: + +1.1 Configure and build the kernel with bonding +----------------------------------------------- + + The latest version of the bonding driver is available in the +drivers/net/bonding subdirectory of the most recent kernel source +(which is available on http://kernel.org). + + Prior to the 2.4.11 kernel, the bonding driver was maintained +largely outside the kernel tree; patches for some earlier kernels are +available on the bonding sourceforge site, although those patches are +still several years out of date. Most users will want to use either +the most recent kernel from kernel.org or whatever kernel came with +their distro. + + Configure kernel with "make menuconfig" (or "make xconfig" or +"make config"), then select "Bonding driver support" in the "Network +device support" section. It is recommended that you configure the +driver as module since it is currently the only way to pass parameters +to the driver or configure more than one bonding device. + + Build and install the new kernel and modules, then proceed to +step 2. + +1.2 Install ifenslave Control Utility +------------------------------------- + + The ifenslave user level control program is included in the +kernel source tree, in the file Documentation/networking/ifenslave.c. +It is generally recommended that you use the ifenslave that +corresponds to the kernel that you are using (either from the same +source tree or supplied with the distro), however, ifenslave +executables from older kernels should function (but features newer +than the ifenslave release are not supported). Running an ifenslave +that is newer than the kernel is not supported, and may or may not +work. + + To install ifenslave, do the following: + +# gcc -Wall -O -I/usr/src/linux/include ifenslave.c -o ifenslave +# cp ifenslave /sbin/ifenslave + + If your kernel source is not in "/usr/src/linux," then replace +"/usr/src/linux/include" in the above with the location of your kernel +source include directory. + + You may wish to back up any existing /sbin/ifenslave, or, for +testing or informal use, tag the ifenslave to the kernel version +(e.g., name the ifenslave executable /sbin/ifenslave-2.6.10). + +IMPORTANT NOTE: + + If you omit the "-I" or specify an incorrect directory, you +may end up with an ifenslave that is incompatible with the kernel +you're trying to build it for. Some distros (e.g., Red Hat from 7.1 +onwards) do not have /usr/include/linux symbolically linked to the +default kernel source include directory. + + +2. Bonding Driver Options +========================= + + Options for the bonding driver are supplied as parameters to +the bonding module at load time. They may be given as command line +arguments to the insmod or modprobe command, but are usually specified +in either the /etc/modprobe.conf configuration file, or in a +distro-specific configuration file (some of which are detailed in the +next section). + + The available bonding driver parameters are listed below. If a +parameter is not specified the default value is used. When initially +configuring a bond, it is recommended "tail -f /var/log/messages" be +run in a separate window to watch for bonding driver error messages. + + It is critical that either the miimon or arp_interval and +arp_ip_target parameters be specified, otherwise serious network +degradation will occur during link failures. Very few devices do not +support at least miimon, so there is really no reason not to use it. + + Options with textual values will accept either the text name + or, for backwards compatibility, the option value. E.g., + "mode=802.3ad" and "mode=4" set the same mode. + + The parameters are as follows: + +arp_interval + + Specifies the ARP monitoring frequency in milli-seconds. If + ARP monitoring is used in a load-balancing mode (mode 0 or 2), + the switch should be configured in a mode that evenly + distributes packets across all links - such as round-robin. If + the switch is configured to distribute the packets in an XOR + fashion, all replies from the ARP targets will be received on + the same link which could cause the other team members to + fail. ARP monitoring should not be used in conjunction with + miimon. A value of 0 disables ARP monitoring. The default + value is 0. + +arp_ip_target + + Specifies the ip addresses to use when arp_interval is > 0. + These are the targets of the ARP request sent to determine the + health of the link to the targets. Specify these values in + ddd.ddd.ddd.ddd format. Multiple ip adresses must be + seperated by a comma. At least one IP address must be given + for ARP monitoring to function. The maximum number of targets + that can be specified is 16. The default value is no IP + addresses. + +downdelay + + Specifies the time, in milliseconds, to wait before disabling + a slave after a link failure has been detected. This option + is only valid for the miimon link monitor. The downdelay + value should be a multiple of the miimon value; if not, it + will be rounded down to the nearest multiple. The default + value is 0. + +lacp_rate + + Option specifying the rate in which we'll ask our link partner + to transmit LACPDU packets in 802.3ad mode. Possible values + are: + + slow or 0 + Request partner to transmit LACPDUs every 30 seconds (default) + + fast or 1 + Request partner to transmit LACPDUs every 1 second + +max_bonds + + Specifies the number of bonding devices to create for this + instance of the bonding driver. E.g., if max_bonds is 3, and + the bonding driver is not already loaded, then bond0, bond1 + and bond2 will be created. The default value is 1. + +miimon + + Specifies the frequency in milli-seconds that MII link + monitoring will occur. A value of zero disables MII link + monitoring. A value of 100 is a good starting point. The + use_carrier option, below, affects how the link state is + determined. See the High Availability section for additional + information. The default value is 0. + +mode + + Specifies one of the bonding policies. The default is + balance-rr (round robin). Possible values are: + + balance-rr or 0 + + Round-robin policy: Transmit packets in sequential + order from the first available slave through the + last. This mode provides load balancing and fault + tolerance. + + active-backup or 1 + + Active-backup policy: Only one slave in the bond is + active. A different slave becomes active if, and only + if, the active slave fails. The bond's MAC address is + externally visible on only one port (network adapter) + to avoid confusing the switch. This mode provides + fault tolerance. The primary option affects the + behavior of this mode. + + balance-xor or 2 + + XOR policy: Transmit based on [(source MAC address + XOR'd with destination MAC address) modulo slave + count]. This selects the same slave for each + destination MAC address. This mode provides load + balancing and fault tolerance. + + broadcast or 3 + + Broadcast policy: transmits everything on all slave + interfaces. This mode provides fault tolerance. + + 802.3ad or 4 + + IEEE 802.3ad Dynamic link aggregation. Creates + aggregation groups that share the same speed and + duplex settings. Utilizes all slaves in the active + aggregator according to the 802.3ad specification. + + Pre-requisites: + + 1. Ethtool support in the base drivers for retrieving + the speed and duplex of each slave. + + 2. A switch that supports IEEE 802.3ad Dynamic link + aggregation. + + Most switches will require some type of configuration + to enable 802.3ad mode. + + balance-tlb or 5 + + Adaptive transmit load balancing: channel bonding that + does not require any special switch support. The + outgoing traffic is distributed according to the + current load (computed relative to the speed) on each + slave. Incoming traffic is received by the current + slave. If the receiving slave fails, another slave + takes over the MAC address of the failed receiving + slave. + + Prerequisite: + + Ethtool support in the base drivers for retrieving the + speed of each slave. + + balance-alb or 6 + + Adaptive load balancing: includes balance-tlb plus + receive load balancing (rlb) for IPV4 traffic, and + does not require any special switch support. The + receive load balancing is achieved by ARP negotiation. + The bonding driver intercepts the ARP Replies sent by + the local system on their way out and overwrites the + source hardware address with the unique hardware + address of one of the slaves in the bond such that + different peers use different hardware addresses for + the server. + + Receive traffic from connections created by the server + is also balanced. When the local system sends an ARP + Request the bonding driver copies and saves the peer's + IP information from the ARP packet. When the ARP + Reply arrives from the peer, its hardware address is + retrieved and the bonding driver initiates an ARP + reply to this peer assigning it to one of the slaves + in the bond. A problematic outcome of using ARP + negotiation for balancing is that each time that an + ARP request is broadcast it uses the hardware address + of the bond. Hence, peers learn the hardware address + of the bond and the balancing of receive traffic + collapses to the current slave. This is handled by + sending updates (ARP Replies) to all the peers with + their individually assigned hardware address such that + the traffic is redistributed. Receive traffic is also + redistributed when a new slave is added to the bond + and when an inactive slave is re-activated. The + receive load is distributed sequentially (round robin) + among the group of highest speed slaves in the bond. + + When a link is reconnected or a new slave joins the + bond the receive traffic is redistributed among all + active slaves in the bond by intiating ARP Replies + with the selected mac address to each of the + clients. The updelay parameter (detailed below) must + be set to a value equal or greater than the switch's + forwarding delay so that the ARP Replies sent to the + peers will not be blocked by the switch. + + Prerequisites: + + 1. Ethtool support in the base drivers for retrieving + the speed of each slave. + + 2. Base driver support for setting the hardware + address of a device while it is open. This is + required so that there will always be one slave in the + team using the bond hardware address (the + curr_active_slave) while having a unique hardware + address for each slave in the bond. If the + curr_active_slave fails its hardware address is + swapped with the new curr_active_slave that was + chosen. + +primary + + A string (eth0, eth2, etc) specifying which slave is the + primary device. The specified device will always be the + active slave while it is available. Only when the primary is + off-line will alternate devices be used. This is useful when + one slave is preferred over another, e.g., when one slave has + higher throughput than another. + + The primary option is only valid for active-backup mode. + +updelay + + Specifies the time, in milliseconds, to wait before enabling a + slave after a link recovery has been detected. This option is + only valid for the miimon link monitor. The updelay value + should be a multiple of the miimon value; if not, it will be + rounded down to the nearest multiple. The default value is 0. + +use_carrier + + Specifies whether or not miimon should use MII or ETHTOOL + ioctls vs. netif_carrier_ok() to determine the link + status. The MII or ETHTOOL ioctls are less efficient and + utilize a deprecated calling sequence within the kernel. The + netif_carrier_ok() relies on the device driver to maintain its + state with netif_carrier_on/off; at this writing, most, but + not all, device drivers support this facility. + + If bonding insists that the link is up when it should not be, + it may be that your network device driver does not support + netif_carrier_on/off. The default state for netif_carrier is + "carrier on," so if a driver does not support netif_carrier, + it will appear as if the link is always up. In this case, + setting use_carrier to 0 will cause bonding to revert to the + MII / ETHTOOL ioctl method to determine the link state. + + A value of 1 enables the use of netif_carrier_ok(), a value of + 0 will use the deprecated MII / ETHTOOL ioctls. The default + value is 1. + + + +3. Configuring Bonding Devices +============================== + + There are, essentially, two methods for configuring bonding: +with support from the distro's network initialization scripts, and +without. Distros generally use one of two packages for the network +initialization scripts: initscripts or sysconfig. Recent versions of +these packages have support for bonding, while older versions do not. + + We will first describe the options for configuring bonding for +distros using versions of initscripts and sysconfig with full or +partial support for bonding, then provide information on enabling +bonding without support from the network initialization scripts (i.e., +older versions of initscripts or sysconfig). + + If you're unsure whether your distro uses sysconfig or +initscripts, or don't know if it's new enough, have no fear. +Determining this is fairly straightforward. + + First, issue the command: + +$ rpm -qf /sbin/ifup + + It will respond with a line of text starting with either +"initscripts" or "sysconfig," followed by some numbers. This is the +package that provides your network initialization scripts. + + Next, to determine if your installation supports bonding, +issue the command: + +$ grep ifenslave /sbin/ifup + + If this returns any matches, then your initscripts or +sysconfig has support for bonding. + +3.1 Configuration with sysconfig support +---------------------------------------- + + This section applies to distros using a version of sysconfig +with bonding support, for example, SuSE Linux Enterprise Server 9. + + SuSE SLES 9's networking configuration system does support +bonding, however, at this writing, the YaST system configuration +frontend does not provide any means to work with bonding devices. +Bonding devices can be managed by hand, however, as follows. + + First, if they have not already been configured, configure the +slave devices. On SLES 9, this is most easily done by running the +yast2 sysconfig configuration utility. The goal is for to create an +ifcfg-id file for each slave device. The simplest way to accomplish +this is to configure the devices for DHCP. The name of the +configuration file for each device will be of the form: + +ifcfg-id-xx:xx:xx:xx:xx:xx + + Where the "xx" portion will be replaced with the digits from +the device's permanent MAC address. + + Once the set of ifcfg-id-xx:xx:xx:xx:xx:xx files has been +created, it is necessary to edit the configuration files for the slave +devices (the MAC addresses correspond to those of the slave devices). +Before editing, the file will contain muliple lines, and will look +something like this: + +BOOTPROTO='dhcp' +STARTMODE='on' +USERCTL='no' +UNIQUE='XNzu.WeZGOGF+4wE' +_nm_name='bus-pci-0001:61:01.0' + + Change the BOOTPROTO and STARTMODE lines to the following: + +BOOTPROTO='none' +STARTMODE='off' + + Do not alter the UNIQUE or _nm_name lines. Remove any other +lines (USERCTL, etc). + + Once the ifcfg-id-xx:xx:xx:xx:xx:xx files have been modified, +it's time to create the configuration file for the bonding device +itself. This file is named ifcfg-bondX, where X is the number of the +bonding device to create, starting at 0. The first such file is +ifcfg-bond0, the second is ifcfg-bond1, and so on. The sysconfig +network configuration system will correctly start multiple instances +of bonding. + + The contents of the ifcfg-bondX file is as follows: + +BOOTPROTO="static" +BROADCAST="10.0.2.255" +IPADDR="10.0.2.10" +NETMASK="255.255.0.0" +NETWORK="10.0.2.0" +REMOTE_IPADDR="" +STARTMODE="onboot" +BONDING_MASTER="yes" +BONDING_MODULE_OPTS="mode=active-backup miimon=100" +BONDING_SLAVE0="eth0" +BONDING_SLAVE1="eth1" + + Replace the sample BROADCAST, IPADDR, NETMASK and NETWORK +values with the appropriate values for your network. + + Note that configuring the bonding device with BOOTPROTO='dhcp' +does not work; the scripts attempt to obtain the device address from +DHCP prior to adding any of the slave devices. Without active slaves, +the DHCP requests are not sent to the network. + + The STARTMODE specifies when the device is brought online. +The possible values are: + + onboot: The device is started at boot time. If you're not + sure, this is probably what you want. + + manual: The device is started only when ifup is called + manually. Bonding devices may be configured this + way if you do not wish them to start automatically + at boot for some reason. + + hotplug: The device is started by a hotplug event. This is not + a valid choice for a bonding device. + + off or ignore: The device configuration is ignored. + + The line BONDING_MASTER='yes' indicates that the device is a +bonding master device. The only useful value is "yes." + + The contents of BONDING_MODULE_OPTS are supplied to the +instance of the bonding module for this device. Specify the options +for the bonding mode, link monitoring, and so on here. Do not include +the max_bonds bonding parameter; this will confuse the configuration +system if you have multiple bonding devices. + + Finally, supply one BONDING_SLAVEn="ethX" for each slave, +where "n" is an increasing value, one for each slave, and "ethX" is +the name of the slave device (eth0, eth1, etc). + + When all configuration files have been modified or created, +networking must be restarted for the configuration changes to take +effect. This can be accomplished via the following: + +# /etc/init.d/network restart + + Note that the network control script (/sbin/ifdown) will +remove the bonding module as part of the network shutdown processing, +so it is not necessary to remove the module by hand if, e.g., the +module paramters have changed. + + Also, at this writing, YaST/YaST2 will not manage bonding +devices (they do not show bonding interfaces on its list of network +devices). It is necessary to edit the configuration file by hand to +change the bonding configuration. + + Additional general options and details of the ifcfg file +format can be found in an example ifcfg template file: + +/etc/sysconfig/network/ifcfg.template + + Note that the template does not document the various BONDING_ +settings described above, but does describe many of the other options. + +3.2 Configuration with initscripts support +------------------------------------------ + + This section applies to distros using a version of initscripts +with bonding support, for example, Red Hat Linux 9 or Red Hat +Enterprise Linux version 3. On these systems, the network +initialization scripts have some knowledge of bonding, and can be +configured to control bonding devices. + + These distros will not automatically load the network adapter +driver unless the ethX device is configured with an IP address. +Because of this constraint, users must manually configure a +network-script file for all physical adapters that will be members of +a bondX link. Network script files are located in the directory: + +/etc/sysconfig/network-scripts + + The file name must be prefixed with "ifcfg-eth" and suffixed +with the adapter's physical adapter number. For example, the script +for eth0 would be named /etc/sysconfig/network-scripts/ifcfg-eth0. +Place the following text in the file: + +DEVICE=eth0 +USERCTL=no +ONBOOT=yes +MASTER=bond0 +SLAVE=yes +BOOTPROTO=none + + The DEVICE= line will be different for every ethX device and +must correspond with the name of the file, i.e., ifcfg-eth1 must have +a device line of DEVICE=eth1. The setting of the MASTER= line will +also depend on the final bonding interface name chosen for your bond. +As with other network devices, these typically start at 0, and go up +one for each device, i.e., the first bonding instance is bond0, the +second is bond1, and so on. + + Next, create a bond network script. The file name for this +script will be /etc/sysconfig/network-scripts/ifcfg-bondX where X is +the number of the bond. For bond0 the file is named "ifcfg-bond0", +for bond1 it is named "ifcfg-bond1", and so on. Within that file, +place the following text: + +DEVICE=bond0 +IPADDR=192.168.1.1 +NETMASK=255.255.255.0 +NETWORK=192.168.1.0 +BROADCAST=192.168.1.255 +ONBOOT=yes +BOOTPROTO=none +USERCTL=no + + Be sure to change the networking specific lines (IPADDR, +NETMASK, NETWORK and BROADCAST) to match your network configuration. + + Finally, it is necessary to edit /etc/modules.conf to load the +bonding module when the bond0 interface is brought up. The following +sample lines in /etc/modules.conf will load the bonding module, and +select its options: + +alias bond0 bonding +options bond0 mode=balance-alb miimon=100 + + Replace the sample parameters with the appropriate set of +options for your configuration. + + Finally run "/etc/rc.d/init.d/network restart" as root. This +will restart the networking subsystem and your bond link should be now +up and running. + + +3.3 Configuring Bonding Manually +-------------------------------- + + This section applies to distros whose network initialization +scripts (the sysconfig or initscripts package) do not have specific +knowledge of bonding. One such distro is SuSE Linux Enterprise Server +version 8. + + The general methodology for these systems is to place the +bonding module parameters into /etc/modprobe.conf, then add modprobe +and/or ifenslave commands to the system's global init script. The +name of the global init script differs; for sysconfig, it is +/etc/init.d/boot.local and for initscripts it is /etc/rc.d/rc.local. + + For example, if you wanted to make a simple bond of two e100 +devices (presumed to be eth0 and eth1), and have it persist across +reboots, edit the appropriate file (/etc/init.d/boot.local or +/etc/rc.d/rc.local), and add the following: + +modprobe bonding -obond0 mode=balance-alb miimon=100 +modprobe e100 +ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up +ifenslave bond0 eth0 +ifenslave bond0 eth1 + + Replace the example bonding module parameters and bond0 +network configuration (IP address, netmask, etc) with the appropriate +values for your configuration. The above example loads the bonding +module with the name "bond0," this simplifies the naming if multiple +bonding modules are loaded (each successive instance of the module is +given a different name, and the module instance names match the +bonding interface names). + + Unfortunately, this method will not provide support for the +ifup and ifdown scripts on the bond devices. To reload the bonding +configuration, it is necessary to run the initialization script, e.g., + +# /etc/init.d/boot.local + + or + +# /etc/rc.d/rc.local + + It may be desirable in such a case to create a separate script +which only initializes the bonding configuration, then call that +separate script from within boot.local. This allows for bonding to be +enabled without re-running the entire global init script. + + To shut down the bonding devices, it is necessary to first +mark the bonding device itself as being down, then remove the +appropriate device driver modules. For our example above, you can do +the following: + +# ifconfig bond0 down +# rmmod bond0 +# rmmod e100 + + Again, for convenience, it may be desirable to create a script +with these commands. + + +3.4 Configuring Multiple Bonds +------------------------------ + + This section contains information on configuring multiple +bonding devices with differing options. If you require multiple +bonding devices, but all with the same options, see the "max_bonds" +module paramter, documented above. + + To create multiple bonding devices with differing options, it +is necessary to load the bonding driver multiple times. Note that +current versions of the sysconfig network initialization scripts +handle this automatically; if your distro uses these scripts, no +special action is needed. See the section Configuring Bonding +Devices, above, if you're not sure about your network initialization +scripts. + + To load multiple instances of the module, it is necessary to +specify a different name for each instance (the module loading system +requires that every loaded module, even multiple instances of the same +module, have a unique name). This is accomplished by supplying +multiple sets of bonding options in /etc/modprobe.conf, for example: + +alias bond0 bonding +options bond0 -o bond0 mode=balance-rr miimon=100 + +alias bond1 bonding +options bond1 -o bond1 mode=balance-alb miimon=50 + + will load the bonding module two times. The first instance is +named "bond0" and creates the bond0 device in balance-rr mode with an +miimon of 100. The second instance is named "bond1" and creates the +bond1 device in balance-alb mode with an miimon of 50. + + This may be repeated any number of times, specifying a new and +unique name in place of bond0 or bond1 for each instance. + + When the appropriate module paramters are in place, then +configure bonding according to the instructions for your distro. + +5. Querying Bonding Configuration +================================= + +5.1 Bonding Configuration +------------------------- + + Each bonding device has a read-only file residing in the +/proc/net/bonding directory. The file contents include information +about the bonding configuration, options and state of each slave. + + For example, the contents of /proc/net/bonding/bond0 after the +driver is loaded with parameters of mode=0 and miimon=1000 is +generally as follows: + + Ethernet Channel Bonding Driver: 2.6.1 (October 29, 2004) + Bonding Mode: load balancing (round-robin) + Currently Active Slave: eth0 + MII Status: up + MII Polling Interval (ms): 1000 + Up Delay (ms): 0 + Down Delay (ms): 0 + + Slave Interface: eth1 + MII Status: up + Link Failure Count: 1 + + Slave Interface: eth0 + MII Status: up + Link Failure Count: 1 + + The precise format and contents will change depending upon the +bonding configuration, state, and version of the bonding driver. + +5.2 Network configuration +------------------------- + + The network configuration can be inspected using the ifconfig +command. Bonding devices will have the MASTER flag set; Bonding slave +devices will have the SLAVE flag set. The ifconfig output does not +contain information on which slaves are associated with which masters. + + In the example below, the bond0 interface is the master +(MASTER) while eth0 and eth1 are slaves (SLAVE). Notice all slaves of +bond0 have the same MAC address (HWaddr) as bond0 for all modes except +TLB and ALB that require a unique MAC address for each slave. + +# /sbin/ifconfig +bond0 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4 + inet addr:XXX.XXX.XXX.YYY Bcast:XXX.XXX.XXX.255 Mask:255.255.252.0 + UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 + RX packets:7224794 errors:0 dropped:0 overruns:0 frame:0 + TX packets:3286647 errors:1 dropped:0 overruns:1 carrier:0 + collisions:0 txqueuelen:0 + +eth0 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4 + inet addr:XXX.XXX.XXX.YYY Bcast:XXX.XXX.XXX.255 Mask:255.255.252.0 + UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 + RX packets:3573025 errors:0 dropped:0 overruns:0 frame:0 + TX packets:1643167 errors:1 dropped:0 overruns:1 carrier:0 + collisions:0 txqueuelen:100 + Interrupt:10 Base address:0x1080 + +eth1 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4 + inet addr:XXX.XXX.XXX.YYY Bcast:XXX.XXX.XXX.255 Mask:255.255.252.0 + UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 + RX packets:3651769 errors:0 dropped:0 overruns:0 frame:0 + TX packets:1643480 errors:0 dropped:0 overruns:0 carrier:0 + collisions:0 txqueuelen:100 + Interrupt:9 Base address:0x1400 + +6. Switch Configuration +======================= + + For this section, "switch" refers to whatever system the +bonded devices are directly connected to (i.e., where the other end of +the cable plugs into). This may be an actual dedicated switch device, +or it may be another regular system (e.g., another computer running +Linux), + + The active-backup, balance-tlb and balance-alb modes do not +require any specific configuration of the switch. + + The 802.3ad mode requires that the switch have the appropriate +ports configured as an 802.3ad aggregation. The precise method used +to configure this varies from switch to switch, but, for example, a +Cisco 3550 series switch requires that the appropriate ports first be +grouped together in a single etherchannel instance, then that +etherchannel is set to mode "lacp" to enable 802.3ad (instead of +standard EtherChannel). + + The balance-rr, balance-xor and broadcast modes generally +require that the switch have the appropriate ports grouped together. +The nomenclature for such a group differs between switches, it may be +called an "etherchannel" (as in the Cisco example, above), a "trunk +group" or some other similar variation. For these modes, each switch +will also have its own configuration options for the switch's transmit +policy to the bond. Typical choices include XOR of either the MAC or +IP addresses. The transmit policy of the two peers does not need to +match. For these three modes, the bonding mode really selects a +transmit policy for an EtherChannel group; all three will interoperate +with another EtherChannel group. + + +7. 802.1q VLAN Support +====================== + + It is possible to configure VLAN devices over a bond interface +using the 8021q driver. However, only packets coming from the 8021q +driver and passing through bonding will be tagged by default. Self +generated packets, for example, bonding's learning packets or ARP +packets generated by either ALB mode or the ARP monitor mechanism, are +tagged internally by bonding itself. As a result, bonding must +"learn" the VLAN IDs configured above it, and use those IDs to tag +self generated packets. + + For reasons of simplicity, and to support the use of adapters +that can do VLAN hardware acceleration offloding, the bonding +interface declares itself as fully hardware offloaing capable, it gets +the add_vid/kill_vid notifications to gather the necessary +information, and it propagates those actions to the slaves. In case +of mixed adapter types, hardware accelerated tagged packets that +should go through an adapter that is not offloading capable are +"un-accelerated" by the bonding driver so the VLAN tag sits in the +regular location. + + VLAN interfaces *must* be added on top of a bonding interface +only after enslaving at least one slave. The bonding interface has a +hardware address of 00:00:00:00:00:00 until the first slave is added. +If the VLAN interface is created prior to the first enslavement, it +would pick up the all-zeroes hardware address. Once the first slave +is attached to the bond, the bond device itself will pick up the +slave's hardware address, which is then available for the VLAN device. + + Also, be aware that a similar problem can occur if all slaves +are released from a bond that still has one or more VLAN interfaces on +top of it. When a new slave is added, the bonding interface will +obtain its hardware address from the first slave, which might not +match the hardware address of the VLAN interfaces (which was +ultimately copied from an earlier slave). + + There are two methods to insure that the VLAN device operates +with the correct hardware address if all slaves are removed from a +bond interface: + + 1. Remove all VLAN interfaces then recreate them + + 2. Set the bonding interface's hardware address so that it +matches the hardware address of the VLAN interfaces. + + Note that changing a VLAN interface's HW address would set the +underlying device -- i.e. the bonding interface -- to promiscouos +mode, which might not be what you want. + + +8. Link Monitoring +================== + + The bonding driver at present supports two schemes for +monitoring a slave device's link state: the ARP monitor and the MII +monitor. + + At the present time, due to implementation restrictions in the +bonding driver itself, it is not possible to enable both ARP and MII +monitoring simultaneously. + +8.1 ARP Monitor Operation +------------------------- + + The ARP monitor operates as its name suggests: it sends ARP +queries to one or more designated peer systems on the network, and +uses the response as an indication that the link is operating. This +gives some assurance that traffic is actually flowing to and from one +or more peers on the local network. + + The ARP monitor relies on the device driver itself to verify +that traffic is flowing. In particular, the driver must keep up to +date the last receive time, dev->last_rx, and transmit start time, +dev->trans_start. If these are not updated by the driver, then the +ARP monitor will immediately fail any slaves using that driver, and +those slaves will stay down. If networking monitoring (tcpdump, etc) +shows the ARP requests and replies on the network, then it may be that +your device driver is not updating last_rx and trans_start. + +8.2 Configuring Multiple ARP Targets +------------------------------------ + + While ARP monitoring can be done with just one target, it can +be useful in a High Availability setup to have several targets to +monitor. In the case of just one target, the target itself may go +down or have a problem making it unresponsive to ARP requests. Having +an additional target (or several) increases the reliability of the ARP +monitoring. + + Multiple ARP targets must be seperated by commas as follows: + +# example options for ARP monitoring with three targets +alias bond0 bonding +options bond0 arp_interval=60 arp_ip_target=192.168.0.1,192.168.0.3,192.168.0.9 + + For just a single target the options would resemble: + +# example options for ARP monitoring with one target +alias bond0 bonding +options bond0 arp_interval=60 arp_ip_target=192.168.0.100 + + +8.3 MII Monitor Operation +------------------------- + + The MII monitor monitors only the carrier state of the local +network interface. It accomplishes this in one of three ways: by +depending upon the device driver to maintain its carrier state, by +querying the device's MII registers, or by making an ethtool query to +the device. |