aboutsummaryrefslogtreecommitdiff
path: root/Documentation/networking/bonding.txt
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/networking/bonding.txt')
-rw-r--r--Documentation/networking/bonding.txt232
1 files changed, 163 insertions, 69 deletions
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 87bbcfee2e0..9c723ecd002 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -270,16 +270,15 @@ arp_ip_target
arp_validate
Specifies whether or not ARP probes and replies should be
- validated in the active-backup mode. This causes the ARP
- monitor to examine the incoming ARP requests and replies, and
- only consider a slave to be up if it is receiving the
- appropriate ARP traffic.
+ validated in any mode that supports arp monitoring, or whether
+ non-ARP traffic should be filtered (disregarded) for link
+ monitoring purposes.
Possible values are:
none or 0
- No validation is performed. This is the default.
+ No validation or filtering is performed.
active or 1
@@ -293,31 +292,68 @@ arp_validate
Validation is performed for all slaves.
- For the active slave, the validation checks ARP replies to
- confirm that they were generated by an arp_ip_target. Since
- backup slaves do not typically receive these replies, the
- validation performed for backup slaves is on the ARP request
- sent out via the active slave. It is possible that some
- switch or network configurations may result in situations
- wherein the backup slaves do not receive the ARP requests; in
- such a situation, validation of backup slaves must be
- disabled.
-
- The validation of ARP requests on backup slaves is mainly
- helping bonding to decide which slaves are more likely to
- work in case of the active slave failure, it doesn't really
- guarantee that the backup slave will work if it's selected
- as the next active slave.
-
- This option is useful in network configurations in which
- multiple bonding hosts are concurrently issuing ARPs to one or
- more targets beyond a common switch. Should the link between
- the switch and target fail (but not the switch itself), the
- probe traffic generated by the multiple bonding instances will
- fool the standard ARP monitor into considering the links as
- still up. Use of the arp_validate option can resolve this, as
- the ARP monitor will only consider ARP requests and replies
- associated with its own instance of bonding.
+ filter or 4
+
+ Filtering is applied to all slaves. No validation is
+ performed.
+
+ filter_active or 5
+
+ Filtering is applied to all slaves, validation is performed
+ only for the active slave.
+
+ filter_backup or 6
+
+ Filtering is applied to all slaves, validation is performed
+ only for backup slaves.
+
+ Validation:
+
+ Enabling validation causes the ARP monitor to examine the incoming
+ ARP requests and replies, and only consider a slave to be up if it
+ is receiving the appropriate ARP traffic.
+
+ For an active slave, the validation checks ARP replies to confirm
+ that they were generated by an arp_ip_target. Since backup slaves
+ do not typically receive these replies, the validation performed
+ for backup slaves is on the broadcast ARP request sent out via the
+ active slave. It is possible that some switch or network
+ configurations may result in situations wherein the backup slaves
+ do not receive the ARP requests; in such a situation, validation
+ of backup slaves must be disabled.
+
+ The validation of ARP requests on backup slaves is mainly helping
+ bonding to decide which slaves are more likely to work in case of
+ the active slave failure, it doesn't really guarantee that the
+ backup slave will work if it's selected as the next active slave.
+
+ Validation is useful in network configurations in which multiple
+ bonding hosts are concurrently issuing ARPs to one or more targets
+ beyond a common switch. Should the link between the switch and
+ target fail (but not the switch itself), the probe traffic
+ generated by the multiple bonding instances will fool the standard
+ ARP monitor into considering the links as still up. Use of
+ validation can resolve this, as the ARP monitor will only consider
+ ARP requests and replies associated with its own instance of
+ bonding.
+
+ Filtering:
+
+ Enabling filtering causes the ARP monitor to only use incoming ARP
+ packets for link availability purposes. Arriving packets that are
+ not ARPs are delivered normally, but do not count when determining
+ if a slave is available.
+
+ Filtering operates by only considering the reception of ARP
+ packets (any ARP packet, regardless of source or destination) when
+ determining if a slave has received traffic for link availability
+ purposes.
+
+ Filtering is useful in network configurations in which significant
+ levels of third party broadcast traffic would fool the standard
+ ARP monitor into considering the links as still up. Use of
+ filtering can resolve this, as only ARP traffic is considered for
+ link availability purposes.
This option was added in bonding version 3.1.0.
@@ -549,13 +585,19 @@ mode
balance-tlb or 5
Adaptive transmit load balancing: channel bonding that
- does not require any special switch support. The
- outgoing traffic is distributed according to the
- current load (computed relative to the speed) on each
- slave. Incoming traffic is received by the current
- slave. If the receiving slave fails, another slave
- takes over the MAC address of the failed receiving
- slave.
+ does not require any special switch support.
+
+ In tlb_dynamic_lb=1 mode; the outgoing traffic is
+ distributed according to the current load (computed
+ relative to the speed) on each slave.
+
+ In tlb_dynamic_lb=0 mode; the load balancing based on
+ current load is disabled and the load is distributed
+ only using the hash distribution.
+
+ Incoming traffic is received by the current slave.
+ If the receiving slave fails, another slave takes over
+ the MAC address of the failed receiving slave.
Prerequisite:
@@ -639,6 +681,15 @@ num_unsol_na
are generated by the ipv4 and ipv6 code and the numbers of
repetitions cannot be set independently.
+packets_per_slave
+
+ Specify the number of packets to transmit through a slave before
+ moving to the next one. When set to 0 then a slave is chosen at
+ random.
+
+ The valid range is 0 - 65535; the default value is 1. This option
+ has effect only in balance-rr mode.
+
primary
A string (eth0, eth2, etc) specifying which slave is the
@@ -648,7 +699,8 @@ primary
one slave is preferred over another, e.g., when one slave has
higher throughput than another.
- The primary option is only valid for active-backup mode.
+ The primary option is only valid for active-backup(1),
+ balance-tlb (5) and balance-alb (6) mode.
primary_reselect
@@ -690,6 +742,28 @@ primary_reselect
This option was added for bonding version 3.6.0.
+tlb_dynamic_lb
+
+ Specifies if dynamic shuffling of flows is enabled in tlb
+ mode. The value has no effect on any other modes.
+
+ The default behavior of tlb mode is to shuffle active flows across
+ slaves based on the load in that interval. This gives nice lb
+ characteristics but can cause packet reordering. If re-ordering is
+ a concern use this variable to disable flow shuffling and rely on
+ load balancing provided solely by the hash distribution.
+ xmit-hash-policy can be used to select the appropriate hashing for
+ the setup.
+
+ The sysfs entry can be used to change the setting per bond device
+ and the initial value is derived from the module parameter. The
+ sysfs entry is allowed to be changed only if the bond device is
+ down.
+
+ The default value is "1" that enables flow shuffling while value "0"
+ disables it. This option was added in bonding driver 3.7.1
+
+
updelay
Specifies the time, in milliseconds, to wait before enabling a
@@ -723,7 +797,7 @@ use_carrier
xmit_hash_policy
Selects the transmit hash policy to use for slave selection in
- balance-xor and 802.3ad modes. Possible values are:
+ balance-xor, 802.3ad, and tlb modes. Possible values are:
layer2
@@ -743,21 +817,16 @@ xmit_hash_policy
protocol information to generate the hash.
Uses XOR of hardware MAC addresses and IP addresses to
- generate the hash. The IPv4 formula is
-
- (((source IP XOR dest IP) AND 0xffff) XOR
- ( source MAC XOR destination MAC ))
- modulo slave count
+ generate the hash. The formula is
- The IPv6 formula is
+ hash = source MAC XOR destination MAC
+ hash = hash XOR source IP XOR destination IP
+ hash = hash XOR (hash RSHIFT 16)
+ hash = hash XOR (hash RSHIFT 8)
+ And then hash is reduced modulo slave count.
- hash = (source ip quad 2 XOR dest IP quad 2) XOR
- (source ip quad 3 XOR dest IP quad 3) XOR
- (source ip quad 4 XOR dest IP quad 4)
-
- (((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
- XOR (source MAC XOR destination MAC))
- modulo slave count
+ If the protocol is IPv6 then the source and destination
+ addresses are first hashed using ipv6_addr_hash.
This algorithm will place all traffic to a particular
network peer on the same slave. For non-IP traffic,
@@ -779,21 +848,16 @@ xmit_hash_policy
slaves, although a single connection will not span
multiple slaves.
- The formula for unfragmented IPv4 TCP and UDP packets is
-
- ((source port XOR dest port) XOR
- ((source IP XOR dest IP) AND 0xffff)
- modulo slave count
+ The formula for unfragmented TCP and UDP packets is
- The formula for unfragmented IPv6 TCP and UDP packets is
+ hash = source port, destination port (as in the header)
+ hash = hash XOR source IP XOR destination IP
+ hash = hash XOR (hash RSHIFT 16)
+ hash = hash XOR (hash RSHIFT 8)
+ And then hash is reduced modulo slave count.
- hash = (source port XOR dest port) XOR
- ((source ip quad 2 XOR dest IP quad 2) XOR
- (source ip quad 3 XOR dest IP quad 3) XOR
- (source ip quad 4 XOR dest IP quad 4))
-
- ((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
- modulo slave count
+ If the protocol is IPv6 then the source and destination
+ addresses are first hashed using ipv6_addr_hash.
For fragmented TCP or UDP packets and all other IPv4 and
IPv6 protocol traffic, the source and destination port
@@ -801,10 +865,6 @@ xmit_hash_policy
formula is the same as for the layer2 transmit hash
policy.
- The IPv4 policy is intended to mimic the behavior of
- certain switches, notably Cisco switches with PFC2 as
- well as some Foundry and IBM products.
-
This algorithm is not fully 802.3ad compliant. A
single TCP or UDP conversation containing both
fragmented and unfragmented packets will see packets
@@ -815,6 +875,26 @@ xmit_hash_policy
conversations. Other implementations of 802.3ad may
or may not tolerate this noncompliance.
+ encap2+3
+
+ This policy uses the same formula as layer2+3 but it
+ relies on skb_flow_dissect to obtain the header fields
+ which might result in the use of inner headers if an
+ encapsulation protocol is used. For example this will
+ improve the performance for tunnel users because the
+ packets will be distributed according to the encapsulated
+ flows.
+
+ encap3+4
+
+ This policy uses the same formula as layer3+4 but it
+ relies on skb_flow_dissect to obtain the header fields
+ which might result in the use of inner headers if an
+ encapsulation protocol is used. For example this will
+ improve the performance for tunnel users because the
+ packets will be distributed according to the encapsulated
+ flows.
+
The default value is layer2. This option was added in bonding
version 2.6.3. In earlier versions of bonding, this parameter
does not exist, and the layer2 policy is the only policy. The
@@ -838,6 +918,14 @@ resend_igmp
This option was added for bonding version 3.7.0.
+lp_interval
+
+ Specifies the number of seconds between instances where the bonding
+ driver sends learning packets to each slaves peer switch.
+
+ The valid range is 1 - 0x7fffffff; the default value is 1. This Option
+ has effect only in balance-tlb and balance-alb modes.
+
3. Configuring Bonding Devices
==============================
@@ -1362,6 +1450,12 @@ To add ARP targets:
To remove an ARP target:
# echo -192.168.0.100 > /sys/class/net/bond0/bonding/arp_ip_target
+To configure the interval between learning packet transmits:
+# echo 12 > /sys/class/net/bond0/bonding/lp_interval
+ NOTE: the lp_inteval is the number of seconds between instances where
+the bonding driver sends learning packets to each slaves peer switch. The
+default interval is 1 second.
+
Example Configuration
---------------------
We begin with the same example that is shown in section 3.3,