Commit 82f148e9 authored by David S. Miller's avatar David S. Miller

Merge branch 'bonding'

Veaceslav Falico says:

====================
bonding: add an option to rely on unvalidated arp packets

v4 -> v5:
Again per Nik's advise correct the bond_opts restrictions for arp_validate
- set it the same as arp_interval.

v3 -> v4:
Per Nikolay's advise, remove the new bond_opts restriction on modes setting
for arp_validate.

v2 -> v3:
Per Jay's advise, use the 'filter' keyword instead of 'arp' one, and use
his text for documentation. Also, rebase on the latest net-next. Sorry for
the delay, didn't manage to send it before net-next was closed.

v1 -> v2:
Don't remove the 'all traffic' functionality - rather, add new arp_validate
options to specify that we want *only* unvalidated arps.

Currently, if arp_validate is off (0), slave_last_rx() returns the
slave->dev->last_rx, which is always updated on *any* packet received by
slave, and not only arps. This means that, if the validation of arps is
off, we're treating *any* incoming packet as a proof of slave being up, and
not only arps.

This might seem logical at the first glance, however it can cause a lot of
troubles and false-positives, one example would be:

The arp_ip_target is NOT accessible, however someone in the broadcast domain
spams with any broadcast traffic. This way bonding will be tricked that the
slave is still up (as in - can access arp_ip_target), while it's not.

The net_device->last_rx is already used in a lot of drivers (even though the
comment states to NOT do it :)), and it's also ugly to modify it from bonding.

However, some loadbalance setups might rely on the fact that even non-arp
traffic is a sign of slave being up - and we definitely can't break anyones
config - so an extension to arp_validate is needed.

So, to fix this, add an option for the user to specify if he wants to
filter out non-arp traffic on unvalidated slaves, remove the last_rx from
bonding, *always* call bond_arp_rcv() in slave's rx_handler (which is
bond_handle_frame), and if we spot an arp there with this option on - update
the slave->last_arp_rx - and use it instead of net_device->last_rx. Finally,
rename last_arp_rx to last_rx to reflect the changes.

Also rename slave->jiffies to ->last_link_up, to reflect better its
meaning, add the new option's documentation and update the arp_validate one
to be a bit more descriptive.
====================
Signed-off-by: default avatarVeaceslav Falico <vfalico@redhat.com>
Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
parents 814ce148 49f17de7
......@@ -270,16 +270,15 @@ arp_ip_target
arp_validate
Specifies whether or not ARP probes and replies should be
validated in the active-backup mode. This causes the ARP
monitor to examine the incoming ARP requests and replies, and
only consider a slave to be up if it is receiving the
appropriate ARP traffic.
validated in any mode that supports arp monitoring, or whether
non-ARP traffic should be filtered (disregarded) for link
monitoring purposes.
Possible values are:
none or 0
No validation is performed. This is the default.
No validation or filtering is performed.
active or 1
......@@ -293,31 +292,68 @@ arp_validate
Validation is performed for all slaves.
For the active slave, the validation checks ARP replies to
confirm that they were generated by an arp_ip_target. Since
backup slaves do not typically receive these replies, the
validation performed for backup slaves is on the ARP request
sent out via the active slave. It is possible that some
switch or network configurations may result in situations
wherein the backup slaves do not receive the ARP requests; in
such a situation, validation of backup slaves must be
disabled.
The validation of ARP requests on backup slaves is mainly
helping bonding to decide which slaves are more likely to
work in case of the active slave failure, it doesn't really
guarantee that the backup slave will work if it's selected
as the next active slave.
This option is useful in network configurations in which
multiple bonding hosts are concurrently issuing ARPs to one or
more targets beyond a common switch. Should the link between
the switch and target fail (but not the switch itself), the
probe traffic generated by the multiple bonding instances will
fool the standard ARP monitor into considering the links as
still up. Use of the arp_validate option can resolve this, as
the ARP monitor will only consider ARP requests and replies
associated with its own instance of bonding.
filter or 4
Filtering is applied to all slaves. No validation is
performed.
filter_active or 5
Filtering is applied to all slaves, validation is performed
only for the active slave.
filter_backup or 6
Filtering is applied to all slaves, validation is performed
only for backup slaves.
Validation:
Enabling validation causes the ARP monitor to examine the incoming
ARP requests and replies, and only consider a slave to be up if it
is receiving the appropriate ARP traffic.
For an active slave, the validation checks ARP replies to confirm
that they were generated by an arp_ip_target. Since backup slaves
do not typically receive these replies, the validation performed
for backup slaves is on the broadcast ARP request sent out via the
active slave. It is possible that some switch or network
configurations may result in situations wherein the backup slaves
do not receive the ARP requests; in such a situation, validation
of backup slaves must be disabled.
The validation of ARP requests on backup slaves is mainly helping
bonding to decide which slaves are more likely to work in case of
the active slave failure, it doesn't really guarantee that the
backup slave will work if it's selected as the next active slave.
Validation is useful in network configurations in which multiple
bonding hosts are concurrently issuing ARPs to one or more targets
beyond a common switch. Should the link between the switch and
target fail (but not the switch itself), the probe traffic
generated by the multiple bonding instances will fool the standard
ARP monitor into considering the links as still up. Use of
validation can resolve this, as the ARP monitor will only consider
ARP requests and replies associated with its own instance of
bonding.
Filtering:
Enabling filtering causes the ARP monitor to only use incoming ARP
packets for link availability purposes. Arriving packets that are
not ARPs are delivered normally, but do not count when determining
if a slave is available.
Filtering operates by only considering the reception of ARP
packets (any ARP packet, regardless of source or destination) when
determining if a slave has received traffic for link availability
purposes.
Filtering is useful in network configurations in which significant
levels of third party broadcast traffic would fool the standard
ARP monitor into considering the links as still up. Use of
filtering can resolve this, as only ARP traffic is considered for
link availability purposes.
This option was added in bonding version 3.1.0.
......
......@@ -798,7 +798,7 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
return;
if (new_active) {
new_active->jiffies = jiffies;
new_active->last_link_up = jiffies;
if (new_active->link == BOND_LINK_BACK) {
if (USES_PRIMARY(bond->params.mode)) {
......@@ -1115,9 +1115,6 @@ static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb)
slave = bond_slave_get_rcu(skb->dev);
bond = slave->bond;
if (bond->params.arp_interval)
slave->dev->last_rx = jiffies;
recv_probe = ACCESS_ONCE(bond->recv_probe);
if (recv_probe) {
ret = recv_probe(skb, bond, slave);
......@@ -1400,10 +1397,10 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
bond_update_speed_duplex(new_slave);
new_slave->last_arp_rx = jiffies -
new_slave->last_rx = jiffies -
(msecs_to_jiffies(bond->params.arp_interval) + 1);
for (i = 0; i < BOND_MAX_ARP_TARGETS; i++)
new_slave->target_last_arp_rx[i] = new_slave->last_arp_rx;
new_slave->target_last_arp_rx[i] = new_slave->last_rx;
if (bond->params.miimon && !bond->params.use_carrier) {
link_reporting = bond_check_dev_link(bond, slave_dev, 1);
......@@ -1447,7 +1444,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
}
if (new_slave->link != BOND_LINK_DOWN)
new_slave->jiffies = jiffies;
new_slave->last_link_up = jiffies;
pr_debug("Initial state of slave_dev is BOND_LINK_%s\n",
new_slave->link == BOND_LINK_DOWN ? "DOWN" :
(new_slave->link == BOND_LINK_UP ? "UP" : "BACK"));
......@@ -1894,7 +1891,7 @@ static int bond_miimon_inspect(struct bonding *bond)
* recovered before downdelay expired
*/
slave->link = BOND_LINK_UP;
slave->jiffies = jiffies;
slave->last_link_up = jiffies;
pr_info("%s: link status up again after %d ms for interface %s\n",
bond->dev->name,
(bond->params.downdelay - slave->delay) *
......@@ -1969,7 +1966,7 @@ static void bond_miimon_commit(struct bonding *bond)
case BOND_LINK_UP:
slave->link = BOND_LINK_UP;
slave->jiffies = jiffies;
slave->last_link_up = jiffies;
if (bond->params.mode == BOND_MODE_8023AD) {
/* prevent it from being the active one */
......@@ -2245,7 +2242,7 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32
pr_debug("bva: sip %pI4 not found in targets\n", &sip);
return;
}
slave->last_arp_rx = jiffies;
slave->last_rx = jiffies;
slave->target_last_arp_rx[i] = jiffies;
}
......@@ -2255,15 +2252,16 @@ int bond_arp_rcv(const struct sk_buff *skb, struct bonding *bond,
struct arphdr *arp = (struct arphdr *)skb->data;
unsigned char *arp_ptr;
__be32 sip, tip;
int alen;
int alen, is_arp = skb->protocol == __cpu_to_be16(ETH_P_ARP);
if (skb->protocol != __cpu_to_be16(ETH_P_ARP))
if (!slave_do_arp_validate(bond, slave)) {
if ((slave_do_arp_validate_only(bond, slave) && is_arp) ||
!slave_do_arp_validate_only(bond, slave))
slave->last_rx = jiffies;
return RX_HANDLER_ANOTHER;
read_lock(&bond->lock);
if (!slave_do_arp_validate(bond, slave))
goto out_unlock;
} else if (!is_arp) {
return RX_HANDLER_ANOTHER;
}
alen = arp_hdr_len(bond->dev);
......@@ -2314,11 +2312,10 @@ int bond_arp_rcv(const struct sk_buff *skb, struct bonding *bond,
bond_validate_arp(bond, slave, sip, tip);
else if (bond->curr_active_slave &&
time_after(slave_last_rx(bond, bond->curr_active_slave),
bond->curr_active_slave->jiffies))
bond->curr_active_slave->last_link_up))
bond_validate_arp(bond, slave, tip, sip);
out_unlock:
read_unlock(&bond->lock);
if (arp != (struct arphdr *)skb->data)
kfree(arp);
return RX_HANDLER_ANOTHER;
......@@ -2361,9 +2358,9 @@ static void bond_loadbalance_arp_mon(struct work_struct *work)
oldcurrent = ACCESS_ONCE(bond->curr_active_slave);
/* see if any of the previous devices are up now (i.e. they have
* xmt and rcv traffic). the curr_active_slave does not come into
* the picture unless it is null. also, slave->jiffies is not needed
* here because we send an arp on each slave and give a slave as
* long as it needs to get the tx/rx within the delta.
* the picture unless it is null. also, slave->last_link_up is not
* needed here because we send an arp on each slave and give a slave
* as long as it needs to get the tx/rx within the delta.
* TODO: what about up/down delay in arp mode? it wasn't here before
* so it can wait
*/
......@@ -2372,7 +2369,7 @@ static void bond_loadbalance_arp_mon(struct work_struct *work)
if (slave->link != BOND_LINK_UP) {
if (bond_time_in_interval(bond, trans_start, 1) &&
bond_time_in_interval(bond, slave->dev->last_rx, 1)) {
bond_time_in_interval(bond, slave->last_rx, 1)) {
slave->link = BOND_LINK_UP;
slave_state_changed = 1;
......@@ -2401,7 +2398,7 @@ static void bond_loadbalance_arp_mon(struct work_struct *work)
* if we don't know our ip yet
*/
if (!bond_time_in_interval(bond, trans_start, 2) ||
!bond_time_in_interval(bond, slave->dev->last_rx, 2)) {
!bond_time_in_interval(bond, slave->last_rx, 2)) {
slave->link = BOND_LINK_DOWN;
slave_state_changed = 1;
......@@ -2489,7 +2486,7 @@ static int bond_ab_arp_inspect(struct bonding *bond)
* active. This avoids bouncing, as the last receive
* times need a full ARP monitor cycle to be updated.
*/
if (bond_time_in_interval(bond, slave->jiffies, 2))
if (bond_time_in_interval(bond, slave->last_link_up, 2))
continue;
/*
......@@ -2690,7 +2687,7 @@ static bool bond_ab_arp_probe(struct bonding *bond)
new_slave->link = BOND_LINK_BACK;
bond_set_slave_active_flags(new_slave);
bond_arp_send_all(bond, new_slave);
new_slave->jiffies = jiffies;
new_slave->last_link_up = jiffies;
rcu_assign_pointer(bond->current_arp_slave, new_slave);
rtnl_unlock();
......@@ -3060,7 +3057,6 @@ static int bond_open(struct net_device *bond_dev)
if (bond->params.arp_interval) { /* arp interval, in milliseconds. */
queue_delayed_work(bond->wq, &bond->arp_work, 0);
if (bond->params.arp_validate)
bond->recv_probe = bond_arp_rcv;
}
......@@ -4186,10 +4182,6 @@ static int bond_check_params(struct bond_params *params)
}
if (arp_validate) {
if (bond_mode != BOND_MODE_ACTIVEBACKUP) {
pr_err("arp_validate only supported in active-backup mode\n");
return -EINVAL;
}
if (!arp_interval) {
pr_err("arp_validate requires arp_interval\n");
return -EINVAL;
......
......@@ -51,6 +51,9 @@ static struct bond_opt_value bond_arp_validate_tbl[] = {
{ "active", BOND_ARP_VALIDATE_ACTIVE, 0},
{ "backup", BOND_ARP_VALIDATE_BACKUP, 0},
{ "all", BOND_ARP_VALIDATE_ALL, 0},
{ "filter", BOND_ARP_FILTER, 0},
{ "filter_active", BOND_ARP_FILTER_ACTIVE, 0},
{ "filter_backup", BOND_ARP_FILTER_BACKUP, 0},
{ NULL, -1, 0},
};
......@@ -151,7 +154,8 @@ static struct bond_option bond_opts[] = {
.id = BOND_OPT_ARP_VALIDATE,
.name = "arp_validate",
.desc = "validate src/dst of ARP probes",
.unsuppmodes = BOND_MODE_ALL_EX(BIT(BOND_MODE_ACTIVEBACKUP)),
.unsuppmodes = BIT(BOND_MODE_8023AD) | BIT(BOND_MODE_TLB) |
BIT(BOND_MODE_ALB),
.values = bond_arp_validate_tbl,
.set = bond_option_arp_validate_set
},
......@@ -809,7 +813,6 @@ int bond_option_arp_interval_set(struct bonding *bond,
cancel_delayed_work_sync(&bond->arp_work);
} else {
/* arp_validate can be set only in active-backup mode */
if (bond->params.arp_validate)
bond->recv_probe = bond_arp_rcv;
cancel_delayed_work_sync(&bond->mii_work);
queue_delayed_work(bond->wq, &bond->arp_work, 0);
......
......@@ -188,8 +188,9 @@ struct slave {
struct net_device *dev; /* first - useful for panic debug */
struct bonding *bond; /* our master */
int delay;
unsigned long jiffies;
unsigned long last_arp_rx;
/* all three in jiffies */
unsigned long last_link_up;
unsigned long last_rx;
unsigned long target_last_arp_rx[BOND_MAX_ARP_TARGETS];
s8 link; /* one of BOND_LINK_XXXX */
s8 new_link;
......@@ -342,6 +343,11 @@ static inline bool bond_is_active_slave(struct slave *slave)
#define BOND_ARP_VALIDATE_BACKUP (1 << BOND_STATE_BACKUP)
#define BOND_ARP_VALIDATE_ALL (BOND_ARP_VALIDATE_ACTIVE | \
BOND_ARP_VALIDATE_BACKUP)
#define BOND_ARP_FILTER (BOND_ARP_VALIDATE_ALL + 1)
#define BOND_ARP_FILTER_ACTIVE (BOND_ARP_VALIDATE_ACTIVE | \
BOND_ARP_FILTER)
#define BOND_ARP_FILTER_BACKUP (BOND_ARP_VALIDATE_BACKUP | \
BOND_ARP_FILTER)
static inline int slave_do_arp_validate(struct bonding *bond,
struct slave *slave)
......@@ -349,6 +355,12 @@ static inline int slave_do_arp_validate(struct bonding *bond,
return bond->params.arp_validate & (1 << bond_slave_state(slave));
}
static inline int slave_do_arp_validate_only(struct bonding *bond,
struct slave *slave)
{
return bond->params.arp_validate & BOND_ARP_FILTER;
}
/* Get the oldest arp which we've received on this slave for bond's
* arp_targets.
*/
......@@ -368,14 +380,10 @@ static inline unsigned long slave_oldest_target_arp_rx(struct bonding *bond,
static inline unsigned long slave_last_rx(struct bonding *bond,
struct slave *slave)
{
if (slave_do_arp_validate(bond, slave)) {
if (bond->params.arp_all_targets == BOND_ARP_TARGETS_ALL)
return slave_oldest_target_arp_rx(bond, slave);
else
return slave->last_arp_rx;
}
return slave->dev->last_rx;
return slave->last_rx;
}
#ifdef CONFIG_NET_POLL_CONTROLLER
......
......@@ -1312,13 +1312,7 @@ struct net_device {
/*
* Cache lines mostly used on receive path (including eth_type_trans())
*/
unsigned long last_rx; /* Time of last Rx
* This should not be set in
* drivers, unless really needed,
* because network stack (bonding)
* use it if/when necessary, to
* avoid dirtying this cache line.
*/
unsigned long last_rx; /* Time of last Rx */
/* Interface address info used in eth_type_trans() */
unsigned char *dev_addr; /* hw address, (before bcast
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment