chassis-packet: Update arp_update script for FAILED and STALE check (#16311)

chassis-packet: Update arp_update script for FAILED and STALE check (#16311)

1. Fixing an issue with FAILED entry resolution retry.
Neighbor entries in arp table may sometimes enter a FAILED state when the far end is down and reports the state as follows:
2603:10e2:400:3::1 dev PortChannel19 router FAILED
While the arp_update script handles the entries for FAILED in the following format, the above was not handled due to the token location (extra router keyword at index 4):
2603:10e2:400:3::1 dev PortChannel19 FAILED

The former format may appear if an arp resolution is tried on a link that is known but the far end goes down, e.g., pinging a STALE entry while the far end is down.

2. Refreshing STALE entries to make sure the far end is reachable.
STALE entries for some backend ports may appear in chassis-packet when no traffic is received for a while on the port. When the far end goes down, it is expected for BFD to stop sending packets on the session for which the far end is not reachable. But as the entry is known as stale, on the Cisco chassis, BFD keeps sending packets. Refreshing the stale entry will keep active links as reachable in the neighbor table while the entries for the far end down will enter a failed state. FAILED state entries will be retired and entered reachable when far end comes back up.
This commit is contained in:
anamehra 2023-09-01 11:41:46 -07:00 committed by GitHub
parent 566b5dfa1f
commit f6897bb585
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

View File

@ -25,29 +25,35 @@ while /bin/true; do
for i in ${!STATIC_ROUTE_NEXTHOPS[@]}; do
nexthop="${STATIC_ROUTE_NEXTHOPS[i]}"
if [[ $nexthop == *"."* ]]; then
neigh_state=( $(ip -4 neigh show | grep -w $nexthop | tr -s ' ' | cut -d ' ' -f 3,4) )
neigh_state=$(ip -4 neigh show | grep -w $nexthop | tr -s ' ')
ping_prefix=ping
elif [[ $nexthop == *":"* ]] ; then
neigh_state=( $(ip -6 neigh show | grep -w $nexthop | tr -s ' ' | cut -d ' ' -f 3,4) )
neigh_state=$(ip -6 neigh show | grep -w $nexthop | tr -s ' ')
ping_prefix=ping6
fi
if [[ -z "${neigh_state}" ]] || [[ "${neigh_state[1]}" == "INCOMPLETE" ]] || [[ "${neigh_state[1]}" == "FAILED" ]]; then
# Check if there is an INCOMPLETE, FAILED, or STALE entry and try to resolve it again.
# STALE entries may be present if there is no traffic on a path. A far-end down event may not
# clear the STALE entry. Refresh the STALE entry to clear the table.
if [[ -z "${neigh_state}" ]] || [[ -n $(echo ${neigh_state} | grep 'INCOMPLETE\|FAILED\|STALE') ]]; then
interface="${STATIC_ROUTE_IFNAMES[i]}"
if [[ -z "$interface" ]]; then
# should never be here, handling just in case
logger "ERR: arp_update: missing interface entry for static route $nexthop"
interface=${neigh_state[0]}
continue
fi
intf_up=$(ip link show $interface | grep "state UP")
if [[ -n "$intf_up" ]]; then
pingcmd="timeout 0.2 $ping_prefix -I ${interface} -n -q -i 0 -c 1 -W 1 $nexthop >/dev/null"
eval $pingcmd
logger "arp_update: static route nexthop not resolved, pinging $nexthop on ${neigh_state[0]}"
# STALE entries may appear more often, not logging to prevent periodic syslogs
if [[ -z $(echo ${neigh_state} | grep 'STALE') ]]; then
logger "arp_update: static route nexthop not resolved ($neigh_state), pinging $nexthop on $interface"
fi
fi
fi
done
sleep 300
sleep 150
continue
fi
# find L3 interfaces which are UP, send ipv6 multicast pings