[docker-teamd]: Add teamd as a dependent service to swss (#5628)
**- Why I did it**
On teamd docker restart, swss and syncd need to be restarted, because dependent resources are present.

**- How I did it**
Added teamd as a dependent service for swss.
Updated the docker-wait script to handle the service and its dependent services separately.
Handled the warm-restart case for the dependent service.

**- How to verify it**
Verified the following scenarios on the testbed below, with continuous ping traffic between the VMs:
VM1 ----------------------------[DUT 6100] -----------------------VM2

1. Stop teamd docker alone
   > swss, syncd dockers seen going away
   > The LAG reference count error messages are seen for a while, until the swss docker stops.
   > Dockers come back up.
2. Enable WR mode for teamd. Stop teamd docker alone
   > swss, syncd dockers not removed.
   > The LAG reference count error messages are not seen
   > Repeated the stop teamd docker test - same result, no effect on swss/syncd.
3. Stop swss docker.
   > swss, teamd, syncd go off - dockers come back correctly, interfaces up
4. Enable WR mode for swss. Stop swss docker
   > swss goes off without affecting the syncd/teamd dockers.
5. Config reload
   > no reference counter error seen, dockers come back correctly, with interfaces up
6. Warm reboot, observations below
   > swss docker goes off first
   > teamd + syncd go off toward the end of the WR process.
   > dockers come back up fine.
   > ping traffic between the VMs was NOT HIT
7. Fast reboot, observations below
   > teamd goes off first ( **confirmed swss doesn't exit here** )
   > swss goes off next
   > syncd goes away at the end of the FR process
   > dockers come back up fine.
   > there is a traffic HIT, as expected for fast-reboot
8. Verified the tests above, other than the WR/FB scenarios, on a multi-asic platform
parent af97e23686
commit ace7f24cba
@@ -3,50 +3,95 @@
 """
     docker-wait-any
     This script takes one or more Docker container names as arguments,
-    and it will block indefinitely while all of the specified containers
-    are running. If any of the specified containers stop, the script will
+    [-s] argument is for the service which invokes this script
+    [-d] argument is to list the dependent services for the above service.
+    It will block indefinitely while all of the specified containers
+    are running. If any of the specified containers stop, the script will
     exit.

     This script was created because the 'docker wait' command is lacking
     this functionality. It will block until ALL specified containers have
     stopped running. Here, we spawn multiple threads and wait on one
     container per thread. If any of the threads exit, the entire
-    application will exit.
+    application will exit, unless we are in a scenario where the following
+    conditions are met.
+    (i) the container is a dependent service
+    (ii) warm restart is enabled at system level or for that container OR
+         fast reboot is enabled at system level
+    In this scenario, the g_thread_exit_event won't be propagated to the parent;
+    instead, the thread will continue to do docker_client.wait again. This helps
+    cases where we need the dependent container to be warm-restarted without
+    affecting other services (e.g. warm restart of teamd service)

     NOTE: This script is written against docker Python package 4.1.0. Newer
     versions of docker may have a different API.
 """
+import argparse
 import sys
 import threading
+import time

 from docker import APIClient
+from sonic_py_common import logger, device_info
+
+SYSLOG_IDENTIFIER = 'docker-wait-any'
+
+# Global logger instance
+log = logger.Logger(SYSLOG_IDENTIFIER)

 # Instantiate a global event to share among our threads
 g_thread_exit_event = threading.Event()
-def usage():
-    print("Usage: {} <container_name> [<container_name> ...]".format(sys.argv[0]))
-    sys.exit(1)
-
+g_service = []
+g_dep_services = []

 def wait_for_container(docker_client, container_name):
-    docker_client.wait(container_name)
+    while True:
+        while docker_client.inspect_container(container_name)['State']['Status'] != "running":
+            time.sleep(1)

-    print("No longer waiting on container '{}'".format(container_name))
+        docker_client.wait(container_name)

-    # Signal the main thread to exit
-    g_thread_exit_event.set()
+        log.log_info("No longer waiting on container '{}'".format(container_name))

+        # If this is a dependent service and warm restart is enabled for the system/container,
+        # OR if the system is going through a fast-reboot, DON'T signal main thread to exit
+        if (container_name in g_dep_services and
+            (device_info.is_warm_restart_enabled(container_name) or device_info.is_fast_reboot_enabled())):
+            continue
+
+        # Signal the main thread to exit
+        g_thread_exit_event.set()

 def main():
     thread_list = []

     docker_client = APIClient(base_url='unix://var/run/docker.sock')

-    # Ensure we were passed at least one argument
-    if len(sys.argv) < 2:
-        usage()
+    parser = argparse.ArgumentParser(description='Wait for dependent docker services',
+                                     version='1.0.0',
+                                     formatter_class=argparse.RawTextHelpFormatter,
+                                     epilog="""
+Examples:
+  docker-wait-any -s swss -d syncd teamd
+""")

-    container_names = sys.argv[1:]
+    parser.add_argument('-s','--service', nargs='+', default=None, help='name of the service')
+    parser.add_argument('-d','--dependent', nargs='*', default=None, help='other dependent services')
+    args = parser.parse_args()
+
+    global g_service
+    global g_dep_services
+
+    if args.service is not None:
+        g_service = args.service
+    if args.dependent is not None:
+        g_dep_services = args.dependent
+
+    container_names = g_service + g_dep_services
+
+    # If the service and dependents passed as args is empty, then exit
+    if container_names == []:
+        sys.exit(0)

     for container_name in container_names:
         t = threading.Thread(target=wait_for_container, args=[docker_client, container_name])
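The argument handling and the exit decision introduced in the hunk above can be looked at in isolation. What follows is a minimal sketch under stated assumptions, not the script itself: `parse_container_args` and `should_signal_exit` are hypothetical helper names, the warm-restart and fast-reboot checks are passed in as plain callables rather than calling `device_info`, and the `version` keyword is left out because the sketch assumes Python 3, whose `ArgumentParser` does not accept it.

```python
import argparse


def parse_container_args(argv):
    """Split argv into (service, dependents), mirroring the -s/-d options.

    Hypothetical helper: the real script stores these in module globals.
    """
    parser = argparse.ArgumentParser(description="Wait for dependent docker services")
    parser.add_argument("-s", "--service", nargs="+", default=[], help="name of the service")
    parser.add_argument("-d", "--dependent", nargs="*", default=[], help="other dependent services")
    args = parser.parse_args(argv)
    return args.service, args.dependent


def should_signal_exit(container_name, dependents, warm_restart_enabled, fast_reboot_enabled):
    """Decide whether a stopped container should wake the main thread.

    warm_restart_enabled(name) and fast_reboot_enabled() are stand-ins
    (assumptions) for the device_info checks used in the diff.
    """
    is_dependent = container_name in dependents
    if is_dependent and (warm_restart_enabled(container_name) or fast_reboot_enabled()):
        # Dependent service restarting under warm restart or fast reboot:
        # keep waiting on it instead of tearing everything down.
        return False
    return True


if __name__ == "__main__":
    service, dependents = parse_container_args(["-s", "swss", "-d", "syncd", "teamd"])
    for name in service + dependents:
        decision = should_signal_exit(name, dependents, lambda n: n == "teamd", lambda: False)
        print(name, "->", "exit" if decision else "keep waiting")
```

Keeping the decision in a pure function like this makes the warm-restart exemption testable without a Docker daemon; the actual script evaluates the same condition inline inside the waiting thread.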
@@ -170,7 +170,20 @@ wait() {
         else
             RUNNING=$(docker inspect -f '{{.State.Running}}' ${PEER})
         fi
-        if [[ x"$RUNNING" == x"true" ]]; then
+        ALL_DEPS_RUNNING=true
+        for dep in ${MULTI_INST_DEPENDENT}; do
+            if [[ ! -z $DEV ]]; then
+                DEP_RUNNING=$(docker inspect -f '{{.State.Running}}' ${dep}$DEV)
+            else
+                DEP_RUNNING=$(docker inspect -f '{{.State.Running}}' ${dep})
+            fi
+            if [[ x"$DEP_RUNNING" != x"true" ]]; then
+                ALL_DEPS_RUNNING=false
+                break
+            fi
+        done
+
+        if [[ x"$RUNNING" == x"true" && x"$ALL_DEPS_RUNNING" == x"true" ]]; then
             break
         else
             sleep 1
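The readiness loop in this hunk proceeds only once the peer and every multi-instance dependent report as running. The same logic is sketched below in Python, purely as an illustration: `wait_until_all_running` is a hypothetical name, `is_running` is an injected probe standing in for `docker inspect -f '{{.State.Running}}'`, and `max_polls` is an addition that the shell loop does not have, included only so the sketch can terminate in a test.

```python
import time


def wait_until_all_running(peer, dependents, is_running, suffix="",
                           poll_interval=1.0, max_polls=None):
    """Poll until the peer and all dependents are running.

    suffix plays the role of $DEV (the per-ASIC instance id); it is appended
    to every container name, as in the shell script.
    Returns True when everything is running, False if max_polls is exhausted.
    """
    names = [peer + suffix] + [dep + suffix for dep in dependents]
    polls = 0
    while True:
        if all(is_running(name) for name in names):
            return True
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return False
        time.sleep(poll_interval)
```

Note that `all()` short-circuits on the first container that is not running, which corresponds to the `break` inside the shell `for` loop.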
@@ -179,10 +192,18 @@ wait() {

     # NOTE: This assumes Docker containers share the same names as their
     # corresponding services
+    for dep in ${MULTI_INST_DEPENDENT}; do
+        if [[ ! -z $DEV ]]; then
+            ALL_DEPS="$ALL_DEPS ${dep}$DEV"
+        else
+            ALL_DEPS="$ALL_DEPS ${dep}"
+        fi
+    done
+
     if [[ ! -z $DEV ]]; then
-        /usr/bin/docker-wait-any ${SERVICE}$DEV ${PEER}$DEV
+        /usr/bin/docker-wait-any -s ${SERVICE}$DEV -d ${PEER}$DEV ${ALL_DEPS}
     else
-        /usr/bin/docker-wait-any ${SERVICE} ${PEER}
+        /usr/bin/docker-wait-any -s ${SERVICE} -d ${PEER} ${ALL_DEPS}
     fi
 }

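To make the resulting command line concrete, the assembly of the `-s` and `-d` arguments can be sketched as below. This is an illustrative Python rendering of the shell logic; `build_wait_command` is a hypothetical helper, and the sample service names are taken from the example in the script's own help text.

```python
def build_wait_command(service, peer, dependents, dev=""):
    """Return the docker-wait-any argv for a service, its peer and dependents.

    dev mirrors $DEV: when set, it is appended to every container name.
    """
    deps = [peer + dev] + [dep + dev for dep in dependents]
    return ["/usr/bin/docker-wait-any", "-s", service + dev, "-d"] + deps


if __name__ == "__main__":
    # Single-instance platform, then instance 0 of a multi-asic platform.
    print(" ".join(build_wait_command("swss", "syncd", ["teamd"])))
    print(" ".join(build_wait_command("swss", "syncd", ["teamd"], dev="0")))
```

The peer is passed under `-d` together with the multi-instance dependents, so only the invoking service itself is listed under `-s`.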
@@ -8,7 +8,7 @@ import yaml
 from natsort import natsorted

 # TODO: Replace with swsscommon
-from swsssdk import ConfigDBConnector, SonicDBConfig
+from swsssdk import ConfigDBConnector, SonicDBConfig, SonicV2Connector

 USR_SHARE_SONIC_PATH = "/usr/share/sonic"
 HOST_DEVICE_PATH = USR_SHARE_SONIC_PATH + "/device"
@@ -442,3 +442,38 @@ def get_system_routing_stack():
         raise OSError("Cannot detect routing stack")

     return result
+
+# Check if System warm reboot or Container warm restart is enabled.
+def is_warm_restart_enabled(container_name):
+    state_db = SonicV2Connector(host='127.0.0.1')
+    state_db.connect(state_db.STATE_DB, False)
+
+    TABLE_NAME_SEPARATOR = '|'
+    prefix = 'WARM_RESTART_ENABLE_TABLE' + TABLE_NAME_SEPARATOR
+
+    # Get the system warm reboot enable state
+    _hash = '{}{}'.format(prefix, 'system')
+    wr_system_state = state_db.get(state_db.STATE_DB, _hash, "enable")
+    wr_enable_state = True if wr_system_state == "true" else False
+
+    # Get the container warm reboot enable state
+    _hash = '{}{}'.format(prefix, container_name)
+    wr_container_state = state_db.get(state_db.STATE_DB, _hash, "enable")
+    wr_enable_state |= True if wr_container_state == "true" else False
+
+    state_db.close(state_db.STATE_DB)
+    return wr_enable_state
+
+# Check if System fast reboot is enabled.
+def is_fast_reboot_enabled():
+    fb_system_state = 0
+    cmd = 'sonic-db-cli STATE_DB get "FAST_REBOOT|system"'
+    proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
+    (stdout, stderr) = proc.communicate()
+
+    if proc.returncode != 0:
+        log.log_error("Error running command '{}'".format(cmd))
+    elif stdout:
+        fb_system_state = stdout.rstrip('\n')
+
+    return fb_system_state
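The two helpers added in this hunk reduce to a pair of key lookups in STATE_DB. The sketch below shows that lookup logic against a plain dictionary, so it can be followed without a running Redis instance. It is an assumption-laden illustration: the `state` dictionary is a stand-in for STATE_DB, the functions take it as an extra parameter that the real helpers do not have, and treating any non-empty `FAST_REBOOT|system` value as "enabled" is an interpretation of how the caller uses the return value in a boolean context.

```python
WARM_RESTART_PREFIX = "WARM_RESTART_ENABLE_TABLE|"


def is_warm_restart_enabled(state, container_name):
    """True if warm restart is enabled system-wide or for this container.

    state is a dict standing in for STATE_DB: key -> {field: value}.
    """
    system_on = state.get(WARM_RESTART_PREFIX + "system", {}).get("enable") == "true"
    container_on = state.get(WARM_RESTART_PREFIX + container_name, {}).get("enable") == "true"
    return system_on or container_on


def is_fast_reboot_enabled(state):
    """True if the FAST_REBOOT|system key holds a non-empty value (assumption)."""
    return bool(state.get("FAST_REBOOT|system"))


if __name__ == "__main__":
    # Hypothetical snapshot: warm restart enabled for teamd only.
    snapshot = {"WARM_RESTART_ENABLE_TABLE|teamd": {"enable": "true"}}
    print(is_warm_restart_enabled(snapshot, "teamd"))
    print(is_warm_restart_enabled(snapshot, "syncd"))
    print(is_fast_reboot_enabled(snapshot))
```

Unlike this sketch, the helper in the diff returns the raw command output (or `0`) from `is_fast_reboot_enabled`, so callers rely on its truthiness rather than on a strict boolean.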