8431d3ab36
A container may already have crashed by the time docker-wait-any starts,
in which case docker-wait-any waits forever for it to start. It should
instead exit immediately so that the service restarts.
#### Why I did it
It has been observed that in some circumstances the auto-restart mechanism does not work. Specifically, for ```swss.service```, ```orchagent``` crashed before ```docker-wait-any``` was started in ```swss.sh```. As a result, ```docker-wait-any``` waited forever for ```swss``` to reach the ```"Running"``` state, leaving the system like this:
```
CONTAINER ID   IMAGE                                COMMAND                  CREATED        STATUS                    PORTS   NAMES
1abef1ecebff   bcbca2b74df6                         "/usr/local/bin/supe…"   22 hours ago   Up 22 hours                       what-just-happened
3c924d405cd5   docker-lldp:latest                   "/usr/bin/docker-lld…"   22 hours ago   Up 22 hours                       lldp
eb2b12a98c13   docker-router-advertiser:latest      "/usr/bin/docker-ini…"   22 hours ago   Up 22 hours                       radv
d6aac4a46974   docker-sonic-mgmt-framework:latest   "/usr/local/bin/supe…"   22 hours ago   Up 22 hours                       mgmt-framework
d880fd07aab9   docker-platform-monitor:latest       "/usr/bin/docker_ini…"   22 hours ago   Up 22 hours                       pmon
75f9e22d4fdd   docker-snmp:latest                   "/usr/local/bin/supe…"   22 hours ago   Up 22 hours                       snmp
76d570a4bd1c   docker-sonic-telemetry:latest        "/usr/local/bin/supe…"   22 hours ago   Up 22 hours                       telemetry
ee49f50344b3   docker-syncd-mlnx:latest             "/usr/local/bin/supe…"   22 hours ago   Up 22 hours                       syncd
1f0b0bab3687   docker-teamd:latest                  "/usr/local/bin/supe…"   22 hours ago   Up 22 hours                       teamd
917aeeaf9722   docker-orchagent:latest              "/usr/bin/docker-ini…"   22 hours ago   Exited (0) 22 hours ago           swss
81a4d3e820e8   docker-fpm-frr:latest                "/usr/bin/docker_ini…"   22 hours ago   Up 22 hours                       bgp
f6eee8be282c   docker-database:latest               "/usr/local/bin/dock…"   22 hours ago   Up 22 hours                       database
```
The check for the ```"Running"``` state is not needed: in the cold boot case we call ```start_peer_and_dependent_services```, and in the warm boot case the loop retries the wait if the container is going through a warm boot:
d01a91a569/files/image_config/misc/docker-wait-any (L56)
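
The warm boot retry can be sketched without a Docker daemon. This is a minimal illustration, not the shipped script: `FakeClient` and the `warm_restart_enabled` callable are hypothetical stand-ins for `docker.APIClient` and the `device_info` checks, and the sketch returns after signalling only so that it terminates on its own.

```python
import threading


class FakeClient:
    """Hypothetical stand-in for docker.APIClient; counts wait() calls."""

    def __init__(self):
        self.wait_calls = 0

    def wait(self, container_name):
        # A real client blocks here until the container stops; the fake returns at once.
        self.wait_calls += 1


def wait_for_container(client, name, dep_services, warm_restart_enabled, exit_event):
    while True:
        client.wait(name)
        # Dependent container under warm restart: wait again instead of signalling.
        if name in dep_services and warm_restart_enabled():
            continue
        exit_event.set()
        return  # sketch only: return so the demo terminates


def run_demo():
    client = FakeClient()
    exit_event = threading.Event()
    # Assumption for the demo: warm restart is reported as enabled for the first two exits only.
    answers = iter([True, True, False])
    wait_for_container(client, "teamd", ["teamd"], lambda: next(answers), exit_event)
    return client.wait_calls, exit_event.is_set()


if __name__ == "__main__":
    print(run_demo())
```

In this run the first two exits of the dependent container are absorbed by the retry, and only the third one sets the exit event.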
#### How I did it
Removed the check for ```"Running"```.
#### How to verify it
Kill swss before ```docker-wait-any``` is reached and verify that auto-restart restarts the swss service.
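
The scenario can also be reasoned about offline. The sketch below is a simulation under stated assumptions, not a test from this change: `ExitedContainerClient` is a hypothetical fake of `docker.APIClient` for a container that has already exited, `old_waiter` is an assumed shape of the removed check (polling for a running status before waiting), and `max_polls` exists only so the demo terminates, whereas the behaviour described above is an unbounded wait.

```python
import threading


class ExitedContainerClient:
    """Hypothetical fake of docker.APIClient for an already-exited container."""

    def inspect_container(self, name):
        return {"State": {"Status": "exited"}}

    def wait(self, name):
        # Waiting on a container that is not running returns immediately.
        return {"StatusCode": 0}


def old_waiter(client, name, exit_event, max_polls=5):
    # Assumed shape of the removed check: poll until the container is running.
    for _ in range(max_polls):
        if client.inspect_container(name)["State"]["Status"] == "running":
            break
    else:
        return  # still polling: wait() is never reached, the event is never set
    client.wait(name)
    exit_event.set()


def new_waiter(client, name, exit_event):
    # Without the check, wait() returns at once and the main thread is signalled.
    client.wait(name)
    exit_event.set()


def run_demo():
    client = ExitedContainerClient()
    results = {}
    for label, waiter in (("old", old_waiter), ("new", new_waiter)):
        event = threading.Event()
        t = threading.Thread(target=waiter, args=(client, "swss", event), daemon=True)
        t.start()
        t.join(timeout=2)
        results[label] = event.is_set()
    return results


if __name__ == "__main__":
    print(run_demo())
```

With the check in place the exit event is never set for a crashed container, so the service is never restarted; without it the event is set as soon as the waiter thread runs.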
109 lines
3.7 KiB
Python
Executable File
#!/usr/bin/env python3

"""
    docker-wait-any
    This script takes one or more Docker container names as arguments.
    [-s] argument is for the service which invokes this script
    [-d] argument is to list the dependent services for the above service.
    It will block indefinitely while all of the specified containers
    are running. If any of the specified containers stops, the script will
    exit.

    This script was created because the 'docker wait' command is lacking
    this functionality: 'docker wait' blocks until ALL specified containers
    have stopped running. Here, we spawn multiple threads and wait on one
    container per thread. If any of the threads exits, the entire
    application will exit, unless we are in a scenario where the following
    conditions are met.
    (i) the container is a dependent service
    (ii) warm restart is enabled at system level or for that container OR
         fast reboot is enabled at system level
    In this scenario, the g_thread_exit_event won't be propagated to the parent;
    instead the thread will continue to do docker_client.wait again. This helps in
    cases where we need the dependent container to be warm-restarted without
    affecting other services (e.g., warm restart of the teamd service)

    NOTE: This script is written against docker Python package 4.3.1. Newer
    versions of docker may have a different API.
"""
import argparse
import sys
import threading
import time

from docker import APIClient
from sonic_py_common import logger, device_info

SYSLOG_IDENTIFIER = 'docker-wait-any'

# Global logger instance
log = logger.Logger(SYSLOG_IDENTIFIER)

# Instantiate a global event to share among our threads
g_thread_exit_event = threading.Event()
g_service = []
g_dep_services = []


def wait_for_container(docker_client, container_name):
    log.log_info("Waiting on container '{}'".format(container_name))

    while True:
        docker_client.wait(container_name)

        log.log_info("No longer waiting on container '{}'".format(container_name))

        # If this is a dependent service and warm restart is enabled for the system/container,
        # OR if the system is going through a fast-reboot, DON'T signal main thread to exit
        if (container_name in g_dep_services and
                (device_info.is_warm_restart_enabled(container_name) or device_info.is_fast_reboot_enabled())):
            continue

        # Signal the main thread to exit
        g_thread_exit_event.set()


def main():
    thread_list = []

    docker_client = APIClient(base_url='unix://var/run/docker.sock')

    parser = argparse.ArgumentParser(description='Wait for dependent docker services',
                                     formatter_class=argparse.RawTextHelpFormatter,
                                     epilog="""
Examples:
  docker-wait-any -s swss -d syncd teamd
""")

    parser.add_argument('-s', '--service', nargs='+', default=None, help='name of the service')
    parser.add_argument('-d', '--dependent', nargs='*', default=None, help='other dependent services')
    args = parser.parse_args()

    global g_service
    global g_dep_services

    if args.service is not None:
        g_service = args.service
    if args.dependent is not None:
        g_dep_services = args.dependent

    container_names = g_service + g_dep_services

    # If the service and dependents passed as args is empty, then exit
    if container_names == []:
        sys.exit(0)

    for container_name in container_names:
        t = threading.Thread(target=wait_for_container, args=[docker_client, container_name])
        t.daemon = True
        t.start()
        thread_list.append(t)

    # Wait until we receive an event signifying one of the containers has stopped
    g_thread_exit_event.wait()
    sys.exit(0)


if __name__ == '__main__':
    main()