From 43340cd58deb7e2a05f1b4e258774db8c1a544de Mon Sep 17 00:00:00 2001
From: Vadym Hlushko <62022266+vadymhlushko-mlnx@users.noreply.github.com>
Date: Thu, 31 Aug 2023 21:28:20 +0300
Subject: [PATCH] [memory_checker] Add a specific log message in a case when the docker service is not running. (#16018)

#### Why I did it
To fix the logic introduced by [[memory_checker] Do not check memory usage of containers which are not created #11129](https://github.com/sonic-net/sonic-buildimage/pull/11129).

The following scenario can occur shortly before a reboot:
1. The `docker service` has stopped.
2. Within a very short period of time, the monit service runs `root@sonic:/home/admin# monit status container_memory_telemetry`.

In this scenario, the `memory_checker` script writes an error to the syslog:
```
ERR memory_checker: Failed to retrieve the running container list from docker daemon! Error message is: 'Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))'
```
This is actually expected behavior: when the docker service is stopped, its Unix socket is destroyed, which is why the `FileNotFoundError(2, 'No such file or directory')` exception appears in the syslog.

#### How I did it
Lowered the log severity for the case where the docker service is not running, and changed the return value to an empty container list.

#### How to verify it
It is very hard to catch the exact moment described in the `Why I did it` section. To check the logic:
1. Change the Unix socket path to a non-existing one in the [/usr/bin/memory_checker](https://github.com/sonic-net/sonic-buildimage/blob/47742dfc2c0d1fa27198d69c9183ddc044e11b22/files/image_config/monit/memory_checker#L139) file on the switch.
2. Execute `root@sonic:/home/admin# monit restart container_memory_telemetry`.
3. Check the syslog for messages such as:
```
WARNING memory_checker: Failed to retrieve the running container list from docker daemon! Error message is: 'Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))'
INFO memory_checker: [memory_checker] Exits without checking memory usage since container 'telemetry' is not running!
```
---
 files/image_config/monit/memory_checker | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/files/image_config/monit/memory_checker b/files/image_config/monit/memory_checker
index 5abe9bbc44..e5bfe4e386 100755
--- a/files/image_config/monit/memory_checker
+++ b/files/image_config/monit/memory_checker
@@ -140,6 +140,11 @@ def get_running_container_names():
         running_container_list = docker_client.containers.list(filters={"status": "running"})
         running_container_names = [ container.name for container in running_container_list ]
     except (docker.errors.APIError, docker.errors.DockerException) as err:
+        if not is_service_active("docker"):
+            syslog.syslog(syslog.LOG_INFO,
+                          "[memory_checker] Docker service is not running. Error message is: '{}'".format(err))
+            return []
+
         syslog.syslog(syslog.LOG_ERR,
                       "Failed to retrieve the running container list from docker daemon! Error message is: '{}'"
                       .format(err))
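
The hunk calls `is_service_active("docker")`, but the helper's body is not part of this diff. The following is a minimal sketch of how such a check and the resulting severity choice might look. It assumes the check is done with `systemctl is-active --quiet`; the names `is_service_active_sketch` and `classify_docker_failure` are hypothetical and do not come from `memory_checker`.

```python
import subprocess
import syslog


def is_service_active_sketch(service_name, run=subprocess.run):
    """Hypothetical stand-in for is_service_active(): True if systemd reports the unit active."""
    try:
        # `systemctl is-active --quiet` exits with 0 only when the unit is active.
        result = run(["systemctl", "is-active", "--quiet", service_name], check=False)
    except OSError:
        # systemctl is missing or cannot be executed: treat the service as inactive.
        return False
    return result.returncode == 0


def classify_docker_failure(err, docker_active):
    """Return (priority, message, return_empty) for a failed container listing.

    return_empty=True mirrors the early `return []` added by the patch;
    False means the pre-existing LOG_ERR handling applies.
    """
    if not docker_active:
        return (syslog.LOG_INFO,
                "[memory_checker] Docker service is not running. "
                "Error message is: '{}'".format(err),
                True)
    return (syslog.LOG_ERR,
            "Failed to retrieve the running container list from docker daemon! "
            "Error message is: '{}'".format(err),
            False)
```

The `run` parameter exists only so the sketch can be exercised without a real systemd; on a switch, `is_service_active_sketch("docker")` would invoke `systemctl` directly.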