sonic-buildimage/platform/mellanox
yozhao101 cc9c3f567e [supervisord] Monitoring the critical processes with supervisord. (#6242)
- Why I did it
Initially, we used Monit to monitor the critical processes in each container. If one of the critical processes was not running
or had crashed for some reason, Monit would periodically write an alerting message into syslog. Whenever we added a new process
to a container, the corresponding Monit configuration file also had to be updated, which made maintenance difficult.

We now use a Supervisord event listener to do this monitoring. Since the processes in each container are already managed by
Supervisord, we only need to focus on the monitoring logic.
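A Supervisord event listener is a long-running process that talks to Supervisord over stdin/stdout: it prints READY, reads one header line of key:value tokens (including eventname and len), reads len characters of payload, and answers with "RESULT 2\nOK". The minimal sketch below shows that handshake and how an unexpected exit of a critical process could be recognized. It assumes the listener is subscribed with events=PROCESS_STATE_EXITED in its [eventlistener:...] section; the function names and the way the set of critical processes is supplied are hypothetical, not the code added by this commit.

```python
import sys
from typing import Dict, Iterable, TextIO, Tuple


def parse_tokens(line: str) -> Dict[str, str]:
    """Split a supervisord 'key:value key:value ...' line into a dict."""
    return dict(token.split(":", 1) for token in line.split())


def read_event(stdin: TextIO = sys.stdin,
               stdout: TextIO = sys.stdout) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Signal readiness, then block until supervisord sends one event."""
    stdout.write("READY\n")
    stdout.flush()
    headers = parse_tokens(stdin.readline())
    # PROCESS_STATE_* payloads are a single line of key:value tokens.
    payload = parse_tokens(stdin.read(int(headers["len"])))
    return headers, payload


def acknowledge(stdout: TextIO = sys.stdout) -> None:
    """Report the event as handled: 'RESULT <body length>\\n<body>'."""
    stdout.write("RESULT 2\nOK")
    stdout.flush()


def is_unexpected_critical_exit(headers: Dict[str, str],
                                payload: Dict[str, str],
                                critical: Iterable[str]) -> bool:
    """True when a process from `critical` exited with an unexpected exit code."""
    return (headers.get("eventname") == "PROCESS_STATE_EXITED"
            and payload.get("expected") == "0"
            and payload.get("processname") in set(critical))


def main(critical: Iterable[str]) -> None:
    """Hypothetical main loop; never returns."""
    while True:
        headers, payload = read_event()
        if is_unexpected_critical_exit(headers, payload, critical):
            pass  # react according to the container's auto-restart setting
        acknowledge()
```

The listener must acknowledge every event, including ones it ignores; otherwise Supervisord keeps it in the busy state and stops sending it events.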

- How I did it
We use a Supervisord event listener to monitor the critical processes in containers. When the event listener is notified that
one of the critical processes exited unexpectedly, it takes the following steps:

The event listener first checks whether the auto-restart mechanism is enabled for this container. If it is, the event listener kills the Supervisord process, which should cause the container to exit and subsequently be restarted.
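The auto-restart branch can be sketched as follows. This assumes the auto-restart value is the "enabled"/"disabled" string set by 'config feature autorestart' and has already been read (in SONiC it is assumed to live in the FEATURE table of CONFIG_DB). Because Supervisord spawns its event listeners, the listener's parent PID is taken as Supervisord's PID; the choice of SIGTERM and the function name are assumptions for illustration.

```python
import os
import signal
from typing import Callable, Optional


def handle_critical_exit(auto_restart: str,
                         supervisord_pid: Optional[int] = None,
                         kill: Callable[[int, int], None] = os.kill) -> bool:
    """Signal supervisord if auto-restart is enabled; return True if signalled."""
    if auto_restart != "enabled":
        # Auto-restart is off: the caller should fall back to syslog alerts.
        return False
    # Event listeners are children of supervisord, so by default the parent
    # process is the supervisord instance that manages this container.
    pid = os.getppid() if supervisord_pid is None else supervisord_pid
    kill(pid, signal.SIGTERM)
    return True
```

Injecting kill keeps the decision logic testable without actually terminating anything.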

If the auto-restart mechanism is not enabled for this container, the event listener enters a loop that first sleeps for 1 minute and then checks whether the process is running. If it is, the event listener exits. If not, an alerting message is written into syslog.
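A minimal sketch of that alert loop, assuming the process state is queried with 'supervisorctl status <name>' and the alert is logged at LOG_ERR priority; the alert text and helper names are hypothetical. Sleep, the running check, and the logger are injectable so the loop can be exercised without waiting.

```python
import subprocess
import syslog
import time
from typing import Callable


def supervisorctl_is_running(process_name: str) -> bool:
    """Ask supervisord for the process state (assumes supervisorctl is on PATH)."""
    result = subprocess.run(["supervisorctl", "status", process_name],
                            capture_output=True, text=True)
    return "RUNNING" in result.stdout


def log_to_syslog(message: str) -> None:
    syslog.syslog(syslog.LOG_ERR, message)


def alert_until_running(process_name: str,
                        is_running: Callable[[str], bool] = supervisorctl_is_running,
                        interval_s: float = 60.0,
                        sleep: Callable[[float], None] = time.sleep,
                        log: Callable[[str], None] = log_to_syslog) -> int:
    """Sleep, re-check, and alert until the process is back; return the alert count."""
    alerts = 0
    while True:
        sleep(interval_s)
        if is_running(process_name):
            return alerts
        log(f"Process '{process_name}' is not running.")
        alerts += 1
```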

- How to verify it
First, check whether the auto-restart mechanism of a container is enabled by running the command 'show feature status'. If it is enabled, select one critical process and kill it manually, then check whether the container is restarted.
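One way to confirm the restart is to compare the container's start time, as reported by 'docker inspect --format {{.State.StartedAt}}', before and after killing the process. The sketch below polls until that value changes; the timeout and poll interval are arbitrary assumptions, and it presumes the container is started again rather than recreated.

```python
import subprocess
import time
from typing import Callable


def container_started_at(name: str) -> str:
    """Return Docker's start timestamp for a container (changes on every start)."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.StartedAt}}", name],
        capture_output=True, text=True, check=True)
    return result.stdout.strip()


def wait_for_restart(name: str, started_before: str,
                     started_at: Callable[[str], str] = container_started_at,
                     timeout_s: float = 120.0, poll_s: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll until the container's StartedAt differs from started_before."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        sleep(poll_s)
        if started_at(name) != started_before:
            return True
    return False
```

Usage: record before = container_started_at("syncd"), kill a critical process inside that container, then call wait_for_restart("syncd", before).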

Second, if the auto-restart mechanism was enabled in step 1, disable it by running the command 'sudo config feature autorestart <container_name> disabled'. Then select one critical process and kill it. After that, an alerting message should appear in syslog every minute.
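To check that the alert really repeats about once a minute, the timestamps of matching syslog lines can be compared. This sketch assumes classic syslog lines that begin with a "Mon DD HH:MM:SS" timestamp (any fractional seconds after it are ignored) and that the alert contains some known marker text, which is a hypothetical parameter here; the 5-second tolerance is also an assumption.

```python
from datetime import datetime
from typing import Iterable, List


def alert_intervals(lines: Iterable[str], marker: str) -> List[float]:
    """Seconds between consecutive syslog lines that contain `marker`."""
    stamps = []
    for line in lines:
        if marker in line:
            # The first 15 chars are "Mon DD HH:MM:SS"; syslog omits the year,
            # so a fixed leap year is prepended (Feb 29 parses; New Year does not).
            stamps.append(datetime.strptime("2000 " + line[:15], "%Y %b %d %H:%M:%S"))
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


def looks_periodic(intervals: List[float], period_s: float = 60.0,
                   tolerance_s: float = 5.0) -> bool:
    """True if every interval is within tolerance_s of period_s."""
    return bool(intervals) and all(abs(i - period_s) <= tolerance_s for i in intervals)
```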

- Which release branch to backport (provide reason below if selected)

[ ] 201811
[ ] 201911
[x] 202006
2021-01-28 09:28:27 -08:00
docker-saiserver-mlnx [supervisord]: use abspath as supervisord entrypoint (#5995) 2020-11-22 21:18:44 -08:00
docker-syncd-mlnx [supervisord] Monitoring the critical processes with supervisord. (#6242) 2021-01-28 09:28:27 -08:00
docker-syncd-mlnx-rpc [supervisord]: use abspath as supervisord entrypoint (#5995) 2020-11-22 21:18:44 -08:00
hw-management Add hw-mgmt patch to support SDK OFFLINE event for handling flow within service firmware upgrade (#6550) 2021-01-28 09:22:52 -08:00
issu-version [mellanox|ffb] ISSU version check (#2437) 2019-01-17 14:41:32 -08:00
mft [Mellanox] Add MFT DKMS build support. (#5088) 2020-08-03 13:52:40 +03:00
mlnx-platform-api [mellanox][platform api] fix a missing import time module (#6458) 2021-01-15 08:20:57 -08:00
mlnx-sai [mellanox]: Update SAI to sonic2012 1.18.1.0 (#6566) 2021-01-28 09:25:05 -08:00
sdk-src [mellanox]: Update SDK to 4.4.2308, FW to *.2008.2308 (#6552) 2021-01-28 09:21:56 -08:00
.gitignore [mellanox] build SDK driver from open source (#3580) 2019-10-08 07:57:12 -07:00
asic_table.j2 [Dynamic buffer calc] Support dynamic buffer calculation (#6194) 2020-12-13 11:35:39 -08:00
docker-saiserver-mlnx.dep [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
docker-saiserver-mlnx.mk [build]: add docker-saiserver-* as stretch docker targets 2020-05-06 10:23:38 +00:00
docker-syncd-mlnx-rpc.dep [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
docker-syncd-mlnx-rpc.mk [dockers] update mellanox syncd and pmon to buster (#4818) 2020-07-18 03:46:15 -07:00
docker-syncd-mlnx.dep [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
docker-syncd-mlnx.mk [dockers] update mellanox syncd and pmon to buster (#4818) 2020-07-18 03:46:15 -07:00
fw.dep [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
fw.mk [mellanox]: Update SDK to 4.4.2308, FW to *.2008.2308 (#6552) 2021-01-28 09:21:56 -08:00
hw-management.dep [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
hw-management.mk [Mellanox] update hw-mgmt package to V.7.0010.1300 (#5902) 2020-11-16 01:57:19 -08:00
issu-version.dep [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
issu-version.mk [mellanox]: Add SSD FW update tool (#4351) 2020-04-13 18:13:19 +03:00
libsaithrift-dev.dep [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
libsaithrift-dev.mk [sai and sairedis] advance sairedis sub-module and upgrade to matching Broadcom SAI build (#2488) 2019-02-16 10:14:18 -08:00
mft.dep [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
mft.mk update mft tool to 4.15.3 (#6281) 2020-12-27 11:19:13 +02:00
mlnx-ffb.dep [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
mlnx-ffb.mk [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
mlnx-ffb.sh Update all references to new 'sonic-installer' name (#5119) 2020-08-07 08:49:39 -07:00
mlnx-fw-upgrade.j2 [mellanox] Use 'mlxfwmanager -l' for extracting available firmware version from FW images (#5915) 2020-12-01 18:15:28 +02:00
mlnx-onie-fw-update.dep [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
mlnx-onie-fw-update.mk [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
mlnx-onie-fw-update.sh [sonic-utilities] Build and install as a Python wheel package (#5409) 2020-09-20 20:16:42 -07:00
mlnx-platform-api.dep [Mellanox] Add python3 support for Mellanox platform API (#6175) 2020-12-11 10:51:31 -08:00
mlnx-platform-api.mk [Mellanox] Add python3 support for Mellanox platform API (#6175) 2020-12-11 10:51:31 -08:00
mlnx-sai.dep [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
mlnx-sai.mk [mellanox]: Update SAI to sonic2012 1.18.1.0 (#6566) 2021-01-28 09:25:05 -08:00
mlnx-ssd-fw-update.dep [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
mlnx-ssd-fw-update.mk [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
mlnx-ssd-fw-update.sh [Mellanox] Add ONIE and SSD platform components. (#4758) 2020-06-15 14:25:49 +03:00
one-image.dep [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
one-image.mk [mellanox]: Add SSD FW update tool (#4351) 2020-04-13 18:13:19 +03:00
peripheral_table.j2 [Dynamic buffer calc] Support dynamic buffer calculation (#6194) 2020-12-13 11:35:39 -08:00
platform.conf one image implementation (#215) 2017-01-29 11:33:33 -08:00
rules.dep [docker-ptf]: build docker ptf 2021-01-28 09:23:12 -08:00
rules.mk [docker-ptf]: build docker ptf 2021-01-28 09:23:12 -08:00
sdk.dep [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
sdk.mk [mellanox]: Update SDK to 4.4.2308, FW to *.2008.2308 (#6552) 2021-01-28 09:21:56 -08:00