sonic-buildimage/platform/broadcom
yozhao101 be3c036794
[supervisord] Monitoring the critical processes with supervisord. (#6242)
- Why I did it
Initially, we used Monit to monitor the critical processes in each container. If one of the critical processes was not running
or had crashed for some reason, Monit would periodically write an alerting message to syslog. Whenever we added a new process
to a container, the corresponding Monit configuration file also had to be updated, which made maintenance harder.

We now use the event listener of Supervisord to do this monitoring. Since the processes in each container are already managed by
Supervisord, we only need to focus on the monitoring logic.

- How I did it
We use the event listener mechanism of Supervisord to monitor the critical processes in containers. The event listener takes
the following steps when it is notified that one of the critical processes exited unexpectedly:

The event listener first checks whether the auto-restart mechanism is enabled for this container. If it is enabled, the event listener kills the Supervisord process, which should cause the container to exit and subsequently be restarted.

If the auto-restart mechanism is not enabled for this container, the event listener enters a loop that first sleeps for 1 minute and then checks whether the process is running. If it is, the event listener exits the loop. If it is not, an alerting message is written to syslog and the loop repeats.
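
The two steps above can be sketched in Python as follows. This is a minimal sketch and not the listener added by this PR: the function name `on_process_exit` and the callbacks `autorestart_enabled`, `is_running`, `kill_supervisord` and `write_alert` are hypothetical and are injected so that the decision logic can be exercised without a running container. It assumes the payload of a Supervisord PROCESS_STATE_EXITED event, which is a line of space-separated `key:value` tokens such as `processname`, `expected` and `pid`.

```python
import time


def parse_tokens(line):
    """Parse a Supervisord 'key:value key:value ...' line into a dict."""
    return dict(token.split(":", 1) for token in line.split())


def on_process_exit(payload_line, critical_processes, autorestart_enabled,
                    is_running, kill_supervisord, write_alert,
                    sleep=time.sleep, interval_secs=60):
    """Handle one PROCESS_STATE_EXITED payload (all callbacks are hypothetical).

    Returns "ignored", "killed" or "recovered" so callers can see which
    branch was taken.
    """
    event = parse_tokens(payload_line)
    name = event["processname"]

    # Only react to critical processes whose exit Supervisord did not expect
    # (Supervisord sets expected:1 when the exit code was an expected one).
    if name not in critical_processes or event.get("expected") == "1":
        return "ignored"

    # Step 1: with auto-restart enabled, kill Supervisord so that the
    # container exits and gets restarted.
    if autorestart_enabled():
        kill_supervisord()
        return "killed"

    # Step 2: otherwise sleep, re-check, and alert until the process is back.
    while True:
        sleep(interval_secs)
        if is_running(name):
            return "recovered"
        write_alert("Process '{}' is not running".format(name))
```

In a real listener, `write_alert` could wrap `syslog.syslog` and `kill_supervisord` could send a signal to the Supervisord process. Passing `sleep` as a parameter is only a convenience so that the 1-minute interval can be replaced in tests.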

- How to verify it
First, check whether the auto-restart mechanism of a container is enabled by running the command show feature status. If it is enabled, select one critical process and kill it manually, then check whether the container is restarted.

Second, if the auto-restart mechanism was enabled in step 1, disable it by running the command sudo config feature autorestart <container_name> disabled. Then select one critical process and kill it. After that, the alerting message should appear in syslog once every minute.

- Which release branch to backport (provide reason below if selected)

[ ] 201811
[ ] 201911
[x] 202006
2021-01-21 12:57:49 -08:00
docker-saiserver-brcm Upgrade syncd to buster. (#6106) 2020-12-17 12:46:45 -08:00
docker-syncd-brcm [supervisord] Monitoring the critical processes with supervisord. (#6242) 2021-01-21 12:57:49 -08:00
docker-syncd-brcm-rpc [supervisord]: use abspath as supervisord entrypoint (#5995) 2020-11-22 21:18:44 -08:00
saibcm-modules [BCMSAI] Update BCM SAI debian package to 4.2.1.3 (6.5.19 hsdk) (#5532) 2020-10-06 07:58:00 -07:00
sonic-platform-modules-accton [device/accton]: As7816-64x, fix memory leakage on accton fan monitor. (#6168) 2021-01-15 08:06:21 -08:00
sonic-platform-modules-alphanetworks [kernel]: upgrade linux kernel to 4.9.118 (#4897) 2020-07-12 18:08:51 +00:00
sonic-platform-modules-arista@937ea8abc1 [Arista] Update driver submodules (#6396) 2021-01-10 07:42:56 -08:00
sonic-platform-modules-brcm-xlr-gts [build]: fix dpkg admindir corruption issue in parallel build (#6408) 2021-01-12 06:03:12 -08:00
sonic-platform-modules-cel [device/celestica]: Add thermalctld support on DX010 platform APIs (#6089) 2021-01-15 10:20:47 -08:00
sonic-platform-modules-dell DellEMC: Z9332f change SFP detection logic (#6261) 2021-01-06 11:12:04 -08:00
sonic-platform-modules-delta [kernel]: upgrade linux kernel to 4.9.118 (#4897) 2020-07-12 18:08:51 +00:00
sonic-platform-modules-ingrasys [platform-modules]: fix compile issues for platform driver under 4.19 2020-04-17 04:51:51 +00:00
sonic-platform-modules-inventec [sonic-utilities] Update submodule; Build and install as a Python 3 wheel (#5926) 2020-11-25 10:28:36 -08:00
sonic-platform-modules-juniper [Juniper] Platform bug fixes / improvements (#5541) 2020-11-10 22:13:23 -08:00
sonic-platform-modules-mitac [kernel]: upgrade linux kernel to 4.9.118 (#4897) 2020-07-12 18:08:51 +00:00
sonic-platform-modules-quanta [platform-modules]: fix compile issues for platform driver under 4.19 2020-04-17 04:51:51 +00:00
docker-ptf-brcm.dep [build]: support for DPKG local caching (#4117) 2020-03-11 20:04:52 -07:00
docker-ptf-brcm.mk [build]: add docker-ptf-* as stretch docker targets (#4516) 2020-05-01 11:20:33 -07:00
docker-saiserver-brcm.dep [build]: support for DPKG local caching (#4117) 2020-03-11 20:04:52 -07:00
docker-saiserver-brcm.mk Upgrade syncd to buster. (#6106) 2020-12-17 12:46:45 -08:00
docker-syncd-brcm-rpc.dep [build]: support for DPKG local caching (#4117) 2020-03-11 20:04:52 -07:00
docker-syncd-brcm-rpc.mk Upgrade syncd to buster. (#6106) 2020-12-17 12:46:45 -08:00
docker-syncd-brcm.dep [build]: support for DPKG local caching (#4117) 2020-03-11 20:04:52 -07:00
docker-syncd-brcm.mk Upgrade syncd to buster. (#6106) 2020-12-17 12:46:45 -08:00
libsaithrift-dev.dep [build]: support for DPKG local caching (#4117) 2020-03-11 20:04:52 -07:00
libsaithrift-dev.mk [sai and sairedis] advance sairedis sub-module and upgrade to matching Broadcom SAI build (#2488) 2019-02-16 10:14:18 -08:00
one-aboot.dep [build]: support for DPKG local caching (#4117) 2020-03-11 20:04:52 -07:00
one-aboot.mk [arista] update platform driver submodules (#4512) 2020-04-30 12:06:19 -07:00
one-image.dep [build]: support for DPKG local caching (#4117) 2020-03-11 20:04:52 -07:00
one-image.mk Platform Driver Developement Framework (PDDF) (#4756) 2020-11-12 10:22:38 -08:00
one-pde-image.dep [build]: support for DPKG local caching (#4117) 2020-03-11 20:04:52 -07:00
platform-modules-accton.dep [build]: support for DPKG local caching (#4117) 2020-03-11 20:04:52 -07:00
platform-modules-accton.mk [build]: add buster docker as the last step of the build proces 2020-04-16 10:26:18 +00:00
platform-modules-alphanetworks.dep [build]: support for DPKG local caching (#4117) 2020-03-11 20:04:52 -07:00
platform-modules-alphanetworks.mk [build]: add buster docker as the last step of the build proces 2020-04-16 10:26:18 +00:00
platform-modules-arista.dep [build]: support for DPKG local caching (#4117) 2020-03-11 20:04:52 -07:00
platform-modules-arista.mk [build]: add buster docker as the last step of the build proces 2020-04-16 10:26:18 +00:00
platform-modules-brcm-xlr-gts.dep [build]: support for DPKG local caching (#4117) 2020-03-11 20:04:52 -07:00
platform-modules-brcm-xlr-gts.mk [build]: add buster docker as the last step of the build proces 2020-04-16 10:26:18 +00:00
platform-modules-cel.dep [build]: support for DPKG local caching (#4117) 2020-03-11 20:04:52 -07:00
platform-modules-cel.mk [build]: add buster docker as the last step of the build proces 2020-04-16 10:26:18 +00:00
platform-modules-dell.dep [build]: support for DPKG local caching (#4117) 2020-03-11 20:04:52 -07:00
platform-modules-dell.mk [devices]: DellEMC new platform support for DellEMC s5296f- 96x25G (#3960) 2020-10-21 11:10:50 -07:00
platform-modules-delta.dep [build]: support for DPKG local caching (#4117) 2020-03-11 20:04:52 -07:00
platform-modules-delta.mk [platform/delta]: Add a new supported platform, Delta-agc032 (#4602) 2020-05-27 09:33:02 -07:00
platform-modules-ingrasys.dep [build]: support for DPKG local caching (#4117) 2020-03-11 20:04:52 -07:00
platform-modules-ingrasys.mk [build]: add buster docker as the last step of the build proces 2020-04-16 10:26:18 +00:00
platform-modules-inventec.dep [build]: support for DPKG local caching (#4117) 2020-03-11 20:04:52 -07:00
platform-modules-inventec.mk [Inventec] Add support for D6332 platform (#5304) 2020-10-20 11:37:16 -07:00
platform-modules-juniper.dep [build]: support for DPKG local caching (#4117) 2020-03-11 20:04:52 -07:00
platform-modules-juniper.mk [build]: add buster docker as the last step of the build proces 2020-04-16 10:26:18 +00:00
platform-modules-mitac.dep [build]: support for DPKG local caching (#4117) 2020-03-11 20:04:52 -07:00
platform-modules-mitac.mk [build]: add buster docker as the last step of the build proces 2020-04-16 10:26:18 +00:00
platform-modules-quanta.dep [build]: support for DPKG local caching (#4117) 2020-03-11 20:04:52 -07:00
platform-modules-quanta.mk [build]: add buster docker as the last step of the build proces 2020-04-16 10:26:18 +00:00
platform-modules-s6000.dep [build]: support for DPKG local caching (#4117) 2020-03-11 20:04:52 -07:00
platform.conf one image implementation (#215) 2017-01-29 11:33:33 -08:00
raw-image.dep [build]: support for DPKG local caching (#4117) 2020-03-11 20:04:52 -07:00
raw-image.mk [build]: Move Systemd service start to systemd generator (#3172) 2019-07-29 15:52:15 -07:00
rules.dep [build]: support for DPKG local caching (#4117) 2020-03-11 20:04:52 -07:00
rules.mk [build]: wait for conflicts package to be uninstalled (#5039) 2020-07-27 10:46:20 -07:00
sai-modules.dep [build]: support for DPKG local caching (#4117) 2020-03-11 20:04:52 -07:00
sai-modules.mk [BCMSAI] Update BCM SAI debian package to 4.2.1.3 (6.5.19 hsdk) (#5532) 2020-10-06 07:58:00 -07:00
sai.dep Fix docker images rebuilt issue when building each host image (#5925) 2020-11-24 21:45:06 +08:00
sai.mk Anchor the libprotobuf-dev version based on a fixed version by using debian control dependency (#6420) 2021-01-12 09:51:15 -08:00