Commit Graph

5 Commits

Author SHA1 Message Date
Sujin Kang
ae7fa32691
[pmon]: Enable Autorestart of the daemons in PMON for unexpected exit (#8358)
Enable Autorestart of the daemons in PMON for unexpected exit
Remove the daemon list from the critical_process which prevent the PMON
from restarting when the individual daemon crashes.
2021-08-07 22:43:38 -07:00
yozhao101
4fa81b4f8d
[dockers] Update critical_processes file syntax (#4831)
**- Why I did it**
Initially, the critical_processes file contains either the name of critical process or the name of group.
For example, the critical_processes file in the dhcp_relay container contains a single group name
`isc-dhcp-relay`. When testing the autorestart feature of each container, we need get all the critical
processes and test whether a  container can be restarted correctly if one of its critical processes is
killed. However, it will be difficult to differentiate whether the names in the critical_processes file are
the critical processes or group names. At the same time, changing the syntax in this file will separate the individual process from the groups and also makes it clear to the user.

Right now the critical_processes file contains two different kind of entries. One is "program:xxx" which indicates a critical process. Another is "group:xxx" which indicates a group of critical processes
managed by supervisord using the name "xxx". At the same time, I also updated the logic to
parse the file critical_processes in supervisor-proc-event-listener script.

**- How to verify it**
We can first enable the autorestart feature of a specified container for example `dhcp_relay` by running the comman `sudo config container feature autorestart dhcp_relay enabled` on DUT. Then we can select a critical process from the command `docker top dhcp_relay` and use the command `sudo kill -SIGKILL <pid>` to kill that critical process. Final step is to check whether the container is restarted correctly or not.
2020-06-25 21:18:21 -07:00
Sujin Kang
cbc75fe4c8
[pmon]: Fix the continous syseepromd autorestart issue on 201911 (#4478)
- Remove syseepromd from the critical process of pmon docker
- Fix supervisor autorestart configuration of syseepromd
2020-04-30 15:51:34 -07:00
Sujin Kang
01f3f9286f
[fancontrol] Restart process upon unexpected exit, not entire pmon container (#4101)
* fancontrol restart

* Cleanup the default setting for exitcodes

* Remove the unnecessary stopwaitsecs default settin
2020-03-19 17:24:22 -07:00
yozhao101
4fa3a1e27e [Services] Restart Platform-monitor service upon unexpected critical process exit. (#3689)
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2019-11-04 17:44:01 -08:00