Why/How I did:
Make sure first error syslog is triggered based on FAULT TOLERANCE condition.
Added support of repeat clause with alert action. This is used as trigger
for generation of periodic syslog error messages if error is persistent
Updated the monit conf files with repeat every x cycles for the alert action
The psutil library used in process_checker create a cache for each
process when calling process_iter. So, there is some possibility that
one process exists when calling process_iter, but not exists when
calling cmdline, which will raise a NoSuchProcess exception. This commit
fix the issue.
Signed-off-by: bingwang <bingwang@microsoft.com>
We want to let Monit to unmonitor the processes in containers which are disabled in `FEATURE` table such that
Monit will not generate false alerting messages into the syslog.
- Backport of https://github.com/Azure/sonic-buildimage/pull/5153 to the 201911 branch
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
* [Monit] Change the monitoring period of monit from 120 seconds to 60
seconds and also at the same time double the interval for existing sonic monit config file in
host.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
- monit config broke by one monit upgrade
- abandon sed approach since it is suspestible to monit config changes
- use unixsocket instead of httpd due to a bug in 5.20.0