sonic-buildimage/files/image_config/monit/conf.d/sonic-host
abdosi 0fad6bdc7f [monit] Adding patch to enhance syslog error message generation for monit alert action when status is failed. (#5720)
Why/How I did it:

Make sure the first error syslog is triggered based on the FAULT TOLERANCE condition.

Added support for a repeat clause with the alert action. This is used as the trigger
for generating periodic syslog error messages if the error is persistent.

Updated the monit conf files with repeat every x cycles for the alert action.
2020-11-01 10:27:10 -08:00


###############################################################################
## Monit configuration for SONiC host OS
##
## This includes system-level monitoring as well as processes which
## run in the host OS (i.e., not inside a Docker container)
###############################################################################
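## Reading the rules below (an explanatory sketch, not part of the upstream
## file). Each alert rule has the general shape:
##
##   if <test> for N times within M cycles then alert repeat every K cycles
##
## - "for N times within M cycles" is the fault-tolerance condition: the
##   action fires only once the test has failed N times within the last M
##   poll cycles.
## - "repeat every K cycles" depends on the monit patch described in the
##   commit message; while the failure persists, the alert (a syslog error)
##   is generated again every K cycles.
##
## The cycle length is set elsewhere by "set daemon <seconds>" in monitrc.
## Assuming a hypothetical 60-second cycle, the rule
##
##   if space usage > 90% for 10 times within 20 cycles then alert repeat every 1 cycles
##
## would raise its first alert no sooner than about 10 cycles (roughly
## 10 minutes) of readings above 90%, and then about once per minute for as
## long as the condition persists.
###############################################################################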
check filesystem root-overlay with path /
    if space usage > 90% for 10 times within 20 cycles then alert repeat every 1 cycles

check filesystem var-log with path /var/log
    if space usage > 90% for 10 times within 20 cycles then alert repeat every 1 cycles

check system $HOST
    if memory usage > 90% for 10 times within 20 cycles then alert repeat every 1 cycles
    if cpu usage (user) > 90% for 10 times within 20 cycles then alert repeat every 1 cycles
    if cpu usage (system) > 90% for 10 times within 20 cycles then alert repeat every 1 cycles

check process rsyslog with pidfile /var/run/rsyslogd.pid
    start program = "/bin/systemctl start rsyslog.service"
    stop program = "/bin/systemctl stop rsyslog.service"
    if totalmem > 800 MB for 10 times within 20 cycles then restart
# route_check.py verifies that routes in APPL-DB and ASIC-DB are in sync.
# On any discrepancy, it logs the details and returns a non-zero exit code,
# which would trigger a monit alert.
# Hence, for any discrepancy, there will be "ERR"-level log messages
# from both route_check.py and monit.
#
check program routeCheck with path "/usr/bin/route_check.py"
    every 5 cycles
    if status != 0 for 3 cycle then alert repeat every 1 cycles