0fad6bdc7f
Why/How I did it: Make sure the first error syslog is triggered based on the FAULT TOLERANCE condition. Added support for the repeat clause with the alert action; this is used as the trigger for generating periodic syslog error messages if the error is persistent. Updated the monit conf files with "repeat every x cycles" for the alert action.
34 lines
1.5 KiB
Plaintext
###############################################################################
## Monit configuration for SONiC host OS
##
## This includes system-level monitoring as well as processes which
## run in the host OS (i.e., not inside a Docker container)
###############################################################################

check filesystem root-overlay with path /
    if space usage > 90% for 10 times within 20 cycles then alert repeat every 1 cycles

check filesystem var-log with path /var/log
    if space usage > 90% for 10 times within 20 cycles then alert repeat every 1 cycles

check system $HOST
    if memory usage > 90% for 10 times within 20 cycles then alert repeat every 1 cycles
    if cpu usage (user) > 90% for 10 times within 20 cycles then alert repeat every 1 cycles
    if cpu usage (system) > 90% for 10 times within 20 cycles then alert repeat every 1 cycles

check process rsyslog with pidfile /var/run/rsyslogd.pid
    start program = "/bin/systemctl start rsyslog.service"
    stop program = "/bin/systemctl stop rsyslog.service"
    if totalmem > 800 MB for 10 times within 20 cycles then restart

# route_check.py: verifies that routes between APPL-DB and ASIC-DB are in sync.
# For any discrepancy, details are logged and a non-zero code is returned,
# which triggers a monit alert.
# Hence, for any discrepancy, there will be log messages at "ERR" level
# from both route_check.py and monit.
#
check program routeCheck with path "/usr/bin/route_check.py"
    every 5 cycles
    if status != 0 for 3 cycles then alert repeat every 1 cycles
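
# Illustrative sketch only (commented out, not part of the shipped config).
# It shows how the fault-tolerance condition and the repeat clause used above
# could be combined for another program check. The check name "exampleCheck"
# and the path "/usr/bin/example_check.py" are hypothetical placeholders.
# Assumption, based on the commit message: "repeat every N cycles" on an
# alert action depends on the repeat support added for alert in this change.
#
#   - "every 5 cycles"          : evaluate the check only on every 5th monit cycle
#   - "for 3 cycles"            : fault tolerance; the first alert fires only
#                                 after the condition has held 3 times in a row
#   - "repeat every 1 cycles"   : keep re-raising the alert while the error persists
#
# check program exampleCheck with path "/usr/bin/example_check.py"
#     every 5 cycles
#     if status != 0 for 3 cycles then alert repeat every 1 cycles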