sonic-buildimage/files/image_config/monit/conf.d/sonic-host
Renuka Manavalan 2cd61bc136
Invoke disk check periodically. (#7374)
Why I did it
Enables a periodic scan of the disk for a read-only (RO) state.
If one is found, the script applies a transient fix and raises an error message.
2021-05-26 17:59:08 -07:00

###############################################################################
## Monit configuration for SONiC host OS
##
## This includes system-level monitoring as well as processes which
## run in the host OS (i.e., not inside a Docker container)
###############################################################################
check filesystem root-overlay with path /
    if space usage > 90% for 10 times within 20 cycles then alert repeat every 1 cycles

check filesystem var-log with path /var/log
    if space usage > 90% for 10 times within 20 cycles then alert repeat every 1 cycles

check system $HOST
    if memory usage > 90% for 10 times within 20 cycles then alert repeat every 1 cycles
    if cpu usage (user) > 90% for 10 times within 20 cycles then alert repeat every 1 cycles
    if cpu usage (system) > 90% for 10 times within 20 cycles then alert repeat every 1 cycles
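
## Reading the thresholds above: a worked example, not part of the original
## configuration. The poll interval is set by "set daemon <seconds>" in the
## main monitrc, which is not shown here. 60 seconds is an assumption used
## purely for illustration.
##
##   for 10 times within 20 cycles   the test must fail in at least 10 of the
##                                   last 20 polls (a 20-minute window at the
##                                   assumed interval), so a sustained breach
##                                   alerts after roughly 10 minutes
##   repeat every 1 cycles           while the failure persists, the alert is
##                                   repeated on every poll
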
check process rsyslog with pidfile /var/run/rsyslogd.pid
    start program = "/bin/systemctl start rsyslog.service"
    stop program = "/bin/systemctl stop rsyslog.service"
    if totalmem > 800 MB for 10 times within 20 cycles then restart

# route_check.py verifies that routes in APPL-DB and ASIC-DB are in sync.
# On any discrepancy, it logs the details and returns a non-zero exit code,
# which triggers a monit alert.
# Hence, for any discrepancy, there will be "ERR"-level log messages
# from both route_check.py and monit.
#
check program routeCheck with path "/usr/local/bin/route_check.py"
    every 5 cycles
    if status != 0 for 3 cycle then alert repeat every 1 cycles

# Check whether /etc and /home are writable. If not, make them writable.
# Raise a syslog error message in case of underlying issues.
#
check program diskCheck with path "/usr/local/bin/disk_check.py"
    every 5 cycles
    if status != 0 for 3 cycle then alert repeat every 1 cycles

check program container_checker with path "/usr/bin/container_checker"
    if status != 0 for 5 times within 5 cycles then alert repeat every 1 cycles
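
## A minimal sketch of how another host-level script could be monitored using
## the same "check program" pattern as the entries above. The service name
## "exampleCheck" and the script path are hypothetical placeholders, not part
## of SONiC. The stanza is left commented out so it has no effect.
## Monit runs the script and tests its exit status, so the script is assumed
## to exit non-zero on failure.
##
## check program exampleCheck with path "/usr/local/bin/example_check.py"
##     every 5 cycles
##     if status != 0 for 3 cycles then alert repeat every 1 cycles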