Fix monit false alarm in process_checker: the "disk-sleep" process status was not handled, so some 201911 SONiC devices intermittently reported "pmon|sensord" errors.

#### Why I did it
The psutil library can report process statuses including the following:

- running: The process is currently running.
- sleeping: The process is sleeping or waiting for an event to occur.
- disk-sleep: The process is waiting for I/O operations to complete.
- stopped: The process has been stopped (e.g. via the SIGSTOP signal).
- zombie: The process has terminated but is still listed in the process table.
- dead: The process has terminated and has been removed from the process table.

We should regard running, sleeping and disk-sleep as normal states and not raise a monit alert for them.

Currently, if a process happens to be in the disk-sleep state when monit runs its check, syslog errors like the following are raised. This change also gets rid of that syslog output.

- `syslog.2.gz:Feb 24 06:12:17.394619 MEL23-0101-0301-04T1 ERR monit[6040]: 'pmon|sensord' status failed (1) -- '/usr/sbin/sensord -f daemon' is not running in host`
- `syslog.2.gz:Feb 24 06:13:17.932531 MEL23-0101-0301-04T1 ERR monit[6040]: 'pmon|sensord' status failed (1) -- '/usr/sbin/sensord -f daemon' is not running in host`
- `syslog.2.gz:Feb 24 06:14:18.502505 MEL23-0101-0301-04T1 ERR monit[6040]: 'pmon|sensord' status failed (1) -- '/usr/sbin/sensord -f daemon' is not running in host`

I then tried to reproduce the issue by triggering process_checker for sensord frequently, and observed that sensord was in the "disk-sleep" state whenever the alert was raised.

##### Work item tracking
- Microsoft ADO **(number only)**: 17663589

#### How I did it
Fixed the process_checker script to handle the "disk-sleep" case.

#### How to verify it
Verified on a local DUT.
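A minimal sketch of the status classification described above, assuming the psutil status strings listed under "Why I did it". `HEALTHY_STATUSES` and `is_status_healthy()` are hypothetical names used for illustration; this is not the actual process_checker code.

```python
# Hypothetical sketch: decide whether a psutil-style process status should
# be treated as healthy (no monit alert). The string values match psutil's
# STATUS_RUNNING, STATUS_SLEEPING and STATUS_DISK_SLEEP constants.

HEALTHY_STATUSES = frozenset({
    "running",     # psutil.STATUS_RUNNING
    "sleeping",    # psutil.STATUS_SLEEPING
    "disk-sleep",  # psutil.STATUS_DISK_SLEEP: blocked on I/O, still alive
})


def is_status_healthy(status: str) -> bool:
    """Return True if a process in this status should NOT trigger an alert."""
    return status in HEALTHY_STATUSES


if __name__ == "__main__":
    # Print the verdict for each status listed in the PR description.
    for s in ("running", "sleeping", "disk-sleep", "stopped", "zombie", "dead"):
        print(f"{s:>10}: {'ok' if is_status_healthy(s) else 'alert'}")
```

With psutil installed, a caller could pass `psutil.Process(pid).status()` to `is_status_healthy()`. Keeping the healthy states in a single set makes it explicit that only running, sleeping and disk-sleep are exempt from alerting, while stopped, zombie and dead still raise an alarm.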
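To observe the transient disk-sleep state independently of monit, one option is to poll the process state directly. The sketch below is Linux-only: it reads the single-letter state field from `/proc/<pid>/stat` (documented in proc(5)) and maps it to psutil-style names. The helper names are hypothetical, and this is not the reproduction method used in this PR, which triggered process_checker repeatedly.

```python
import time
from collections import Counter

# Single-letter states from proc(5), mapped to psutil-style status names.
PROC_STATE_NAMES = {
    "R": "running",
    "S": "sleeping",
    "D": "disk-sleep",
    "T": "stopped",
    "t": "tracing-stop",
    "Z": "zombie",
    "X": "dead",
}


def parse_proc_stat_state(stat_line: str) -> str:
    """Extract the process state from one /proc/<pid>/stat line.

    The command name (field 2) is wrapped in parentheses and may itself
    contain spaces or ')', so split after the LAST ')' in the line.
    """
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    state_letter = after_comm.split()[0]
    return PROC_STATE_NAMES.get(state_letter, "unknown")


def sample_states(pid: int, samples: int = 100, interval: float = 0.01) -> Counter:
    """Poll /proc/<pid>/stat repeatedly and count each observed state."""
    counts: Counter = Counter()
    for _ in range(samples):
        with open(f"/proc/{pid}/stat") as f:
            counts[parse_proc_stat_state(f.read())] += 1
        time.sleep(interval)
    return counts
```

For example, `sample_states(pid, samples=1000)` with sensord's PID returns a `Counter` of observed states; a nonzero `disk-sleep` count would be consistent with the window in which the old check raised a false alarm.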