85ca3abc2f
1 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Junchao-Mellanox
|
1c97a03b81
|
[system-health] Add support for monitoring system health (#4835)
* system health first commit * system health daemon first commit * Finish healthd * Changes due to lower layer logic change * Get ASIC temperature from TEMPERATURE_INFO table * Add system health make rule and service files * fix bugs found during manual test * Change make file to install system-health library to host * Set system LED to blink on bootup time * Caught exceptions in system health checker to make it more robust * fix issue that fan/psu presence will always be true * fix issue for external checker * move system-health service to right after rc-local service * Set system-health service start after database service * Get system up time via /proc/uptime * Provide more information in stat for CLI to use * fix typo * Set default category to External for external checker * If external checker reported OK, save it to stat too * Trim string for external checker output * fix issue: PSU voltage check always return OK * Add unit test cases for system health library * Fix LGTM warnings * fix demo comments: 1. get boot up timeout from monit configuration file; 2. set system led in library instead of daemon * Remove boot_timeout configuration because it will get from monit config file * Fix argument miss * fix unit test failure * fix issue: summary status is not correct * Fix format issues found in code review * rename th to threshold to make it clearer * Fix review comment: 1. add a .dep file for system health; 2. deprecated daemon_base and uses sonic-py-common instead * Fix unit test failure * Fix LGTM alert * Fix LGTM alert * Fix review comments * Fix review comment * 1. Add relevant comments for system health; 2. rename external_checker to user_define_checker * Ignore check for unknown service type * Fix unit test issue * Rename user define checker to user defined checker * Rename user_define_checkers to user_defined_checkers for configuration file * Renmae file user_define_checker.py -> user_defined_checker.py * Fix typo * Adjust import order for config.py Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com> * Adjust import order for src/system-health/health_checker/hardware_checker.py Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com> * Adjust import order for src/system-health/scripts/healthd Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com> * Adjust import orders in src/system-health/tests/test_system_health.py * Fix typo * Add new line after import * If system health configuration file not exist, healthd should exit * Fix indent and enable pytest coverage * Fix typo * Fix typo * Remove global logger and use log functions inherited from super class * Change info level logger to notice level Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com> |