This repository has been archived on 2025-03-20. You can view files and clone it, but cannot push or open issues or pull requests.
sonic-buildimage/files/image_config
Junchao-Mellanox 1c97a03b81
[system-health] Add support for monitoring system health (#4835)
* system health first commit

* system health daemon first commit

* Finish healthd

* Changes due to lower layer logic change

* Get ASIC temperature from TEMPERATURE_INFO table

* Add system health make rule and service files

* fix bugs found during manual test

* Change make file to install system-health library to host

* Set system LED to blink on bootup time

* Caught exceptions in system health checker to make it more robust

* fix issue that fan/psu presence will always be true

* fix issue for external checker

* move system-health service to right after rc-local service

* Set system-health service start after database service

* Get system up time via /proc/uptime

* Provide more information in stat for CLI to use

* fix typo

* Set default category to External for external checker

* If external checker reported OK, save it to stat too

* Trim string for external checker output

* fix issue: PSU voltage check always returns OK

* Add unit test cases for system health library

* Fix LGTM warnings

* fix demo comments: 1. get boot up timeout from monit configuration file; 2. set system led in library instead of daemon

* Remove boot_timeout configuration because it is now read from the monit config file

* Fix argument miss

* fix unit test failure

* fix issue: summary status is not correct

* Fix format issues found in code review

* rename th to threshold to make it clearer

* Fix review comment: 1. add a .dep file for system health; 2. deprecate daemon_base and use sonic-py-common instead

* Fix unit test failure

* Fix LGTM alert

* Fix LGTM alert

* Fix review comments

* Fix review comment

* 1. Add relevant comments for system health; 2. rename external_checker to user_define_checker

* Ignore check for unknown service type

* Fix unit test issue

* Rename user define checker to user defined checker

* Rename user_define_checkers to user_defined_checkers for configuration file

* Rename file user_define_checker.py -> user_defined_checker.py

* Fix typo

* Adjust import order for config.py

Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com>

* Adjust import order for src/system-health/health_checker/hardware_checker.py

Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com>

* Adjust import order for src/system-health/scripts/healthd

Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com>

* Adjust import orders in src/system-health/tests/test_system_health.py

* Fix typo

* Add new line after import

* If the system health configuration file does not exist, healthd should exit

* Fix indent and enable pytest coverage

* Fix typo

* Fix typo

* Remove global logger and use log functions inherited from super class

* Change info level logger to notice level

Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com>
2020-10-12 11:12:49 +03:00
apt change image apt source list from stretch to buster for arm 2020-05-25 13:15:19 +00:00
bash [baseimage]: Increase TMOUT for serial port connections to 15 minutes (#3032) 2019-06-19 00:16:01 -07:00
caclmgrd Optimized caclmgrd Notification handling. Previously (#5560) 2020-10-08 11:31:09 -07:00
config-setup [sonic-utilities] Build and install as a Python wheel package (#5409) 2020-09-20 20:16:42 -07:00
constants [bgp] Add 'allow list' manager feature (#5513) 2020-10-02 10:06:04 -07:00
corefile_uploader corefile uploader: Updates per review comments offline (#3915) 2019-12-30 13:01:03 -08:00
cron.d [core_cleanup] Fix issue where core_cleanup job runs too frequently (#3659) 2019-10-23 15:55:47 -07:00
ebtables [ebtables] Replace binary config file with text config file for ebtables (#5252) 2020-09-03 17:27:07 -07:00
environment [image]: Update login message (#706) 2017-06-14 15:18:02 -07:00
fstrim [sonic-utilities] Build and install as a Python wheel package (#5409) 2020-09-20 20:16:42 -07:00
hostcfgd Enhanced Feature Table state enable/disable for multi-asic platforms. (#5358) 2020-09-22 08:34:02 -07:00
hostname [hostname-config] improve hostname-config process (#3676) 2019-10-29 08:30:27 -07:00
interfaces [baseimage]: Change the loopback mask from /8 to /16 (#5353) 2020-09-15 15:29:48 -07:00
kubernetes [baseimage]: Install Kubernetes packages if enabled in image (#4374) 2020-04-13 08:41:18 -07:00
logrotate [logrotate] create separate logrotate.d config for update-alternatives (#5382) 2020-09-22 01:23:42 -07:00
misc [docker-wait-any] Use APIClient instead of Client according to API update 2020-04-17 04:51:51 +00:00
monit [Monit] Unmonitor the processes in containers which are disabled. (#5153) 2020-09-25 00:28:28 -07:00
ntp [ntp] disable ntp long jump (#4748) 2020-06-11 13:01:21 -07:00
pcie-check Fix bug with pcie-check.service (#5368) 2020-09-15 15:21:31 -07:00
platform [rc.local] separate configuration migration and grub installation logic (#5528) 2020-10-03 23:00:39 -07:00
procdockerstatsd Fix exception when attempting to write a datetime to db (#5467) 2020-09-25 20:19:18 +08:00
process-reboot-cause [process-reboot-cause] Use Logger class from sonic-py-common package (#5384) 2020-09-16 10:35:19 -07:00
rsyslog syslog changes Multi ASIC platforms (#4738) 2020-07-12 18:08:51 +00:00
secureboot [platform] Add Support For Environment Variable File (#5010) 2020-07-31 17:59:09 -07:00
snmp mvrf_avoid_snmp_yml_config: made changes to pass SNMP config from con… (#4057) 2020-01-28 17:41:21 -08:00
sudoers [sonic-utilities] Build and install as a Python wheel package (#5409) 2020-09-20 20:16:42 -07:00
sysctl [sonic-buildimage] Changes to make network specific sysctl common for both host and docker namespace (#4838) 2020-07-12 18:08:51 +00:00
syslog [baseimage]: /host unmount timeout issue during reboot. (#5032) 2020-07-25 01:27:58 -07:00
system-health [system-health] Add support for monitoring system health (#4835) 2020-10-12 11:12:49 +03:00
systemd [services] Restart SwSS service upon unexpected critical process exit (#2845) 2019-05-01 08:02:38 -07:00
topology [platform] Add Support For Environment Variable File (#5010) 2020-07-31 17:59:09 -07:00
updategraph [platform] Add Support For Environment Variable File (#5010) 2020-07-31 17:59:09 -07:00
warmboot-finalizer [sonic-utilities] Build and install as a Python wheel package (#5409) 2020-09-20 20:16:42 -07:00
watchdog-control [sonic-utilities] Build and install as a Python wheel package (#5409) 2020-09-20 20:16:42 -07:00
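
The system-health commit message above mentions getting the system up time via /proc/uptime and comparing it against a boot-up timeout. A minimal sketch of that idea follows. The function names `parse_uptime`, `get_uptime`, and `is_booting`, and the example timeout value, are hypothetical and are not taken from the system-health source. The sketch only assumes the standard Linux /proc/uptime format, whose first field is seconds since boot.

```python
def parse_uptime(proc_uptime_text: str) -> float:
    """Return seconds since boot from the text of /proc/uptime.

    The file holds two space-separated numbers: uptime in seconds,
    then aggregate idle time. Only the first field is used here.
    """
    return float(proc_uptime_text.split()[0])


def get_uptime(path: str = "/proc/uptime") -> float:
    """Read and parse the uptime file (Linux only)."""
    with open(path) as f:
        return parse_uptime(f.read())


def is_booting(uptime_seconds: float, boot_timeout_seconds: float) -> bool:
    """Hypothetical helper: treat the system as still booting until
    its uptime exceeds the configured timeout."""
    return uptime_seconds < boot_timeout_seconds


if __name__ == "__main__":
    # 300 is an illustrative timeout, not a value from the commit.
    print(is_booting(parse_uptime("12.50 40.00\n"), 300))
```

Keeping the parsing separate from the file read makes the logic testable on machines that have no /proc filesystem.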