Commit Graph

11 Commits

Author SHA1 Message Date
Stepan Blyshchak
06f8b1f98a
[auto-ts] add memory check (#10433) (#12291)
#### Why I did it

To support automatic techsupport invokation in case memory usage is too high.

#### How I did it

Implemented according to https://github.com/Azure/SONiC/pull/939

#### How to verify it

UT, manual test on the switch.

*DEPENDS* on https://github.com/Azure/sonic-utilities/pull/2116
2022-10-06 08:06:46 -07:00
Ying Xie
76f7d7fa53 Revert "[auto-ts] add memory check (#10433)"
This reverts commit a2cd0f5d4c.
2022-10-04 21:53:45 +00:00
Stepan Blyshchak
a2cd0f5d4c [auto-ts] add memory check (#10433)
#### Why I did it

To support automatic techsupport invokation in case memory usage is too high.

#### How I did it

Implemented according to https://github.com/Azure/SONiC/pull/939

#### How to verify it

UT, manual test on the switch.

*DEPENDS* on https://github.com/Azure/sonic-utilities/pull/2116
2022-10-03 18:58:38 +00:00
Volodymyr Samotiy
e3a30deea9
[monit] Periodically monitor VNET route consistency (#8266)
*To run VNET route consistency check periodically.
*For any failure, the monit will raise alert based on return code.
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2021-08-19 16:29:25 -07:00
Renuka Manavalan
2cd61bc136
Invoke disk check periodically. (#7374)
Why I did it
Helps with periodic scan of disk for RO state.
If found, this script makes transient fix and raise error message.
2021-05-26 17:59:08 -07:00
yozhao101
04cd1d61e8
[Monit] Monitoring the running status of containers. (#6251)
**- Why I did it**
This PR aims to monitor the running status of each container. Currently the auto-restart feature was enabled. If a critical process exited unexpected, the container will be restarted. If the container was restarted 3 times during 20 minutes, then it will not run anymore unless we cleared the flag using the command `sudo systemctl reset-failed <container_name>` manually. 

**- How I did it**
We will employ Monit to monitor a script. This script will generate the expected running container list and compare it with the current running containers. If there are containers which were expected to run but were not running, then an alerting message will be written into syslog.

**- How to verify it**
I tested this feature on a lab device `str-a7050-acs-3` which has single ASIC and `str2-n3164-acs-3` which has a Multi-ASIC. First I manually stopped a container by running the command `sudo systemctl stop <container_name>`, then I checked whether there was an alerting message in the syslog.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2021-01-07 19:52:22 -08:00
abdosi
dddf96933c
[monit] Adding patch to enhance syslog error message generation for monit alert action when status is failed. (#5720)
Why/How I did:

Make sure first error syslog is triggered based on FAULT TOLERANCE condition.

Added support of repeat clause with alert action. This is used as trigger
for generation of periodic syslog error messages if error is persistent

Updated the monit conf files with repeat every x cycles for the alert action
2020-10-31 17:29:49 -07:00
Joe LeVeque
3987cbd80a
[sonic-utilities] Build and install as a Python wheel package (#5409)
We are moving toward building all Python packages for SONiC as wheel packages rather than Debian packages. This will also allow us to more easily transition to Python 3.

Python files are now packaged in "sonic-utilities" Pyhton wheel. Data files are now packaged in "sonic-utilities-data" Debian package.

**- How I did it**
- Build and install sonic-utilities as a Python package
- Remove explicit installation of wheel dependencies, as these will now get installed implicitly by pip when installing sonic-utilities as a wheel
- Build and install new sonic-utilities-data package to install data files required by sonic-utilities applications
- Update all references to sonic-utilities scripts/entrypoints to either reference the new /usr/local/bin/ location or remove absolute path entirely where applicable

Submodule updates:

* src/sonic-utilities aa27dd9...2244d7b (5):
  > Support building sonic-utilities as a Python wheel package instead of a Debian package (#1122)
  > [consutil] Display remote device name in show command (#1120)
  > [vrf] fix check state_db error when vrf moving (#1119)
  > [consutil] Fix issue where the ConfigDBConnector's reference is missing (#1117)
  > Update to make config load/reload backward compatible. (#1115)

* src/sonic-ztp dd025bc...911d622 (1):
  > Update paths to reflect new sonic-utilities install location, /usr/local/bin/ (#19)
2020-09-20 20:16:42 -07:00
Renuka Manavalan
312771dc3e
[monit] Periodically monitor route consistency (#5085)
* Add route_check to mont.

* Switched to units of cycles per comments

* Added comments per Joe's comments.

* Added more comments per Royal's comments.
2020-08-04 10:33:13 -07:00
yozhao101
aa67921d06 [Monit] Change the monitoring period from 120 seconds to 60 seconds. (#3974)
* [Monit] Change the monitoring period of monit from 120 seconds to 60
seconds and also at the same time double the interval for existing sonic monit config file in
host.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2020-01-10 13:01:24 -08:00
Joe LeVeque
24a0c46464
[monit] Build from source and patch to use MemAvailable value if available on system (#3875) 2019-12-30 18:25:57 -08:00