Commit Graph

6 Commits

Author SHA1 Message Date
Kuanyu Chen
cffd87a627
Add monit_snmp file to monitor memory usage (#14464)
#### Why I did it

When CPU is busy, the sonic_ax_impl may not have sufficient speed to handle the notification message sent from REDIS.
Thus, the message will keep stacking in the memory space of sonic_ax_impl.

If the condition continues, the memory usage will keep increasing.

#### How I did it

Add a monit file to check if the SNMP container where sonic_ax_impl resides in use more than 4GB memory.
If yes, restart the sonic_ax_impl process.

#### How to verify it

Run a lot of this command: `while true; do ret=$(redis-cli -n 0 set LLDP_ENTRY_TABLE:test1 test1); sleep 0.1; done;`
And check the memory used by sonic_ax_impl keeps increasing.

After a period, make sure the sonic_ax_impl is restarted when the memory usage reaches the 4GB threshold.
And verify the memory usage of sonic_ax_impl drops down from 4GB.
2023-04-06 12:19:11 -07:00
yozhao101
1a3cab43ac
[Monit] Deprecate the feature of monitoring the critical processes by Monit (#7676)
Signed-off-by: Yong Zhao yozhao@microsoft.com

Why I did it
Currently we leveraged the Supervisor to monitor the running status of critical processes in each container and it is more reliable and flexible than doing the monitoring by Monit. So we removed the functionality of monitoring the critical processes by Monit.

How I did it
I removed the script process_checker and corresponding Monit configuration entries of critical processes.

How to verify it
I verified this on the device str-7260cx3-acs-1.
2021-06-04 10:16:53 -07:00
abdosi
dddf96933c
[monit] Adding patch to enhance syslog error message generation for monit alert action when status is failed. (#5720)
Why/How I did:

Make sure first error syslog is triggered based on FAULT TOLERANCE condition.

Added support of repeat clause with alert action. This is used as trigger
for generation of periodic syslog error messages if error is persistent

Updated the monit conf files with repeat every x cycles for the alert action
2020-10-31 17:29:49 -07:00
Nazarii Hnydyn
79bda7d0d6
[monit]: Fix process checker. (#5480)
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2020-09-29 17:23:09 -07:00
yozhao101
13cec4c486
[Monit] Unmonitor the processes in containers which are disabled. (#5153)
We want to let Monit to unmonitor the processes in containers which are disabled in `FEATURE` table such that
Monit will not generate false alerting messages into the syslog.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2020-09-25 00:28:28 -07:00
Joe LeVeque
5d5d5739c2
[dockers] Rename 'docker-snmp-sv2' to 'docker-snmp' (#4699)
The `-sv2` suffix was used to differentiate SNMP Dockers when we transitioned from "SONiCv1" to "SONiCv2", about four years ago. The old Docker materials were removed long ago; there is no need to keep this suffix. Removing it aligns the name with all the other Dockers.

Also edit Monit configuration to detect proper snmp-subagent command line in Buster, and make snmpd command line matching more robust.
2020-06-11 16:04:23 -07:00