Commit Graph

10 Commits

Author SHA1 Message Date
yozhao101
be3c036794
[supervisord] Monitoring the critical processes with supervisord. (#6242)
- Why I did it
Initially, we used Monit to monitor critical processes in each container. If one of critical processes was not running
or crashed due to some reasons, then Monit will write an alerting message into syslog periodically. If we add a new process
in a container, the corresponding Monti configuration file will also need to update. It is a little hard for maintenance.

Currently we employed event listener of Supervisod to do this monitoring. Since processes in each container are managed by
Supervisord, we can only focus on the logic of monitoring.

- How I did it
We borrowed the event listener of Supervisord to monitor critical processes in containers. The event listener will take
following steps if it was notified one of critical processes exited unexpectedly:

The event listener will first check whether the auto-restart mechanism was enabled for this container or not. If auto-restart mechanism was enabled, event listener will kill the Supervisord process, which should cause the container to exit and subsequently get restarted.

If auto-restart mechanism was not enabled for this contianer, the event listener will enter a loop which will first sleep 1 minute and then check whether the process is running. If yes, the event listener exits. If no, an alerting message will be written into syslog.

- How to verify it
First, we need checked whether the auto-restart mechanism of a container was enabled or not by running the command show feature status. If enabled, one critical process should be selected and killed manually, then we need check whether the container will be restarted or not.

Second, we can disable the auto-restart mechanism if it was enabled at step 1 by running the commnad sudo config feature autorestart <container_name> disabled. Then one critical process should be selected and killed. After that, we will see the alerting message which will appear in the syslog every 1 minute.

- Which release branch to backport (provide reason below if selected)

 201811
 201911
[x ] 202006
2021-01-21 12:57:49 -08:00
yozhao101
23ff55a709
[Services] Restart BGP service upon unexpected critical process exit. (#4207) 2020-03-03 16:50:32 -08:00
Rodny Molina
196d9f5f8a [quagga]: Adjusting bgp jinja template and quagga's supervisord (#2291)
There are two minor changes in this PR:

* Adjust quagga's jinja template to enable bgp-gr functionality by default. Currently is only applicable to those devices tagged as TOR/T0.

* Ensure that no bgp-notification is sent out to remote-peers during bgpd shutdown events. The goal here is to make sure that remote-peers kick off bgp-gr-helper logic (i.e. retain restarting-router state), which can be only achieved if an ungraceful-shutdown (tcp pipe/socket down) is perceived. There are other approaches to accomplish this goal, such as draft-ietf-idr-bgp-gr-notification, but this one hasn't been implemented yet by Quagga/FRR.

Signed-off-by: Rodny Molina <rmolina@linkedin.com>
2018-11-27 00:39:38 -08:00
pavel-shirshov
f43580492f
quagga container processes could be restarted within a second (#1541) 2018-03-28 12:34:46 -07:00
pavel-shirshov
9139c7fe64 Always start with Forwarding State flag set for bgpd (#963) 2017-09-19 12:27:18 -07:00
Joe LeVeque
ed66588473 [docker-fpm-quagga]: Manage Quagga processes (zebra, bgpd) using supervisor instead of watchquagga (#900) 2017-08-21 13:55:59 -07:00
Taoyu Li
b6efe438b5 Introduce ConfigDB (#808)
* [cfggen] Support reading from and writing to configdb
* [bgp] Move bgp_admin_state to configdb, support dynamic admin state change
* [sonic-utilities] Adapt configDB for admin status, support config save and config load
2017-08-01 19:02:00 -07:00
Joe LeVeque
f49cac086f Remove extra trailing newlines at EOF (#804)
Files now end with a single newline
2017-07-12 20:54:37 -07:00
Joe LeVeque
d5c13c0a83 [dockers]: Disable autorestart on all supervisor processes inside containers (#580) 2017-05-09 17:37:08 -07:00
Joe LeVeque
8f348399f5 [Dockers]: Manage all Docker containers with Supervisord (#573)
- Consolidate config.sh and start.sh scripts into one script (start.sh)
 - Solve issue #435 - All dockers now run supervisord as their ENTRYPOINT
 - All stdout/stderr output from processes managed by supervisord is now sent to syslog instead of their own files
 - Supervisord log messages are now also sent to syslog
 - Removed unused smartmontools package from docker-platform-monitor
2017-05-08 15:43:31 -07:00