sonic-buildimage

Author	SHA1	Message	Date
Joe LeVeque	c651a9ade4	[dockers][supervisor] Increase event buffer size for process exit listener; Set all event buffer sizes to 1024 (#7083 ) To prevent error [messages](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802) like the following from being logged: ``` Mar 17 02:33:48.523153 vlab-01 INFO swss#supervisord 2021-03-17 02:33:48,518 ERRO pool supervisor-proc-exit-listener event buffer overflowed, discarding event 46 ``` This is basically an addendum to https://github.com/Azure/sonic-buildimage/pull/5247, which increased the event buffer size for dependent-startup. While supervisor-proc-exit-listener doesn't subscribe to as many events as dependent-startup, there is still a chance some containers (like swss, as in the example above) have enough processes running to cause an overflow of the default buffer size of 10. This is especially important for preventing erroneous log_analyzer failures in the sonic-mgmt repo regression tests, which have started occasionally causing PR check builds to fail. Example [here](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802). I set all supervisor-proc-exit-listener event buffer sizes to 1024, and also updated all dependent-startup event buffer sizes to 1024, as well, to keep things simple, unified, and allow headroom so that we will not need to adjust these values frequently, if at all.	2021-03-27 21:14:24 -07:00
yozhao101	be3c036794	[supervisord] Monitoring the critical processes with supervisord. (#6242 ) - Why I did it Initially, we used Monit to monitor critical processes in each container. If one of critical processes was not running or crashed due to some reasons, then Monit will write an alerting message into syslog periodically. If we add a new process in a container, the corresponding Monti configuration file will also need to update. It is a little hard for maintenance. Currently we employed event listener of Supervisod to do this monitoring. Since processes in each container are managed by Supervisord, we can only focus on the logic of monitoring. - How I did it We borrowed the event listener of Supervisord to monitor critical processes in containers. The event listener will take following steps if it was notified one of critical processes exited unexpectedly: The event listener will first check whether the auto-restart mechanism was enabled for this container or not. If auto-restart mechanism was enabled, event listener will kill the Supervisord process, which should cause the container to exit and subsequently get restarted. If auto-restart mechanism was not enabled for this contianer, the event listener will enter a loop which will first sleep 1 minute and then check whether the process is running. If yes, the event listener exits. If no, an alerting message will be written into syslog. - How to verify it First, we need checked whether the auto-restart mechanism of a container was enabled or not by running the command show feature status. If enabled, one critical process should be selected and killed manually, then we need check whether the container will be restarted or not. Second, we can disable the auto-restart mechanism if it was enabled at step 1 by running the commnad sudo config feature autorestart <container_name> disabled. Then one critical process should be selected and killed. After that, we will see the alerting message which will appear in the syslog every 1 minute. - Which release branch to backport (provide reason below if selected) 201811 201911 [x ] 202006	2021-01-21 12:57:49 -08:00
lguohan	4d3eb18ca7	[supervisord]: use abspath as supervisord entrypoint (#5995 ) use abspath makes the entrypoint not affected by PATH env. Signed-off-by: Guohan Lu <lguohan@gmail.com>	2020-11-22 21:18:44 -08:00
Joe LeVeque	7bf05f7f4f	[supervisor] Install vanilla package once again, install Python 3 version in Buster container (#5546 ) - Why I did it We were building a custom version of Supervisor because I had added patches to prevent hangs and crashes if the system clock ever rolled backward. Those changes were merged into the upstream Supervisor repo as of version 3.4.0 (http://supervisord.org/changes.html#id9), therefore, we should be able to simply install the vanilla package via pip. This will also allow us to easily move to Python 3, as Python 3 support was added in version 4.0.0. - How I did it - Remove Makefiles and patches for building supervisor package from source - Install Python 3 supervisor package version 4.2.1 in Buster base container - Also install Python 3 version of supervisord-dependent-startup in Buster base container - Debian package installed binary in `/usr/bin/`, but pip package installs in `/usr/local/bin/`, so rather than update all absolute paths, I changed all references to simply call `supervisord` and let the system PATH find the executable to prevent future need for changes just in case we ever need to switch back to build a Debian package, then we won't need to modify these again. - Install Python 2 supervisor package >= 3.4.0 in Stretch and Jessie base containers	2020-11-19 23:41:32 -08:00
Tamer Ahmed	a10c5bfd02	[frr] Reduce Calls to SONiC Cfggen (#5176 ) Calls to sonic-cfggen is CPU expensive. This PR reduces calls to sonic-cfggen to two calls during startup when starting frr service. singed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-08-17 15:47:42 -07:00
yozhao101	4fa81b4f8d	[dockers] Update critical_processes file syntax (#4831 ) - Why I did it Initially, the critical_processes file contains either the name of critical process or the name of group. For example, the critical_processes file in the dhcp_relay container contains a single group name `isc-dhcp-relay`. When testing the autorestart feature of each container, we need get all the critical processes and test whether a container can be restarted correctly if one of its critical processes is killed. However, it will be difficult to differentiate whether the names in the critical_processes file are the critical processes or group names. At the same time, changing the syntax in this file will separate the individual process from the groups and also makes it clear to the user. Right now the critical_processes file contains two different kind of entries. One is "program:xxx" which indicates a critical process. Another is "group:xxx" which indicates a group of critical processes managed by supervisord using the name "xxx". At the same time, I also updated the logic to parse the file critical_processes in supervisor-proc-event-listener script. - How to verify it We can first enable the autorestart feature of a specified container for example `dhcp_relay` by running the comman `sudo config container feature autorestart dhcp_relay enabled` on DUT. Then we can select a critical process from the command `docker top dhcp_relay` and use the command `sudo kill -SIGKILL <pid>` to kill that critical process. Final step is to check whether the container is restarted correctly or not.	2020-06-25 21:18:21 -07:00
yozhao101	23ff55a709	[Services] Restart BGP service upon unexpected critical process exit. (#4207 )	2020-03-03 16:50:32 -08:00
Prince Sunny	231d309b69	Generate interface table to have an entry designated to default VRF. (#2848 ) * Generate default VRF table for router interfaces * Updated jinja2 template to have prefix filter	2019-06-10 14:02:55 -07:00
Taoyu Li	2e3975d6ed	[config] Fix an issue that bgp asn data type is not consistent (#953 ) * Fix an issue that bgp asn data type is not consistent from minigraph parser and DB * Fix test typo	2017-09-13 21:23:06 -07:00
Taoyu Li	c9cc7aea41	[configdb] Migrate minigraph configurations to DB (#942 ) Modify minigraph parser output format so it fit DB schema Modify configuration templates to fit new schema Systemd services dependencies are modified so database starts before any configuration consumer	2017-09-12 14:13:27 -07:00
Taoyu Li	e4502527d0	Revert "Migrate DEVICE_METADATA to db (#919 )" (#928 ) This reverts commit `44502b217b`.	2017-08-29 17:03:31 -07:00
Taoyu Li	44502b217b	Migrate DEVICE_METADATA to db (#919 )	2017-08-29 10:47:25 -07:00
Taoyu Li	a2fe0212be	[ConfigDB] Move all BGP configuration into DB (#861 ) - BGP data read from minigraph.py now match DB schema - BGP templates are updated - bgpcfgd can now deal with runtime neighbor create/delete	2017-08-08 16:23:58 -07:00
Joe LeVeque	f49cac086f	Remove extra trailing newlines at EOF (#804 ) Files now end with a single newline	2017-07-12 20:54:37 -07:00
Joe LeVeque	d5c13c0a83	[dockers]: Disable autorestart on all supervisor processes inside containers (#580 )	2017-05-09 17:37:08 -07:00
Joe LeVeque	8f348399f5	[Dockers]: Manage all Docker containers with Supervisord (#573 ) - Consolidate config.sh and start.sh scripts into one script (start.sh) - Solve issue #435 - All dockers now run supervisord as their ENTRYPOINT - All stdout/stderr output from processes managed by supervisord is now sent to syslog instead of their own files - Supervisord log messages are now also sent to syslog - Removed unused smartmontools package from docker-platform-monitor	2017-05-08 15:43:31 -07:00
Rodny Molina	d30fbf1d72	[build]: Adding support for Free-Range-Routing stack. (#510 ) - Extending SONiC building infrastructure to provide users with greater flexibility, by allowing them to elect a routing-stack different than the default one (quagga). The desired routing-stack will be defined in rules/config file. - As part of these changes I'm adding support for Free-Range-Routing (FRR) stack. Quagga will continue to be the default routing-stack. Signed-off-by: Rodny Molina <rodny@linkedin.com>	2017-04-20 09:12:27 -07:00
Taoyu Li	fed908fc6b	[config-engine] minigraph.py refactoring (#448 ) * Refactor minigraph.py See description in https://github.com/Azure/sonic-buildimage/pull/448 for detail	2017-03-30 15:25:31 -07:00
pavel-shirshov	814fd87e63	Remove /var/run/rsyslogd.pid bofore starting rsyslog (#453 )	2017-03-29 18:07:25 -07:00
pavel-shirshov	bac738f91f	Add bgp container with gobgp (#358 ) * Add go-1.7 into docker-slave * Create container docker-fpm-gobgp with gobgpd inside	2017-03-02 11:33:46 -08:00

20 Commits