sonic-buildimage

Author	SHA1	Message	Date
Joe LeVeque	dd9be59cd1	[202012][dockers][supervisor] Increase event buffer size for process exit listener; Set all event buffer sizes to 1024 (#7203 ) #### Why I did it Backport of https://github.com/Azure/sonic-buildimage/pull/7083 to the 202012 branch. To prevent error [messages](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802) like the following from being logged: ``` Mar 17 02:33:48.523153 vlab-01 INFO swss#supervisord 2021-03-17 02:33:48,518 ERRO pool supervisor-proc-exit-listener event buffer overflowed, discarding event 46 ``` This is basically an addendum to https://github.com/Azure/sonic-buildimage/pull/5247, which increased the event buffer size for dependent-startup. While supervisor-proc-exit-listener doesn't subscribe to as many events as dependent-startup, there is still a chance that some containers (like swss, as in the example above) have enough processes running to overflow the default buffer size of 10. This is especially important for preventing erroneous log_analyzer failures in the sonic-mgmt repo regression tests, which have occasionally started causing PR check builds to fail. Example [here](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802). I set all supervisor-proc-exit-listener event buffer sizes to 1024, and also updated all dependent-startup event buffer sizes to 1024, to keep things simple and unified and to allow enough headroom that we will not need to adjust these values frequently, if at all.	2021-04-01 12:52:19 -07:00
yozhao101	cc9c3f567e	[supervisord] Monitoring the critical processes with supervisord. (#6242 ) - Why I did it Initially, we used Monit to monitor critical processes in each container. If one of the critical processes was not running or had crashed for some reason, Monit would periodically write an alerting message into syslog. If we added a new process to a container, the corresponding Monit configuration file also needed to be updated, which made maintenance somewhat difficult. We now employ a Supervisord event listener to do this monitoring. Since the processes in each container are managed by Supervisord, we only need to focus on the monitoring logic. - How I did it We use Supervisord's event listener mechanism to monitor critical processes in containers. If the event listener is notified that one of the critical processes exited unexpectedly, it takes the following steps: The event listener first checks whether the auto-restart mechanism is enabled for this container. If auto-restart is enabled, the event listener kills the Supervisord process, which should cause the container to exit and subsequently be restarted. If auto-restart is not enabled for this container, the event listener enters a loop that first sleeps for 1 minute and then checks whether the process is running. If it is, the event listener exits. If not, an alerting message is written into syslog. - How to verify it First, check whether the auto-restart mechanism of a container is enabled by running the command `show feature status`. If it is enabled, select a critical process and kill it manually, then check whether the container is restarted. Second, disable the auto-restart mechanism (if it was enabled in step 1) by running the command `sudo config feature autorestart <container_name> disabled`. Then select a critical process and kill it. After that, an alerting message should appear in the syslog every 1 minute. - Which release branch to backport (provide reason below if selected) 201811 201911 [x] 202006	2021-01-28 09:28:27 -08:00
Joe LeVeque	905a5127bb	[Python] Align files in root dir, dockers/ and files/ with PEP8 standards (#6109 ) - Why I did it Align style with slightly modified PEP8 standards (extend maximum line length to 120 chars). This will also help in the transition to Python 3, which is stricter about whitespace, and it helps unify style across the SONiC codebase. Will tackle other directories in separate PRs. - How I did it Using `autopep8 --in-place --max-line-length 120` and some manual tweaks.	2020-12-03 15:57:50 -08:00
lguohan	4d3eb18ca7	[supervisord]: use abspath as supervisord entrypoint (#5995 ) Using an absolute path makes the entrypoint independent of the PATH environment variable. Signed-off-by: Guohan Lu <lguohan@gmail.com>	2020-11-22 21:18:44 -08:00
Joe LeVeque	7bf05f7f4f	[supervisor] Install vanilla package once again, install Python 3 version in Buster container (#5546 ) - Why I did it We were building a custom version of Supervisor because I had added patches to prevent hangs and crashes if the system clock ever rolled backward. Those changes were merged into the upstream Supervisor repo as of version 3.4.0 (http://supervisord.org/changes.html#id9); therefore, we should be able to simply install the vanilla package via pip. This will also allow us to easily move to Python 3, as Python 3 support was added in version 4.0.0. - How I did it - Remove Makefiles and patches for building supervisor package from source - Install Python 3 supervisor package version 4.2.1 in Buster base container - Also install Python 3 version of supervisord-dependent-startup in Buster base container - The Debian package installed the binary in `/usr/bin/`, but the pip package installs it in `/usr/local/bin/`. Rather than update all absolute paths, I changed all references to simply call `supervisord` and let the system PATH find the executable; that way, if we ever need to switch back to building a Debian package, we won't need to modify these references again. - Install Python 2 supervisor package >= 3.4.0 in Stretch and Jessie base containers	2020-11-19 23:41:32 -08:00
Tamer Ahmed	6754635010	[cfggen] Make Jinja2 Template Python 3 Compatible Jinja2 templates rendered using the Python 3 interpreter are required to conform to Python 3 semantics. Signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-09-30 07:07:43 -07:00
Tamer Ahmed	a10c5bfd02	[frr] Reduce Calls to SONiC Cfggen (#5176 ) Calls to sonic-cfggen are CPU-expensive. This PR reduces the number of sonic-cfggen calls made when starting the frr service to two. Signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-08-17 15:47:42 -07:00
yozhao101	4fa81b4f8d	[dockers] Update critical_processes file syntax (#4831 ) - Why I did it Initially, the critical_processes file contained either the name of a critical process or the name of a group. For example, the critical_processes file in the dhcp_relay container contains a single group name `isc-dhcp-relay`. When testing the autorestart feature of each container, we need to get all the critical processes and test whether a container is restarted correctly if one of its critical processes is killed. However, it is difficult to tell whether the names in the critical_processes file are critical processes or group names. Changing the syntax of this file separates individual processes from groups and also makes the distinction clear to the user. The critical_processes file now contains two different kinds of entries: "program:xxx", which indicates a critical process, and "group:xxx", which indicates a group of critical processes managed by supervisord under the name "xxx". I also updated the logic that parses the critical_processes file in the supervisor-proc-event-listener script. - How to verify it First, enable the autorestart feature of a specific container, for example `dhcp_relay`, by running the command `sudo config container feature autorestart dhcp_relay enabled` on the DUT. Then select a critical process from the output of `docker top dhcp_relay` and kill it with `sudo kill -SIGKILL <pid>`. The final step is to check whether the container is restarted correctly.	2020-06-25 21:18:21 -07:00
yozhao101	23ff55a709	[Services] Restart BGP service upon unexpected critical process exit. (#4207 )	2020-03-03 16:50:32 -08:00
pavel-shirshov	d5af096f41	[TSA]: Add community to the loopback prefix, when isolated (#3708 ) * Rename asn/deployment_id_asn_map.yaml to constants/constants.yaml * Fix bgp templates * Add community for loopback when bgpd is isolated * Use correct community value	2019-11-06 16:07:28 -08:00
pavel-shirshov	7af546908f	[bgp]: Fix isolate/unisolate command for ipv6 peers (#3183 ) * Fix isolate/unisolate command for ipv6 peers	2019-07-18 16:34:26 -07:00
Prince Sunny	231d309b69	Generate interface table to have an entry designated for the default VRF. (#2848 ) * Generate default VRF table for router interfaces * Updated jinja2 template to have prefix filter	2019-06-10 14:02:55 -07:00
pavel-shirshov	602369126c	[docker-fpm-quagga]: Add support for PeerAsn and UpdateAddress (#2766 )	2019-04-10 21:50:36 -07:00
Ying Xie	af64fd66d2	[bgp quagga] increase BGP graceful restart timeout to 240 seconds (#2754 ) Some platforms with less powerful CPUs or hard drives could take longer to get ready for BGP. For these platforms, 240 seconds would be a safer threshold. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2019-04-10 19:13:03 -07:00
Ying Xie	d9c076dada	[quagga bgp] set quagga graceful restart timeout to 180 seconds (#2362 ) Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2018-12-08 11:38:31 -08:00
Rodny Molina	196d9f5f8a	[quagga]: Adjusting bgp jinja template and quagga's supervisord (#2291 ) There are two minor changes in this PR: * Adjust quagga's jinja template to enable bgp-gr functionality by default. Currently this is only applicable to devices tagged as TOR/T0. * Ensure that no bgp-notification is sent out to remote-peers during bgpd shutdown events. The goal here is to make sure that remote-peers kick off bgp-gr-helper logic (i.e. retain restarting-router state), which can only be achieved if an ungraceful shutdown (tcp pipe/socket down) is perceived. There are other approaches to accomplish this goal, such as draft-ietf-idr-bgp-gr-notification, but this one hasn't been implemented yet by Quagga/FRR. Signed-off-by: Rodny Molina <rmolina@linkedin.com>	2018-11-27 00:39:38 -08:00
zhenggen-xu	51a76614a3	Restore neighbor table to kernel during system warm-reboot (#2213 ) * Restore neighbor table to kernel during system warm-reboot Added a service, "restore_neighbors", to restore the neighbor table into the kernel during system warm reboot. The service is started by supervisord in the swss docker when the docker is started. If system warm reboot is enabled, it tries to restore the neighbor table from appDB into the kernel through netlink API calls and updates the neighbor table by sending ARP/NS requests to all neighbor entries; it then sets the stateDB flag for neighsyncd to continue the reconciliation process. -- Added tcpdump and python-scapy Debian packages to the orchagent and vs dockers. -- Added Python modules pyroute2 and netifaces to the orchagent and vs dockers. -- Worked around a tcpdump issue in the vs docker. Signed-off-by: Zhenggen Xu <zxu@linkedin.com> * Move the restore_neighbors.py to sonic-swss submodule. Made changes to makefiles accordingly; made dockerfile.j2 and supervisord config changes; added Python monotonic lib for time access. Signed-off-by: Zhenggen Xu <zxu@linkedin.com> * Added PYTHON_SWSSCOMMON as swss runtime dependency Signed-off-by: Zhenggen Xu <zxu@linkedin.com>	2018-11-09 17:06:09 -08:00
Taoyu Li	6a37365d93	[zebra.conf] Avoid zebra crash upon empty configuration (#2203 )	2018-10-28 22:32:31 -07:00
lguohan	f3ca7c422f	[rsyslog]: use # to separate container name and program name in syslog message (#1918 ) Previously, / was used to separate the container name and program name. However, in rsyslogd, the programname is terminated by whichever of the following occurs first: the end of the tag, a nonprintable character, ‘:’, ‘[’, or ‘/’. The above definition has been taken from the FreeBSD syslogd sources. Signed-off-by: Guohan Lu <gulv@microsoft.com>	2018-08-12 22:23:58 -07:00
Qi Luo	7ba08e5bf6	Prefix docker container name to syslog syslogtag (program name) (#1810 )	2018-06-25 10:48:42 -07:00
pavel-shirshov	a2a6aead4c	[bgp]: Enable bgp soft-reconfiguration inbound for quagga templates (#1803 ) * Enable bgp soft-reconfiguration inbound for quagga templates	2018-06-22 18:04:18 -07:00
pavel-shirshov	bbca58329b	Manually send SIGHUP to vtysh when the current session was disconnected (#1801 ) * Manually send SIGHUP to vtysh when the current session was disconnected * Address comments	2018-06-20 12:15:09 -07:00
Qi Luo	1c8bacb007	Fix comment typos (#1794 ) Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>	2018-06-14 21:53:31 -07:00
pavel-shirshov	fae346f586	Don't create a pty to run vtysh inside of the docker container (#1792 )	2018-06-14 12:11:29 -07:00
Joe LeVeque	832be7b8f4	[dockers] Prevent apt-get from installing suggested and recommended packages by default (#1666 ) * [docker-base] Instruct apt-get to NOT install 'recommended' or 'suggested' packages * Modify docker-fpm-quagga, docker-snmp-sv2 and docker-sonic-vs Dockerfile templates in order to properly install .deb dependencies * REDIS_SERVER depends on REDIS_TOOLS; ensure REDIS_TOOLS is always installed before REDIS_SERVER	2018-05-02 11:46:21 -07:00
Taoyu Li	bebb7a0fa2	[zebra.conf] Fix template issue with multiple lo addresses (#1662 ) * [zebra.conf] Fix template issue with multiple lo addresses * Add unit test for Loopback1	2018-05-01 20:53:47 -07:00
pavel-shirshov	1ae4db3af7	Quagga: Use bgp keepalive and holdtime timers from configdb (#1661 )	2018-04-30 16:52:22 -07:00
pavel-shirshov	f43580492f	quagga container processes could be restarted within a second (#1541 )	2018-03-28 12:34:46 -07:00
Joe LeVeque	e1cb2ace36	[base image files] All 'docker exec' wrapper scripts now dynamically adjust their flags depending on whether or not they are run on a terminal (#1507 )	2018-03-17 00:43:29 -07:00
Taoyu Li	dd7e9240c8	[dockers] Remove dependency to minigraph (#1179 ) * Remove dependency to minigraph * Remove -m in swssconfig.sh	2017-11-23 16:31:37 -08:00
nikos-li	f18ed0d35c	[bgp]: Auto-completion, help (?), cmd navigation (up arrow) not working in vtysh on host system. (#1124 )	2017-11-13 09:39:10 -08:00
Taoyu Li	8bc6b55331	[bgpd.conf] Fix template issue with multiple lo addresses (#1060 )	2017-10-20 07:15:11 -07:00
Taoyu Li	7a0a2ea5d0	[bgpd.conf] Advertise /64 prefix for ipv6 lo addresses (#1050 )	2017-10-17 18:28:27 -07:00
pavel-shirshov	9139c7fe64	Always start with Forwarding State flag set for bgpd (#963 )	2017-09-19 12:27:18 -07:00
Shuotian Cheng	aa549f208c	[bgp]: Fix the deployment_id with DEVICE_METADATA (#962 )	2017-09-18 13:04:29 -07:00
Taoyu Li	2e3975d6ed	[config] Fix an issue where the bgp asn data type is not consistent (#953 ) * Fix an issue where the bgp asn data type is not consistent between the minigraph parser and DB * Fix test typo	2017-09-13 21:23:06 -07:00
Taoyu Li	c9cc7aea41	[configdb] Migrate minigraph configurations to DB (#942 ) * Modify minigraph parser output format so it fits the DB schema * Modify configuration templates to fit the new schema * Modify systemd service dependencies so the database starts before any configuration consumer	2017-09-12 14:13:27 -07:00
sihuihan88	127a73aac3	[quagga]: Disable ipv4 over ipv6 and enable ipv6 over ipv4 peer group (#922 ) * [bgpd]:disable ipv4 over ipv6 and enable ipv6 over ipv4 peer group * update as comments	2017-08-30 13:06:02 -07:00
Taoyu Li	e4502527d0	Revert "Migrate DEVICE_METADATA to db (#919 )" (#928 ) This reverts commit `44502b217b`.	2017-08-29 17:03:31 -07:00
Taoyu Li	44502b217b	Migrate DEVICE_METADATA to db (#919 )	2017-08-29 10:47:25 -07:00
Joe LeVeque	ed66588473	[docker-fpm-quagga]: Manage Quagga processes (zebra, bgpd) using supervisor instead of watchquagga (#900 )	2017-08-21 13:55:59 -07:00
zhenggen-xu	c52e876697	Fix the network command for ipv6 vlan interfaces (#894 )	2017-08-16 21:12:32 -07:00
Taoyu Li	a2fe0212be	[ConfigDB] Move all BGP configuration into DB (#861 ) - BGP data read from minigraph.py now match DB schema - BGP templates are updated - bgpcfgd can now deal with runtime neighbor create/delete	2017-08-08 16:23:58 -07:00
Taoyu Li	b6efe438b5	Introduce ConfigDB (#808 ) * [cfggen] Support reading from and writing to configdb * [bgp] Move bgp_admin_state to configdb, support dynamic admin state change * [sonic-utilities] Adapt configDB for admin status, support config save and config load	2017-08-01 19:02:00 -07:00
sihuihan88	1176508858	[bgpd]: support multiple peer range in single peer group (#807 )	2017-07-13 15:03:10 -07:00
Joe LeVeque	f49cac086f	Remove extra trailing newlines at EOF (#804 ) Files now end with a single newline	2017-07-12 20:54:37 -07:00
lguohan	4bdcac8e4f	[bgp]: move allowas-in into ipv6 section to enable allowas-in for ipv6 (#741 )	2017-06-22 19:50:24 -07:00
sihuihan88	3268946de5	[BGPD]: add bgp dynamic neighbor configuration (#708 ) * add bgp dynamic neighbor configuration * [bgpd]: update as comments * update as comment * update to deployment_id_asn_map * minor change	2017-06-21 18:52:50 -07:00
Taoyu Li	95906a6490	[installer] Copy old config files rather than only minigraph (#730 )	2017-06-21 11:02:25 -07:00
Taoyu Li	5e6620e19e	[bgp] Save bgp admin state (#690 ) * [bgp] Save admin state and set default state to shutdown * Set default behavior to no shutdown * Add build option SHUTDOWN_BGP_ON_START * Script change for default admin state to be on * Address CR comments to bgp_neighbor script * Fix script bug	2017-06-12 11:05:22 -07:00


53 Commits
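As a rough illustration of the `program:`/`group:` syntax introduced in #4831, the following Python sketch splits a critical_processes file into program names and group names. It is not the parsing code from supervisor-proc-event-listener: the function name `parse_critical_processes`, the skipping of blank lines, and the choice to reject any other line are assumptions made for this example, and `my_daemon` is a hypothetical program name.

```python
from typing import List, Tuple


def parse_critical_processes(text: str) -> Tuple[List[str], List[str]]:
    """Split critical_processes entries into (programs, groups).

    Sketch only: assumes one "program:<name>" or "group:<name>" entry per
    line, skips blank lines, and raises ValueError for anything else.
    """
    programs: List[str] = []
    groups: List[str] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        kind, sep, name = line.partition(":")
        name = name.strip()
        if not sep or not name:
            # A bare name (the old, ambiguous style) is rejected here.
            raise ValueError(f"line {lineno}: expected '<kind>:<name>', got {raw!r}")
        if kind == "program":
            programs.append(name)
        elif kind == "group":
            groups.append(name)
        else:
            raise ValueError(f"line {lineno}: unknown entry type {kind!r}")
    return programs, groups


if __name__ == "__main__":
    # "my_daemon" is a hypothetical program name used only for illustration.
    sample = "group:isc-dhcp-relay\nprogram:my_daemon\n"
    print(parse_critical_processes(sample))
```

Rejecting bare names (the pre-#4831 style, such as a lone `isc-dhcp-relay`) is what removes the process-versus-group ambiguity described in that commit.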
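The programname rule quoted in #1918 can be illustrated with a short Python sketch. This is not rsyslog's implementation: the function name `programname` is hypothetical, and Python's `str.isprintable()` is used only as an approximation of rsyslog's notion of a nonprintable character.

```python
def programname(tag: str) -> str:
    """Return the rsyslog-style programname prefix of a syslog tag (sketch).

    The programname ends at whichever comes first: the end of the tag, a
    nonprintable character, ':', '[' or '/'. str.isprintable() approximates
    "nonprintable" here.
    """
    for i, ch in enumerate(tag):
        if ch in ":[/" or not ch.isprintable():
            return tag[:i]
    return tag  # no terminator found: the whole tag is the programname


if __name__ == "__main__":
    for tag in ("swss/supervisord:", "swss#supervisord:", "swss#orchagent[42]:"):
        print(f"{tag!r} -> {programname(tag)!r}")
```

With `/` as the separator, a tag like `swss/supervisord` collapses to `swss` and the program name is lost; with `#`, the full `swss#supervisord` (the form visible in the log line quoted in #7203) survives.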