sonic-buildimage

Author	SHA1	Message	Date
kellyyeh	8cd346d80b	Update docker-router-advertiser.supervisord.conf.j2 (#10375 )	2022-04-06 09:44:21 -07:00
Saikrishna Arcot	588ed0b760	Upgrade router-advertiser container to Bullseye (#10374 ) Change the base image from `docker-config-engine-buster` to `docker-config-engine-bullseye`, and remove the hardcoded `radvd` version from the Dockerfile. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2022-04-01 16:12:43 -07:00
kellyyeh	f136c53d19	[radv] Support multiple ipv6 prefixes per vlan interface (#9934 ) Why I did it The radvd.conf.j2 template creates two copies of the vlan interface block when more than one ipv6 address is assigned to a single vlan interface. Changed the format to add prefixes under the same vlan interface block. How I did it Modified radvd.conf.j2 and added unit tests. How to verify it Configure multiple ipv6 addresses on the same vlan and start radvd. The unit test checks that radvd.conf is formed correctly with multiple ipv6 addresses.	2022-02-16 14:17:26 -08:00
kellyyeh	d11207d4f4	[radv] Run radv on MgmtToRRouter (#9424 ) * Allow radv to run on mgmt tor and EPMS	2021-12-03 09:45:06 -08:00
kellyyeh	df6361f50c	Change radv interval to 3min (#8882 )	2021-10-01 15:00:16 -07:00
LuiSzee	cf83a99f45	[radv] fix bug for radv can't startup if DEVICE_METADATA.localhost.type is NULL (#7651 ) Co-authored-by: Shi Lei <shil@centecnetworks.com>	2021-05-25 08:17:44 -07:00
Joe LeVeque	c651a9ade4	[dockers][supervisor] Increase event buffer size for process exit listener; Set all event buffer sizes to 1024 (#7083 ) To prevent error [messages](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802) like the following from being logged: ``` Mar 17 02:33:48.523153 vlab-01 INFO swss#supervisord 2021-03-17 02:33:48,518 ERRO pool supervisor-proc-exit-listener event buffer overflowed, discarding event 46 ``` This is basically an addendum to https://github.com/Azure/sonic-buildimage/pull/5247, which increased the event buffer size for dependent-startup. While supervisor-proc-exit-listener doesn't subscribe to as many events as dependent-startup, there is still a chance some containers (like swss, as in the example above) have enough processes running to cause an overflow of the default buffer size of 10. This is especially important for preventing erroneous log_analyzer failures in the sonic-mgmt repo regression tests, which have started occasionally causing PR check builds to fail. Example [here](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802). I set all supervisor-proc-exit-listener event buffer sizes to 1024, and also updated all dependent-startup event buffer sizes to 1024, as well, to keep things simple, unified, and allow headroom so that we will not need to adjust these values frequently, if at all.	2021-03-27 21:14:24 -07:00
Qi Luo	ce3b2cbfc5	[radv] Disable radv for specific deployment_id (#6830 )	2021-02-20 11:01:12 -08:00
yozhao101	be3c036794	[supervisord] Monitoring the critical processes with supervisord. (#6242 ) - Why I did it Initially, we used Monit to monitor critical processes in each container. If one of the critical processes was not running or crashed for some reason, Monit would periodically write an alerting message into syslog. If we add a new process to a container, the corresponding Monit configuration file also needs to be updated, which makes maintenance harder. We now use the event listener of Supervisord to do this monitoring. Since the processes in each container are managed by Supervisord, we only need to focus on the monitoring logic. - How I did it We use the event listener of Supervisord to monitor critical processes in containers. The event listener takes the following steps if it is notified that one of the critical processes exited unexpectedly: The event listener first checks whether the auto-restart mechanism is enabled for this container. If auto-restart is enabled, the event listener kills the Supervisord process, which should cause the container to exit and subsequently get restarted. If auto-restart is not enabled for this container, the event listener enters a loop which first sleeps 1 minute and then checks whether the process is running. If yes, the event listener exits. If no, an alerting message is written into syslog. - How to verify it First, check whether the auto-restart mechanism of a container is enabled by running the command `show feature status`. If it is enabled, select one critical process and kill it manually, then check whether the container is restarted. Second, disable the auto-restart mechanism if it was enabled in step 1 by running the command `sudo config feature autorestart <container_name> disabled`. Then select one critical process and kill it. After that, an alerting message will appear in the syslog every 1 minute. - Which release branch to backport (provide reason below if selected) 201811 201911 [x] 202006	2021-01-21 12:57:49 -08:00
Renuka Manavalan	ba02209141	First cut image update for kubernetes support. (#5421 ) * First cut image update for kubernetes support. With this, 1) dockers dhcp_relay, lldp, pmon, radv, snmp, telemetry are enabled for kube management; init_cfg.json configures set_owner as kube for these 2) Each docker's start.sh is updated to call container_startup.py to register going up. As part of this call, it registers the current owner as local/kube and its version. The images are built with their version ingrained into the image during build 3) Update all dockers' bash scripts to call 'container start/stop/wait' instead of 'docker start/stop/wait'. For all locally managed containers, it calls docker commands, hence no change for locally managed. 4) Introduced a new ctrmgrd service that helps with the transition between owners as kube & local and carries over any label updates from STATE-DB to the API server 5) hostcfgd updated to handle owner change 6) Reboot scripts are updated to tag kube running images as local, so upon reboot they run the same image. 7) Added kube_commands.py to handle all updates with the Kubernetes API server -- dedicated for k8s interaction only.	2020-12-22 08:01:33 -08:00
lguohan	4d3eb18ca7	[supervisord]: use abspath as supervisord entrypoint (#5995 ) use abspath makes the entrypoint not affected by PATH env. Signed-off-by: Guohan Lu <lguohan@gmail.com>	2020-11-22 21:18:44 -08:00
Joe LeVeque	7bf05f7f4f	[supervisor] Install vanilla package once again, install Python 3 version in Buster container (#5546 ) - Why I did it We were building a custom version of Supervisor because I had added patches to prevent hangs and crashes if the system clock ever rolled backward. Those changes were merged into the upstream Supervisor repo as of version 3.4.0 (http://supervisord.org/changes.html#id9), therefore, we should be able to simply install the vanilla package via pip. This will also allow us to easily move to Python 3, as Python 3 support was added in version 4.0.0. - How I did it - Remove Makefiles and patches for building supervisor package from source - Install Python 3 supervisor package version 4.2.1 in Buster base container - Also install Python 3 version of supervisord-dependent-startup in Buster base container - Debian package installed binary in `/usr/bin/`, but pip package installs in `/usr/local/bin/`, so rather than update all absolute paths, I changed all references to simply call `supervisord` and let the system PATH find the executable to prevent future need for changes just in case we ever need to switch back to build a Debian package, then we won't need to modify these again. - Install Python 2 supervisor package >= 3.4.0 in Stretch and Jessie base containers	2020-11-19 23:41:32 -08:00
Lawrence Lee	d0f16c0d79	Make backend device checking more robust (#5730 ) Treat devices that are ToRRouters (ToRRouters and BackEndToRRouters) the same when rendering templates Except for BackEndToRRouters belonging to a storage cluster, since these devices have extra sub-interfaces created Treat devices that are LeafRouters (LeafRouters and BackEndLeafRouters) the same when rendering templates Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2020-11-10 15:06:35 -08:00
Joe LeVeque	5b3b4804ad	[dockers][supervisor] Increase event buffer size for dependent-startup (#5247 ) When stopping the swss, pmon or bgp containers, log messages like the following can be seen: ``` Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,061 ERRO pool dependent-startup event buffer overflowed, discarding event 34 Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,063 ERRO pool dependent-startup event buffer overflowed, discarding event 35 Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,064 ERRO pool dependent-startup event buffer overflowed, discarding event 36 Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,066 ERRO pool dependent-startup event buffer overflowed, discarding event 37 ``` This is due to the number of programs in the container managed by supervisor, all generating events at the same time. The default event queue buffer size in supervisor is 10. This patch increases that value in all containers in order to eliminate these errors. As more programs are added to the containers, we may need to further adjust these values. I increased all buffer sizes to 25 except for containers with more programs or templated supervisor.conf files which allow for a variable number of programs. In these cases I increased the buffer size to 50. One final exception is the swss container, where the buffer fills up to ~50, so I increased this buffer to 100. Resolves https://github.com/Azure/sonic-buildimage/issues/5241	2020-09-08 23:36:38 -07:00
Joe LeVeque	fb8f09a116	[radvd] No longer build from source; Install vanilla Debian package once again (#5242 ) Remove radvd Makefile and patch, change docker-router-advertiser Dockerfile template to simply install the vanilla radvd package using apt-get. - In PR https://github.com/Azure/sonic-buildimage/pull/2795, we started building radvd from source and patching it to prevent it from erroring out when advertising an MTU of 9100 which was greater than the MTU size configured on the bridge interface (1500), which was due to a limitation in the 4.9 Linux kernel. - Master branch is now using Linux kernel 4.19. As of 4.18, the kernel supports setting a bridge MTU to a value > 1500. - PR https://github.com/Azure/sonic-swss/pull/1393 modified vlanmgrd to take advantage of this and now configures the MTU of bridge interfaces in SONiC to the proper size of 9100. Therefore, we no longer need to patch radvd. Since we no longer need to patch radvd, we no longer need to build it from source, so we can save build time by going back to simply installing the vanilla radvd Debian package in the router-advertiser container.	2020-09-01 13:53:36 -07:00
Joe LeVeque	97d44214cf	[docker-radv] Fix startup issues (#5230 ) - Why I did it PR https://github.com/Azure/sonic-buildimage/pull/4599 introduced two bugs in the startup of the router advertiser container: 1. References to the `wait_for_intf.sh` script were changed to `wait_for_link.sh`, but the actual script was not renamed 2. The `ipv6_found` Jinja2 variable added to the supervisor config file goes out of scope before it is read. - How I did it 1. Rename the `wait_for_intf.sh` script to `wait_for_link.sh` 2. Use the Jinja2 "namespace" construct to fix the scope issue - How to verify it Ensure all processes in the radv container start properly under the correct conditions (i.e., whether or not there is at least one VLAN with an IPv6 address assigned).	2020-08-21 13:12:01 -07:00
Tamer Ahmed	adcca53b8d	[radv] Reduce Calls to SONiC Cfggen (#5178 ) Calls to sonic-cfggen are CPU expensive. This PR reduces calls to sonic-cfggen to one call during startup when starting the radv service. Signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-08-17 15:48:04 -07:00
yozhao101	4fa81b4f8d	[dockers] Update critical_processes file syntax (#4831 ) - Why I did it Initially, the critical_processes file contained either the name of a critical process or the name of a group. For example, the critical_processes file in the dhcp_relay container contains a single group name `isc-dhcp-relay`. When testing the autorestart feature of each container, we need to get all the critical processes and test whether a container is restarted correctly if one of its critical processes is killed. However, it is difficult to tell whether the names in the critical_processes file are critical processes or group names. Changing the syntax of this file separates individual processes from groups and also makes this clear to the user. Right now the critical_processes file contains two different kinds of entries. One is "program:xxx", which indicates a critical process. The other is "group:xxx", which indicates a group of critical processes managed by supervisord using the name "xxx". At the same time, I also updated the logic that parses the critical_processes file in the supervisor-proc-event-listener script. - How to verify it We can first enable the autorestart feature of a specified container, for example `dhcp_relay`, by running the command `sudo config container feature autorestart dhcp_relay enabled` on the DUT. Then we can select a critical process from the output of `docker top dhcp_relay` and use the command `sudo kill -SIGKILL <pid>` to kill that critical process. The final step is to check whether the container is restarted correctly.	2020-06-25 21:18:21 -07:00
joyas-joseph	1714e621be	[docker-radv]: Convert radv docker to buster (#4727 ) * Set radvd version to match buster version(2.17-2) Signed-off-by: Joyas Joseph <joyas_joseph@dell.com>	2020-06-12 16:10:23 -07:00
Guohan Lu	7ea6d9dc8f	[docker-radvd]: use service dependency in supervisord to start services	2020-05-22 11:01:28 -07:00
yozhao101	91e5fb5602	[Service] Enable/disable container auto-restart based on configuration. (#4073 )	2020-02-07 12:34:07 -08:00
Dong Zhang	5057ac3122	[MultiDB] (./dockers dir) : replace redis-cli with sonic-db-cli and use new DBConnector (#3923 ) * [MultiDB] (./dockers dirs): replace redis-cli with sonic-db-cli and use new DBConnector * remove unnecessary quota * update typo	2020-01-22 11:27:21 -08:00
yozhao101	cff30c59d0	[Services] Restart Router-advertiser service upon unexpected critical process exit (#3681 ) Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2019-10-30 16:41:55 -07:00
Stepan Blyshchak	81cf33231f	[build]: Improve dockerfile instructions (#3048 ) - create a dockerfile-macros.j2 file with all common operations written as j2 macros - use a single dockerfile instruction for COPY and RUN commands when possible to improve build time - reorganize dockerfile instructions to make them more cache friendly (in case someday we remove --no-cache when building docker images) Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>	2019-06-22 11:26:23 -07:00
Prince Sunny	231d309b69	Generate interface table to have an entry designated to default VRF. (#2848 ) * Generate default VRF table for router interfaces * Updated jinja2 template to have prefix filter	2019-06-10 14:02:55 -07:00
Joe LeVeque	c0904f766b	[radvd] Build radvd from source; Patch so as not to treat out-of-range MTU as an error (#2795 )	2019-04-17 16:41:20 -07:00
Joe LeVeque	b48037090e	[router-advertiser] Add templated script to wait for pertinent interfaces to be ready before starting radvd (#2558 )	2019-03-02 15:45:43 -08:00
lguohan	f682e7b131	[docker-radvd]: upgrade docker radvd to stretch based (#2524 ) * [docker-radvd]: upgrade docker radvd to stretch based * install jinja>=2.10 Signed-off-by: Guohan Lu <gulv@microsoft.com> * install pip packages for testing sonic-utilities Signed-off-by: Guohan Lu <gulv@microsoft.com> * set storage driver to vfs Signed-off-by: Guohan Lu <gulv@microsoft.com>	2019-02-06 21:28:07 -08:00
lguohan	f3ca7c422f	[rsyslog]: use # to separate container name and program name in syslog message (#1918 ) Previously we used / to separate container name and program name. However, in rsyslogd: "Precisely, the programname is terminated by either (whichever occurs first): end of tag, nonprintable character, ‘:’, ‘[’, ‘/’. The above definition has been taken from the FreeBSD syslogd sources." Signed-off-by: Guohan Lu <gulv@microsoft.com>	2018-08-12 22:23:58 -07:00
Qi Luo	7ba08e5bf6	Prefix docker container name to syslog syslogtag (program name) (#1810 )	2018-06-25 10:48:42 -07:00
Joe LeVeque	f7151e8ddb	[radvd] Ensure at least one interface is specified in radvd.conf before starting radvd (#1636 )	2018-04-24 15:13:51 -07:00
Joe LeVeque	30466b27c1	[router advertiser] Only start radvd process if device role is 'ToRRouter' (#1569 )	2018-04-06 19:24:18 -07:00
Joe LeVeque	cea87e985c	Add docker-router-advertiser to support IPv6 router advertisements (#1103 )	2017-11-14 14:40:15 -08:00

33 Commits
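
The `critical_processes` syntax described in commit 4fa81b4f8d (#4831) distinguishes "program:xxx" entries from "group:xxx" entries. The following is a minimal Python sketch of how a file in that format could be split into the two kinds of entries. It assumes one entry per line. The function name `parse_critical_processes` and the sample entry `program:radvd` are hypothetical and chosen for illustration only; this is not the actual parsing logic of the supervisor-proc-event-listener script.

```python
def parse_critical_processes(text):
    """Split critical_processes content into (programs, groups).

    Assumes one "program:<name>" or "group:<name>" entry per line
    (an assumption for this sketch, not the real listener's contract).
    """
    programs, groups = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        kind, sep, name = line.partition(":")
        name = name.strip()
        if not sep or not name:
            raise ValueError(f"malformed entry: {line!r}")
        if kind == "program":
            programs.append(name)
        elif kind == "group":
            groups.append(name)
        else:
            raise ValueError(f"unknown entry type: {kind!r}")
    return programs, groups


# Hypothetical file content: one program entry and one group entry.
sample = "program:radvd\ngroup:isc-dhcp-relay\n"
programs, groups = parse_critical_processes(sample)
print(programs, groups)
```

Rejecting unknown prefixes, rather than silently treating them as process names, reflects the motivation stated in the commit: the old syntax made it hard to tell process names from group names.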