sonic-buildimage

Author	SHA1	Message	Date
yozhao101	1a3cab43ac	[Monit] Deprecate the feature of monitoring the critical processes by Monit (#7676 ) Signed-off-by: Yong Zhao yozhao@microsoft.com Why I did it Currently we leveraged the Supervisor to monitor the running status of critical processes in each container and it is more reliable and flexible than doing the monitoring by Monit. So we removed the functionality of monitoring the critical processes by Monit. How I did it I removed the script process_checker and corresponding Monit configuration entries of critical processes. How to verify it I verified this on the device str-7260cx3-acs-1.	2021-06-04 10:16:53 -07:00
bingwang-ms	3bb123930b	Fix lldpmgrd syntax issue (#7742 ) Signed-off-by: bingwang <bingwang@microsoft.com>	2021-05-31 16:41:28 +08:00
sudhanshukumar22	f783aefd6d	docker-lldp:intermittent DB errors will result in Client termination (#6119 ) This PR allows listen to hostname changes and mgmt ip changes.	2021-05-18 09:51:02 -07:00
vganesan-nokia	973affce39	[voq/inbandif] Support for inband port as regular port (#6477 ) Changes in this PR are to make LLDP to consider Inband port and to avoid regular port handling on Inband port.	2021-04-01 16:24:57 -07:00
Joe LeVeque	c651a9ade4	[dockers][supervisor] Increase event buffer size for process exit listener; Set all event buffer sizes to 1024 (#7083 ) To prevent error [messages](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802) like the following from being logged: ``` Mar 17 02:33:48.523153 vlab-01 INFO swss#supervisord 2021-03-17 02:33:48,518 ERRO pool supervisor-proc-exit-listener event buffer overflowed, discarding event 46 ``` This is basically an addendum to https://github.com/Azure/sonic-buildimage/pull/5247, which increased the event buffer size for dependent-startup. While supervisor-proc-exit-listener doesn't subscribe to as many events as dependent-startup, there is still a chance some containers (like swss, as in the example above) have enough processes running to cause an overflow of the default buffer size of 10. This is especially important for preventing erroneous log_analyzer failures in the sonic-mgmt repo regression tests, which have started occasionally causing PR check builds to fail. Example [here](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802). I set all supervisor-proc-exit-listener event buffer sizes to 1024, and also updated all dependent-startup event buffer sizes to 1024, as well, to keep things simple, unified, and allow headroom so that we will not need to adjust these values frequently, if at all.	2021-03-27 21:14:24 -07:00
yozhao101	be3c036794	[supervisord] Monitoring the critical processes with supervisord. (#6242 ) - Why I did it Initially, we used Monit to monitor critical processes in each container. If one of the critical processes was not running or crashed for some reason, Monit would periodically write an alerting message into syslog. If we add a new process to a container, the corresponding Monit configuration file also needs to be updated, which makes maintenance harder. We now employ the event listener of Supervisord to do this monitoring. Since processes in each container are managed by Supervisord, we only need to focus on the monitoring logic. - How I did it We borrowed the event listener of Supervisord to monitor critical processes in containers. The event listener takes the following steps if it is notified that one of the critical processes exited unexpectedly: The event listener first checks whether the auto-restart mechanism is enabled for this container. If the auto-restart mechanism is enabled, the event listener kills the Supervisord process, which should cause the container to exit and subsequently get restarted. If the auto-restart mechanism is not enabled for this container, the event listener enters a loop which first sleeps 1 minute and then checks whether the process is running. If yes, the event listener exits. If no, an alerting message is written into syslog. - How to verify it First, we need to check whether the auto-restart mechanism of a container is enabled by running the command show feature status. If enabled, one critical process should be selected and killed manually, then we need to check whether the container is restarted. Second, we can disable the auto-restart mechanism if it was enabled at step 1 by running the command sudo config feature autorestart <container_name> disabled. Then one critical process should be selected and killed. After that, we will see the alerting message appear in the syslog every 1 minute. - Which release branch to backport (provide reason below if selected) 201811 201911 [x ] 202006	2021-01-21 12:57:49 -08:00
sudhanshukumar22	8a3ac8ff9c	[docker-lldp]: sonic advertise meaningful SysDescription instead of debian (#6114 ) Sonic devices advertise meaningful system description along with Debian package information. before the fix: ------------- admin@sonic:~$ show lldp neighbors ------------------------------------------------------------------------------- LLDP neighbors: ------------------------------------------------------------------------------- Interface: Ethernet0, via: LLDP, RID: 3, Time: 0 day, 16:36:30 SysName: sonic SysDescr: Debian GNU/Linux 9 (stretch) Linux 4.9.0-11-2-amd64 #1 SMP Debian 4.9.189-3+deb9u2 (2019-11-11) x86_64 ------------------------------------------------------------------------------- After the fix: root@sonic:~# show lldp neighbors Ethernet16 ------------------------------------------------------------------------------- LLDP neighbors: ------------------------------------------------------------------------------- Interface: Ethernet16, via: LLDP, RID: 10, Time: 0 day, 00:01:00 SysName: sonic SysDescr: SONiC Software Version: SONiC.sonic_upstream_1.0_daily_201130_1501_62-dirty-20201130.203529 - HwSku: Accton-AS7816-64X - Distribution: Debian 10.6 - Kernel: 4.19.0-9-2-amd64 ------------------------------------------------------------------------------- Signed-off-by: sudhanshukumar22 <sudhanshu.kumar@broadcom.com>	2021-01-06 12:24:57 -08:00
Renuka Manavalan	ba02209141	First cut image update for kubernetes support. (#5421 ) * First cut image update for kubernetes support. With this, 1) dockers dhcp_relay, lldp, pmon, radv, snmp, telemetry are enabled for kube management init_cfg.json configure set_owner as kube for these 2) Each docker's start.sh updated to call container_startup.py to register going up As part of this call, it registers the current owner as local/kube and its version The images are built with its version ingrained into image during build 3) Update all docker's bash script to call 'container start/stop/wait' instead of 'docker start/stop/wait'. For all locally managed containers, it calls docker commands, hence no change for locally managed. 4) Introduced a new ctrmgrd service, that helps with transition between owners as kube & local and carry over any labels update from STATE-DB to API server 5) hostcfgd updated to handle owner change 6) Reboot scripts are updated to tag kube running images as local, so upon reboot they run the same image. 7) Added kube_commands.py to handle all updates with Kubernetes API server -- dedicated for k8s interaction only.	2020-12-22 08:01:33 -08:00
Joe LeVeque	905a5127bb	[Python] Align files in root dir, dockers/ and files/ with PEP8 standards (#6109 ) - Why I did it Align style with slightly modified PEP8 standards (extend maximum line length to 120 chars). This will also help in the transition to Python 3, where it is more strict about whitespace, plus it helps unify style among the SONiC codebase. Will tackle other directories in separate PRs. - How I did it Using `autopep8 --in-place --max-line-length 120` and some manual tweaks.	2020-12-03 15:57:50 -08:00
abdosi	872c85d8e7	[lldp]: Lldp docker to use python3 version of sonic-db-syncd package. (#6046 ) Made changes so that Lldp docker start using py3 of sonic-db-syncd submodule update sonic-db-syncd 5cc29a1b32d8d1f4dfbc967bfea2727c50a49c76 (HEAD -> master, origin/master, origin/HEAD) Changes to convert sonic-dbsyncd from python 2 to 3 Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2020-11-30 10:44:40 -08:00
lguohan	4d3eb18ca7	[supervisord]: use abspath as supervisord entrypoint (#5995 ) use abspath makes the entrypoint not affected by PATH env. Signed-off-by: Guohan Lu <lguohan@gmail.com>	2020-11-22 21:18:44 -08:00
Joe LeVeque	7bf05f7f4f	[supervisor] Install vanilla package once again, install Python 3 version in Buster container (#5546 ) - Why I did it We were building a custom version of Supervisor because I had added patches to prevent hangs and crashes if the system clock ever rolled backward. Those changes were merged into the upstream Supervisor repo as of version 3.4.0 (http://supervisord.org/changes.html#id9), therefore, we should be able to simply install the vanilla package via pip. This will also allow us to easily move to Python 3, as Python 3 support was added in version 4.0.0. - How I did it - Remove Makefiles and patches for building supervisor package from source - Install Python 3 supervisor package version 4.2.1 in Buster base container - Also install Python 3 version of supervisord-dependent-startup in Buster base container - Debian package installed binary in `/usr/bin/`, but pip package installs in `/usr/local/bin/`, so rather than update all absolute paths, I changed all references to simply call `supervisord` and let the system PATH find the executable to prevent future need for changes just in case we ever need to switch back to build a Debian package, then we won't need to modify these again. - Install Python 2 supervisor package >= 3.4.0 in Stretch and Jessie base containers	2020-11-19 23:41:32 -08:00
Longxiang Lyu	385dfc4921	[monit] Fix status error due to shebang change (#5865 ) lldpmgrd, bgpcfgd, and bgpmon are reported with the error status "not running" due to the recent change of shebang to use `Python3`. Modifying the argument of `process_checker` to follow this change. Signed-off-by: Longxiang Lyu <lolv@microsoft.com>	2020-11-09 01:52:22 -08:00
Petro Bratash	32a832a8ac	[lldp]: Add verification IPv4 address on LLDP conf Jinja2 Template (#5699 ) Fix #5812 LLDP conf Jinja2 Template does not verify IPv4 address and can use IPv6 version. This issue does not affect control LLDP daemon. Issue can be reproduced via `test_snmp_lldp` test. LLDP conf Jinja2 Template selects first item from the list of mgmt interfaces. TESTBED_1 LLDP conf ``` # cat /etc/lldpd.conf configure ports eth0 lldp portidsubtype local eth0 configure system ip management pattern FC00:3::32 configure system hostname dut-1 ``` TESTBED_2 LLDP conf ``` # cat /etc/lldpd.conf configure ports eth0 lldp portidsubtype local eth0 configure system ip management pattern 10.22.24.61 configure system hostname dut-2 ``` TESTBED_1 MGMT_INTERFACE ``` $ redis-cli -n 4 keys "" \| grep MGMT_INTERFACE MGMT_INTERFACE\|eth0\|10.22.24.53/23 MGMT_INTERFACE\|eth0\|FC00:3::32/64 ``` TESTBED_2 MGMT_INTERFACE ``` $ redis-cli -n 4 keys "" \| grep MGMT_INTERFACE MGMT_INTERFACE\|eth0\|FC00:3::32/64 MGMT_INTERFACE\|eth0\|10.22.24.61/23 ``` Signed-off-by: Petro Bratash <petrox.bratash@intel.com>	2020-11-07 10:30:41 -08:00
Joe LeVeque	e3164d5fb4	[lldpmgrd] Convert to Python 3 (#5785 ) - Convert lldpmgrd to Python 3 - Install Python 3 swsscommon package in docker-lldp	2020-11-03 12:50:11 -08:00
abdosi	dddf96933c	[monit] Adding patch to enhance syslog error message generation for monit alert action when status is failed. (#5720 ) Why/How I did: Make sure first error syslog is triggered based on FAULT TOLERANCE condition. Added support of repeat clause with alert action. This is used as trigger for generation of periodic syslog error messages if error is persistent Updated the monit conf files with repeat every x cycles for the alert action	2020-10-31 17:29:49 -07:00
Joe LeVeque	6333bb73b0	Explicitly call `pip2` rather than `pip` in locations where both pip2 and pip3 are installed (#5747 ) As part of the transition from Python 2 to Python 3, we are installing both pip2 and pip3 in the slave and config-engine containers. This PR replaces calls to `pip` in these containers with an explicit call to `pip2` to ensure the proper version of pip is executed, no matter which version of pip is aliased to `pip`, as we no longer rely on that alias. Also some other pip-related cleanup	2020-10-30 09:43:14 -07:00
shlomibitton	e66d49a57c	[LLDP] Fix for LLDP advertisements being sent with wrong information. (#5493 ) * Fix for LLDP advertisements being sent with wrong information. Since lldpd starts before lldpmgrd, some advertisement packets might be sent with the default value, the MAC address, as Port ID. This fix holds the packets from being sent by lldpd until all interfaces are fully configured by lldpmgrd. Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com> * Fix comments * Fix unit-test output caused a failure during build * Add 'run_cmd' function and use it * Resume lldpd even if port init timeout reached	2020-10-26 19:38:09 +02:00
Tamer Ahmed	6754635010	[cfggen] Make Jinja2 Template Python 3 Compatible Jinja2 templates rendered using the Python 3 interpreter are required to conform with the new Python 3 semantics. Signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-09-30 07:07:43 -07:00
yozhao101	13cec4c486	[Monit] Unmonitor the processes in containers which are disabled. (#5153 ) We want Monit to unmonitor the processes in containers which are disabled in the `FEATURE` table so that Monit will not write false alerting messages into the syslog. Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2020-09-25 00:28:28 -07:00
Joe LeVeque	12c94a7431	[lldpmgrd] Inherit DaemonBase class from sonic-py-common package (#5370 ) Eliminate duplicate logging and signal handling code by inheriting from DaemonBase class in sonic-py-common package.	2020-09-15 10:55:55 -07:00
Joe LeVeque	5b3b4804ad	[dockers][supervisor] Increase event buffer size for dependent-startup (#5247 ) When stopping the swss, pmon or bgp containers, log messages like the following can be seen: ``` Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,061 ERRO pool dependent-startup event buffer overflowed, discarding event 34 Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,063 ERRO pool dependent-startup event buffer overflowed, discarding event 35 Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,064 ERRO pool dependent-startup event buffer overflowed, discarding event 36 Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,066 ERRO pool dependent-startup event buffer overflowed, discarding event 37 ``` This is due to the number of programs in the container managed by supervisor, all generating events at the same time. The default event queue buffer size in supervisor is 10. This patch increases that value in all containers in order to eliminate these errors. As more programs are added to the containers, we may need to further adjust these values. I increased all buffer sizes to 25 except for containers with more programs or templated supervisor.conf files which allow for a variable number of programs. In these cases I increased the buffer size to 50. One final exception is the swss container, where the buffer fills up to ~50, so I increased this buffer to 100. Resolves https://github.com/Azure/sonic-buildimage/issues/5241	2020-09-08 23:36:38 -07:00
yozhao101	1c32933c7d	[docker] Correct the lldp-syncd program name in critical_process file. (#4862 ) The program name in critical_processes file must match the program name defined in supervisord.conf file. Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2020-06-28 11:08:30 -07:00
yozhao101	4fa81b4f8d	[dockers] Update critical_processes file syntax (#4831 ) - Why I did it Initially, the critical_processes file contained either the name of a critical process or the name of a group. For example, the critical_processes file in the dhcp_relay container contains a single group name `isc-dhcp-relay`. When testing the autorestart feature of each container, we need to get all the critical processes and test whether a container can be restarted correctly if one of its critical processes is killed. However, it is difficult to tell whether the names in the critical_processes file are critical processes or group names. Changing the syntax in this file separates the individual processes from the groups and also makes it clear to the user. Right now the critical_processes file contains two different kinds of entries. One is "program:xxx" which indicates a critical process. Another is "group:xxx" which indicates a group of critical processes managed by supervisord using the name "xxx". At the same time, I also updated the logic to parse the file critical_processes in supervisor-proc-event-listener script. - How to verify it We can first enable the autorestart feature of a specified container, for example `dhcp_relay`, by running the command `sudo config container feature autorestart dhcp_relay enabled` on DUT. Then we can select a critical process from the command `docker top dhcp_relay` and use the command `sudo kill -SIGKILL <pid>` to kill that critical process. Final step is to check whether the container is restarted correctly or not.	2020-06-25 21:18:21 -07:00
Joe LeVeque	9b27efdcc2	[dockers] Rename 'docker-lldp-sv2' to 'docker-lldp' (#4700 ) The -sv2 suffix was used to differentiate SNMP Dockers when we transitioned from "SONiCv1" to "SONiCv2", about four years ago. The old Docker materials were removed long ago; there is no need to keep this suffix. Removing it aligns the name with all the other Dockers.	2020-06-09 09:09:56 -07:00
Shuotian Cheng	b6cc73a0ad	[dockers]: Remove deprecated docker-lldp and docker-snmp (#1068 ) Signed-off-by: Shu0T1an ChenG <shuche@microsoft.com>	2017-10-23 13:20:37 -07:00
Joe LeVeque	71d299bed4	[swsssdk]: Update nomenclature: 'sswsdk' -> 'swsssdk' (#445 )	2017-03-30 11:51:05 -07:00
pavel-shirshov	a845740543	[All Dockerfiles]: Prevent apt asking questions on the console (#300 ) Add noninteractive setting into every Dockerfile in the repo Signed-off-by: Pavel Shirshov pavelsh@microsoft.com	2017-02-16 21:48:49 -08:00
thomasbo	135ba232ca	SNMP/LLDP Containers: Sonic V2 Support (#41 ) * Adding support for V2 in SNMP/LLDP (-sv2 postfix) * Fixes for V1 containers: logging * Fixes for V1 LLDP: limit LLDP to Front-panel or MGMT interfaces.	2016-10-28 15:19:29 -07:00
Qi Luo	cc7f15094c	Squashed merge master	2016-09-09 17:53:41 -07:00
Qi Luo	e4bd20c18a	Squash merge master (11de390)	2016-08-04 10:39:33 -07:00
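
The `critical_processes` syntax described in commit 4fa81b4f8d ("program:xxx" for a single critical process, "group:xxx" for a supervisord group) can be illustrated with a minimal Python sketch. The function name `parse_critical_processes` and its error handling are hypothetical, chosen only for illustration; this is not the parsing code from the event listener script in the repository.

```python
def parse_critical_processes(text):
    """Split critical_processes file content into program and group names.

    Hypothetical helper: it accepts only the two entry kinds named in the
    commit message, 'program:<name>' and 'group:<name>'.
    """
    programs, groups = [], []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        kind, sep, name = line.partition(":")
        name = name.strip()
        if not sep or not name or kind not in ("program", "group"):
            raise ValueError(
                f"line {lineno}: expected 'program:<name>' or 'group:<name>', got {raw!r}"
            )
        (programs if kind == "program" else groups).append(name)
    return programs, groups
```

Rejecting bare names such as `isc-dhcp-relay` is a design choice of this sketch: it keeps the ambiguity the commit describes (process name versus group name) from creeping back in silently.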

31 Commits
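
The issue fixed in commit 32a832a8ac (the LLDP template picking the first management interface entry even when it is IPv6) can be sketched in Python rather than Jinja2. This is an assumption-laden illustration, not the template logic itself: the helper name `first_ipv4_mgmt_address` is hypothetical, and the key layout `MGMT_INTERFACE|eth0|<prefix>` is taken from the redis-cli output quoted in that commit message.

```python
import ipaddress


def first_ipv4_mgmt_address(mgmt_keys):
    """Return the first IPv4 management address found in the keys, or None.

    Hypothetical helper: each key is assumed to end in '|<address>/<prefixlen>'.
    """
    for key in mgmt_keys:
        # The address with prefix length is the last '|'-separated field.
        _, _, prefix = key.rpartition("|")
        try:
            iface = ipaddress.ip_interface(prefix)
        except ValueError:
            continue  # skip keys that do not end in a valid address
        if iface.version == 4:
            return str(iface.ip)
    return None
```

With the TESTBED_1 ordering from the commit message, where the IPv6 entry can come first, this selection skips `FC00:3::32/64` and returns the IPv4 address instead of relying on list order.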