sonic-buildimage

Author	SHA1	Message	Date
yozhao101	e24fe9bc60	[Monit] Fix the issue which shows Monit can not reset its counter. (#10288 ) Signed-off-by: Yong Zhao <yozhao@microsoft.com> Why I did it This PR aims to fix the Monit issue which shows Monit can't reset its counter when monitoring memory usage of telemetry container. Specifically the Monit configuration file related to monitoring memory usage of telemetry container is as following: check program container_memory_telemetry with path "/usr/bin/memory_checker telemetry 419430400" if status == 3 for 10 times within 20 cycles then exec "/usr/bin/restart_service telemetry" If memory usage of telemetry container is larger than 400MB for 10 times within 20 cycles (minutes), then it will be restarted. Recently we observed, after telemetry container was restarted, its memory usage continuously increased from 400MB to 11GB within 1 hour, but it was not restarted anymore during this 1 hour sliding window. The reason is Monit can't reset its counter to count again and Monit can reset its counter if and only if the status of monitored service was changed from Status failed to Status ok. However, during this 1 hour sliding window, the status of monitored service was not changed from Status failed to Status ok. Currently for each service monitored by Monit, there will be an entry showing the monitoring status, monitoring mode etc. For example, the following output from command sudo monit status shows the status of monitored service to monitor memory usage of telemetry: Program 'container_memory_telemetry' status Status ok monitoring status Monitored monitoring mode active on reboot start last exit value 0 last output - data collected Sat, 19 Mar 2022 19:56:26 Every 1 minute, Monit will run the script to check the memory usage of telemetry and update the counter if memory usage is larger than 400MB. If Monit checked the counter and found memory usage of telemetry is larger than 400MB for 10 times within 20 minutes, then telemetry container was restarted. 
The following is an example status of the monitored service: Program 'container_memory_telemetry' status Status failed monitoring status Monitored monitoring mode active on reboot start last exit value 0 last output - data collected Tue, 01 Feb 2022 22:52:55 After the telemetry container was restarted, we found that its memory usage increased rapidly from around 100MB to more than 400MB within 1 minute, so the status of the monitored service did not have a chance to change from Status failed to Status ok. How I did it To provide a workaround for this issue, Monit recently introduced another syntax format, repeat every <n> cycles, for the exec action. This new syntax format enables Monit to repeat executing the background script if the error persists for a given number of cycles. How to verify it I verified this change on lab device str-s6000-acs-12. Another pytest PR (Azure/sonic-mgmt#5492) is submitted in sonic-mgmt repo for review.	2022-04-20 18:08:06 -07:00
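The counter behaviour described in the #10288 entry can be illustrated with a small simulation. This is a simplified model and not Monit's implementation: the function name `simulate`, the sliding-window handling, and the rule that the action fires once on the transition to failed and then every `repeat_every` cycles while the failure persists are all assumptions made for illustration.

```python
from collections import deque


def simulate(exit_codes, times=10, cycles=20, repeat_every=None):
    """Return the 1-based cycle numbers at which the exec action would fire.

    Simplified model (an assumption, not Monit's real code):
    - a cycle counts as a hit when the check program exits with status 3;
    - the service is 'failed' while at least `times` of the last `cycles`
      cycles were hits;
    - the action fires on the ok -> failed transition and, if `repeat_every`
      is set, again every `repeat_every` cycles while still failed.
    """
    window = deque(maxlen=cycles)
    failed = False
    failed_for = 0  # cycles spent in the failed state since it was entered
    fired = []
    for cycle, code in enumerate(exit_codes, start=1):
        window.append(code == 3)
        now_failed = sum(window) >= times
        if now_failed and not failed:
            fired.append(cycle)
            failed_for = 0
        elif now_failed and failed:
            failed_for += 1
            if repeat_every and failed_for % repeat_every == 0:
                fired.append(cycle)
        failed = now_failed
    return fired


# Memory stays above the threshold for 60 consecutive one-minute cycles.
always_high = [3] * 60
print(simulate(always_high))                  # no repeat clause
print(simulate(always_high, repeat_every=2))  # with a repeat clause
```

In this model the first call fires a single time and never again, because the status never returns to ok; the second call keeps firing while the failure persists, which is the effect the repeat clause is meant to provide.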
xumia	7a226ffd0d	Support bullseye for docker-sonic-restapi docker-sonic-telemetry (#9791 ) Support bullseye for docker-sonic-restapi docker-sonic-telemetry Upgrade to bullseye and Golang-1.15 to support FIPS.	2022-01-21 08:41:39 +08:00
yozhao101	1a3cab43ac	[Monit] Deprecate the feature of monitoring the critical processes by Monit (#7676 ) Signed-off-by: Yong Zhao yozhao@microsoft.com Why I did it Currently we leveraged the Supervisor to monitor the running status of critical processes in each container and it is more reliable and flexible than doing the monitoring by Monit. So we removed the functionality of monitoring the critical processes by Monit. How I did it I removed the script process_checker and corresponding Monit configuration entries of critical processes. How to verify it I verified this on the device str-7260cx3-acs-1.	2021-06-04 10:16:53 -07:00
yozhao101	37863ac854	[Monit] Restart telemetry container if memory usage is beyond the threshold (#7645 ) Signed-off-by: Yong Zhao yozhao@microsoft.com Why I did it This PR aims to monitor the memory usage of streaming telemetry container and restart streaming telemetry container if memory usage is larger than the pre-defined threshold. How I did it I borrowed the system tool Monit to run a script memory_checker which will periodically check the memory usage of streaming telemetry container. If the memory usage of telemetry container is larger than the pre-defined threshold for 10 times during 20 cycles, then an alerting message will be written into syslog and at the same time Monit will run the script restart_service to restart the streaming telemetry container. How to verify it I verified this implementation on device str-7260cx3-acs-1.	2021-05-28 11:13:44 -07:00
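The contract between a checker script and Monit can be sketched as follows. This is not the actual `memory_checker` script: the helper name `exit_code_for` and its parameters are hypothetical, and how the real script obtains the container's memory usage is not described in the message. Only the threshold value 419430400 and the exit status 3 are taken from the Monit rule quoted in the #10288 entry.

```python
THRESHOLD_BYTES = 419430400  # 400 MB, the value passed to memory_checker in the quoted Monit rule
EXCEEDED_STATUS = 3          # the exit status the quoted Monit rule tests for


def exit_code_for(usage_bytes, threshold_bytes=THRESHOLD_BYTES):
    """Return the exit status a checker would report to Monit (hypothetical helper)."""
    return EXCEEDED_STATUS if usage_bytes > threshold_bytes else 0


print(exit_code_for(100 * 1024 * 1024))  # usage below the threshold
print(exit_code_for(11 * 1024 ** 3))     # usage far above the threshold
```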
Joe LeVeque	c651a9ade4	[dockers][supervisor] Increase event buffer size for process exit listener; Set all event buffer sizes to 1024 (#7083 ) To prevent error [messages](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802) like the following from being logged: ``` Mar 17 02:33:48.523153 vlab-01 INFO swss#supervisord 2021-03-17 02:33:48,518 ERRO pool supervisor-proc-exit-listener event buffer overflowed, discarding event 46 ``` This is basically an addendum to https://github.com/Azure/sonic-buildimage/pull/5247, which increased the event buffer size for dependent-startup. While supervisor-proc-exit-listener doesn't subscribe to as many events as dependent-startup, there is still a chance some containers (like swss, as in the example above) have enough processes running to cause an overflow of the default buffer size of 10. This is especially important for preventing erroneous log_analyzer failures in the sonic-mgmt repo regression tests, which have started occasionally causing PR check builds to fail. Example [here](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802). I set all supervisor-proc-exit-listener event buffer sizes to 1024, and also updated all dependent-startup event buffer sizes to 1024, as well, to keep things simple, unified, and allow headroom so that we will not need to adjust these values frequently, if at all.	2021-03-27 21:14:24 -07:00
pra-moh	2e42ecb5e7	[StreamingTelemetry] add noTLS support for debug purpose (#6704) Adding noTLS mode for debugging purposes. Removing config-set for port 8080. Telemetry fails to start if the docker restarts in noTLS mode because it expects the log_level config to be present as well.	2021-02-17 17:23:00 -08:00
yozhao101	be3c036794	[supervisord] Monitoring the critical processes with supervisord. (#6242) - Why I did it Initially, we used Monit to monitor critical processes in each container. If one of the critical processes was not running or crashed for some reason, Monit would periodically write an alerting message into syslog. If we added a new process to a container, the corresponding Monit configuration file also needed to be updated, which made maintenance harder. We now employ the event listener of Supervisord to do this monitoring. Since processes in each container are managed by Supervisord, we only need to focus on the monitoring logic. - How I did it We borrowed the event listener of Supervisord to monitor critical processes in containers. The event listener takes the following steps if it is notified that one of the critical processes exited unexpectedly: The event listener first checks whether the auto-restart mechanism is enabled for this container. If the auto-restart mechanism is enabled, the event listener kills the Supervisord process, which should cause the container to exit and subsequently get restarted. If the auto-restart mechanism is not enabled for this container, the event listener enters a loop which first sleeps 1 minute and then checks whether the process is running. If yes, the event listener exits. If no, an alerting message is written into syslog. - How to verify it First, we need to check whether the auto-restart mechanism of a container is enabled by running the command show feature status. If enabled, one critical process should be selected and killed manually, then we need to check whether the container is restarted. Second, we can disable the auto-restart mechanism if it was enabled at step 1 by running the command sudo config feature autorestart <container_name> disabled. Then one critical process should be selected and killed. 
After that, we will see the alerting message which will appear in the syslog every 1 minute. - Which release branch to backport (provide reason below if selected) 201811 201911 [x ] 202006	2021-01-21 12:57:49 -08:00
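The decision flow of the event listener in the #6242 entry can be sketched like this. It is a minimal illustration, not the script from the PR: the function name, the injected callables, the return strings, and the `max_checks` bound are hypothetical, and the Supervisor event-listener protocol itself is left out.

```python
import itertools


def handle_unexpected_exit(process, autorestart_enabled, is_running,
                           kill_supervisord, alert, sleep, max_checks=None):
    """React to an unexpected exit of a critical process (illustrative only).

    All collaborators are passed in as callables so the flow can be exercised
    without a real container; `max_checks` bounds the loop for demos.
    """
    if autorestart_enabled:
        # Killing supervisord makes the container exit so it gets restarted.
        kill_supervisord()
        return "supervisord killed"
    checks = itertools.count() if max_checks is None else range(max_checks)
    for _ in checks:
        sleep(60)  # wait one minute between checks
        if is_running(process):
            return "process recovered"
        alert("Process '%s' is not running" % process)
    return "gave up"


# Demo with fakes: the process is down for two checks, then comes back.
alerts = []
states = iter([False, False, True])
result = handle_unexpected_exit(
    "example_proc",                      # hypothetical process name
    autorestart_enabled=False,
    is_running=lambda name: next(states),
    kill_supervisord=lambda: None,
    alert=alerts.append,
    sleep=lambda seconds: None,          # skip real waiting in the demo
)
print(result, len(alerts))
```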
Renuka Manavalan	ba02209141	First cut image update for kubernetes support. (#5421) * First cut image update for kubernetes support. With this, 1) dockers dhcp_relay, lldp, pmon, radv, snmp, telemetry are enabled for kube management init_cfg.json configure set_owner as kube for these 2) Each docker's start.sh updated to call container_startup.py to register going up As part of this call, it registers the current owner as local/kube and its version The images are built with its version ingrained into image during build 3) Update all docker's bash script to call 'container start/stop/wait' instead of 'docker start/stop/wait'. For all locally managed containers, it calls docker commands, hence no change for locally managed. 4) Introduced a new ctrmgrd service, that helps with transition between owners as kube & local and carry over any labels update from STATE-DB to API server 5) hostcfgd updated to handle owner change 6) Reboot scripts are updated to tag kube running images as local, so upon reboot they run the same image. 7) Added kube_commands.py to handle all updates with Kubernetes API server -- dedicated for k8s interaction only.	2020-12-22 08:01:33 -08:00
lguohan	4d3eb18ca7	[supervisord]: use abspath as supervisord entrypoint (#5995 ) use abspath makes the entrypoint not affected by PATH env. Signed-off-by: Guohan Lu <lguohan@gmail.com>	2020-11-22 21:18:44 -08:00
Joe LeVeque	7bf05f7f4f	[supervisor] Install vanilla package once again, install Python 3 version in Buster container (#5546 ) - Why I did it We were building a custom version of Supervisor because I had added patches to prevent hangs and crashes if the system clock ever rolled backward. Those changes were merged into the upstream Supervisor repo as of version 3.4.0 (http://supervisord.org/changes.html#id9), therefore, we should be able to simply install the vanilla package via pip. This will also allow us to easily move to Python 3, as Python 3 support was added in version 4.0.0. - How I did it - Remove Makefiles and patches for building supervisor package from source - Install Python 3 supervisor package version 4.2.1 in Buster base container - Also install Python 3 version of supervisord-dependent-startup in Buster base container - Debian package installed binary in `/usr/bin/`, but pip package installs in `/usr/local/bin/`, so rather than update all absolute paths, I changed all references to simply call `supervisord` and let the system PATH find the executable to prevent future need for changes just in case we ever need to switch back to build a Debian package, then we won't need to modify these again. - Install Python 2 supervisor package >= 3.4.0 in Stretch and Jessie base containers	2020-11-19 23:41:32 -08:00
abdosi	dddf96933c	[monit] Adding patch to enhance syslog error message generation for monit alert action when status is failed. (#5720 ) Why/How I did: Make sure first error syslog is triggered based on FAULT TOLERANCE condition. Added support of repeat clause with alert action. This is used as trigger for generation of periodic syslog error messages if error is persistent Updated the monit conf files with repeat every x cycles for the alert action	2020-10-31 17:29:49 -07:00
yozhao101	13cec4c486	[Monit] Unmonitor the processes in containers which are disabled. (#5153 ) We want to let Monit to unmonitor the processes in containers which are disabled in `FEATURE` table such that Monit will not generate false alerting messages into the syslog. Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2020-09-25 00:28:28 -07:00
Joe LeVeque	5b3b4804ad	[dockers][supervisor] Increase event buffer size for dependent-startup (#5247 ) When stopping the swss, pmon or bgp containers, log messages like the following can be seen: ``` Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,061 ERRO pool dependent-startup event buffer overflowed, discarding event 34 Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,063 ERRO pool dependent-startup event buffer overflowed, discarding event 35 Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,064 ERRO pool dependent-startup event buffer overflowed, discarding event 36 Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,066 ERRO pool dependent-startup event buffer overflowed, discarding event 37 ``` This is due to the number of programs in the container managed by supervisor, all generating events at the same time. The default event queue buffer size in supervisor is 10. This patch increases that value in all containers in order to eliminate these errors. As more programs are added to the containers, we may need to further adjust these values. I increased all buffer sizes to 25 except for containers with more programs or templated supervisor.conf files which allow for a variable number of programs. In these cases I increased the buffer size to 50. One final exception is the swss container, where the buffer fills up to ~50, so I increased this buffer to 100. Resolves https://github.com/Azure/sonic-buildimage/issues/5241	2020-09-08 23:36:38 -07:00
Tamer Ahmed	ceace4b605	[telemetry] Fix telemetry vars template path (#4938) The template is referenced relative to the script path, which could result in errors if the script is run from root. Add an explicit path to the template file name. Also, move the telemetry_var template to the template dir and remove the double quotes from around the json dict. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-07-12 18:08:52 +00:00
Tamer Ahmed	f4eae5dabd	[telemetry] Call sonic-cfggen Once (#4901 ) sonic-cfggen call is slow and this is taking place in the SONiC boot up process. The change uses templates to assemble all required vars into single template file. With this change, telemetry now calls once into sonic-cfggen. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-07-12 18:08:52 +00:00
yozhao101	4fa81b4f8d	[dockers] Update critical_processes file syntax (#4831) - Why I did it Initially, the critical_processes file contained either the name of a critical process or the name of a group. For example, the critical_processes file in the dhcp_relay container contains a single group name `isc-dhcp-relay`. When testing the autorestart feature of each container, we need to get all the critical processes and test whether a container can be restarted correctly if one of its critical processes is killed. However, it is difficult to tell whether the names in the critical_processes file are critical processes or group names. Changing the syntax in this file separates the individual processes from the groups and also makes this clear to the user. Right now the critical_processes file contains two different kinds of entries. One is "program:xxx", which indicates a critical process. The other is "group:xxx", which indicates a group of critical processes managed by supervisord using the name "xxx". At the same time, I also updated the logic to parse the file critical_processes in the supervisor-proc-event-listener script. - How to verify it We can first enable the autorestart feature of a specified container, for example `dhcp_relay`, by running the command `sudo config container feature autorestart dhcp_relay enabled` on the DUT. Then we can select a critical process from the output of the command `docker top dhcp_relay` and use the command `sudo kill -SIGKILL <pid>` to kill that critical process. The final step is to check whether the container is restarted correctly.	2020-06-25 21:18:21 -07:00
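The "program:xxx" / "group:xxx" syntax from the #4831 entry is simple enough to parse in a few lines. The following is a sketch only and not the parsing code from the listener script: the function name `parse_critical_processes`, the error handling, and the process name `example_proc` are assumptions; the group name `isc-dhcp-relay` comes from the message.

```python
def parse_critical_processes(text):
    """Split the contents of a critical_processes file into programs and groups.

    Returns (programs, groups). Blank lines are skipped; any other line must
    look like 'program:<name>' or 'group:<name>'.
    """
    programs, groups = [], []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        kind, sep, name = line.partition(":")
        kind, name = kind.strip(), name.strip()
        if not sep or not name or kind not in ("program", "group"):
            raise ValueError(
                "line %d: expected 'program:<name>' or 'group:<name>', got %r"
                % (line_no, raw)
            )
        (programs if kind == "program" else groups).append(name)
    return programs, groups


sample = "group:isc-dhcp-relay\nprogram:example_proc\n"
print(parse_critical_processes(sample))
```

Keeping programs and groups in separate lists lets a test pick individual processes to kill while still expanding groups through supervisord when needed.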
Guohan Lu	1cf417ed1b	[docker-telemetry]: use service dependency in supervisord to start services Signed-off-by: Guohan Lu <lguohan@gmail.com>	2020-05-22 11:01:28 -07:00
joyas-joseph	9084ac50fb	[docker-telemetry]: upgrade telemetry docker to buster (#4515 ) Signed-off-by: Joyas Joseph <joyas_joseph@dell.com>	2020-05-19 03:12:50 -07:00
lguohan	60b16495cc	[docker-base-stretch]: move common packages into docker-base-stretch (#4371 ) libpython2.7, libdaemon0, libdbus-1-3, libjansson4 are common across different containers. move them into docker-base-stretch Signed-off-by: Guohan Lu <lguohan@gmail.com>	2020-04-05 13:29:34 -07:00
lguohan	a0d213cc37	[telemetry]: move default certs location from device metadata to telemetry (#4307 ) maintains backward compatibility to search original x509 location when telemetry table does not have certs Signed-off-by: Guohan Lu <lguohan@gmail.com>	2020-03-24 00:04:36 -07:00
RayWang910012	f90bf8fe62	[monit]: monit_telemetry which will have error when telemetry is in secure mode (#4286) When telemetry is in secure mode, the monitor logs an error for the match string "--insecure". So I modified it to be compatible with both insecure mode and secure mode. Co-authored-by: Ubuntu <ubuntu@ip-10-5-1-21.ap-south-1.compute.internal>	2020-03-21 18:48:17 -07:00
yozhao101	91e5fb5602	[Service] Enable/disable container auto-restart based on configuration. (#4073 )	2020-02-07 12:34:07 -08:00
Dong Zhang	5057ac3122	[MultiDB] (./dockers dir) : replace redis-cli with sonic-db-cli and use new DBConnector (#3923 ) * [MultiDB] (./dockers dirs): replace redis-cli with sonic-db-cli and use new DBConnector * remove unnecessary quota * update typo	2020-01-22 11:27:21 -08:00
yozhao101	b7e48b422f	[Services] Allow monit system tool to monitor the critical processes status running in various SONiC containers. (#3940 ) * Add a monit config file for teamd container. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * Add a copy mechanism to put the monit config file in teamd container into base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * Add a monit config file for snmp container. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * Add a copy mechanism to put the monit config file of snmp container into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * Add a monit config file for dhcp_relay container in the dir base_image_files. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * Add a copy mechanism to put the monit config file of dhcp_relay container into base image under /etc/monit/conf.d. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * Add a monit config file for router advertiser container. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * Add a copy mechanism to put the monit config file of router advertiser contianer into base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-Pmon] Add a monit config file for pmon container. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-Pmon] Add a copy mechanism to put the monit config file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-lldp] Add a monit config file for lldp container. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-lldp] Add a copy mechanism to put the monit config file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-bgp] Add a monit config file for BGP container. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-bgp] Add a copy mechanism to put monit config file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-swss] Add a monit config file for the swss container. 
Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-swss] Add a copy mechanism to put monit config file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a monit config file for syncd container on barefoot platform. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a copy mechanism to put the monit config file into the base image on barefoot. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a monit config file for syncd container on broadcom. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a copy mechanism to put the monit config file into the base image on broadcom. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a monit config file for syncd container on cavium. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a copy mechanism to put the monit config file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-centec] Add a monit config file for syncd container on centen platform. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a copy mechanism to put the monit config file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a monit config file for syncd container on centen platform. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a copy mechanism to put the monit config file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a monit config file for syncd container on marvell. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a copy mechanism to put the monit conifg file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a monit config file for syncd container on marvell-arm64. 
Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a copy mechanism to put the monit config file into the base image on marvell-arm64. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a monit config file for syncd container on marvell-armhf. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a copy mechanism to put the monit config file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a monit config file for syncd container on mellanox. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a copy mechanism to put the monit config file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a monit config file for syncd container on nephos. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Add a copy mechanism to put the monit config file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-sflow] Add a monit config file for sflow container. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-sflow] Add a copy mechanism to put the monit conifg file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-telemetry] Add a monit config file for telemetry container. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-telemetry] Add a copy mechanism to put the monit config file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-database] Add a monit config file for database container. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-database] Add a copy mechanism to put the monit config file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-Dhcprelay] Change a typo. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-Dhcprelay] Change the process name in monit config file to dhcrelay. 
Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] There is no desserve process in syncd container on barefoot. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] There is no process desserve in syncd container on cavium. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] There is no process named desserve in syncd on centec. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] There is no process named desserve in syncd on marvell. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Should not delete the process desserve in syncd container on marvell. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Delete the process dsserve in syncd on marvell. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Delete the process dsserve in syncd container on marvell-arm64. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Delete the process dsserve in syncd container on marvell-armhf. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Delete the process dsserve in syncd container on mellanox. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-Radv] Change the process name to radvd. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-telemetry] Correct a typo in monit_telemetry. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-teamd] Delete the monit config file for teamd. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-teamd] Delete the mechanism to copy the monit config file into base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-dhcprelay] Delete the monit config file for dhcp_relay container. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-dhcprelay] Delete the mechanism to copy the monit config file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-radv] Delete the monit config file foe radv container. 
Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-radv] Delete the mechanism to copy the monit config file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-bgp] change the monit config file for BGP container such that monit only generates alert if the process is not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-snmp] Change the monit config file for snmp container such that monit only generates alret if the process is not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-pmon] Change the monit config file for pmon container such that monit only generates alert if the processes are not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-lldp] Change the monit config file for lldp container such that monit only generates alerts if some processes are not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-pmon] Delete the monit config file for pmon container since some of processes are not running depended on the type of box. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-pmon] Delete the copy mechanism to copy the monit config file into the base image. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-lldp] Change the matching name for the process lldpd. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-swss] Change the monit config file for swss container such that monit only generates alerts if the processes are not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Change the monit config file for syncd container on barefoot such that monit only generates alerts if the process is not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Correct a typo in monit config file. 
Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Change the monit config file for syncd container on broadcom such that monit only generates alerts if the processes are not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Change the monit config file for syncd container on cavium such that monit only generates alerts if the process is not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Change the monit config file for syncd container such that monit only generates alerts if the process is not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Change the monit config file for syncd container on marvell such that monit only generates alerts if the process is not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Change the monit config file for syncd container on marvell-arm64 such that monit only generates alerts if the process is not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Change the monit config file for syncd container on marvell-armhf such that monit will generate alert if the process is not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Change the monit config file for syncd container on mellanox such that monit only generates alerts if the process is not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-sycnd] Change the monit config file for syncd container such that monit only generates alerts if the processes are not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-sflow] Change the monit config file for sflow container such that monit only generates alerts if the process is not running for 5 minutes. 
Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-telemetry] Change the monit config file for telemetry container such that monit only generates alerts if the processes are not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-database] Change the monit config file for database container such that monit only generates alerts if the process is not running for 5 minutes. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-database] Use 4 spaces to replace 2 spaces in monit config file. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-bgp] Use 4 spcess to replace 2 spaces in monit config file. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-lldp] Use 4 spaces to replace 2 spaces in monit config file. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-swss] Use 4 spaces to replace 2 space in monit config file. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-sflow] Use 4 spaces to replace 2 spaces in monit config file. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-snmp] Use 4 spaces to replace 2 spaces in monit config file. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-telemetry] Use 4 spaces to replace 2 spaces in monit config file. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Use 4 spaces to replace 2 spaces in the monit config file on barefoot. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Use 4 spaces to replace 2 spaces in the monit config file on broadcom. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Use 4 spaces to replace 2 spaces in the monit config file on cavium. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Use 4 spaces to replace 2 spaces in the monit config file on centec. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Use 4 spaces to replace 2 spaces in the monit config file on marvell. 
Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Use 4 spaces to replace 2 spaces in the monit config file on mellanox. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-syncd] Use 4 spaces to replace 2 spaces in the monit config file on nephos. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [Docker-bgp] Remove the trailing extra spaces in monit config file. Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2020-01-10 16:21:02 -08:00
Prabhu Sreenivasan	87f70108cb	SONiC Management Framework Release 1.0 (#3488 ) * Added sonic-mgmt-framework as submodule / docker * fix build issues * update sonic-mgmt-framework submodule branch to master * Merged changes 70007e6d2ba3a4c0b371cd693ccc63e0a8906e77..00d4fcfed6a759e40d7b92120ea0ee1f08300fc6 00d4fcfed6a759e40d7b92120ea0ee1f08300fc6 Modified environemnt variables * Changes to build sonic-mgmt-framework docker * bumped up sonic-mgmt-framework commit-id * version bump for sonic-mgmt-framework commit-it * bumped up sonic-mgmt-framework commit-id * Add python packages to docker * Build fix for docker with python packages * added libyang as dependent package * Allow building images on NFS-mounted clones Prior to this change, `build_debian.sh` would generate a Debian filesystem in `./fsroot`. This needs root permissions, and one of the tests that is performed is whether the user can create a character special file in the filesystem (using mknod). On most NFS deployments, `root` is the least privileged user, and cannot run mknod. Also, attempting to run commands like rm or mv as root would fail due to permission errors, since the root user gets mapped to an unprivileged user like `nobody`. This commit changes the location of the Debian filesystem to `/fsroot`, which is a tmpfs mount within the slave Docker. The default squashfs, docker tarball and zip files are also created within /tmp, before being copied back to /sonic as the regular user. The side effect of this change is that the contents of `/fsroot` are no longer available once the slave container exits, however they are available within the squashfs image. Signed-off-by: Nirenjan Krishnan <Nirenjan.Krishnan@dell.com> * bumped up sonc-mgmt-framework commit to include PR #18 * REST Server startup script is enahnced to read the settings from ConfigDB. Below table provides mapping of db field to command line argument name. 
============================================================
ConfigDB entry key      Field name     REST Server argument
============================================================
REST_SERVER|default     port           -port
REST_SERVER|default     client_auth    -client_auth
REST_SERVER|default     log_level      -v
DEVICE_METADATA|x509    server_crt     -cert
DEVICE_METADATA|x509    server_key     -key
DEVICE_METADATA|x509    ca_crt         -cacert
============================================================
* Replace src/telemetry as submodule to sonic-telemetry * Update telemetry commit HEAD * Update sonic-telemetry commit HEAD * libyang env path update * Add libyang dependency to telemetry * Add scripts to create JSON files for CLI backend Scripts to create /var/platform/syseeprom and /var/platform/system, which are back-end files for CLI, for system EEPROM and system information. Signed-off-by: Howard Persh <Howard_Persh@dell.com> * In startup script, create directory where CLI back-end files live Signed-off-by: Howard Persh <Howard_Persh@dell.com> * build dependency pkgs added to docker for build failure fix * Changes to fix build issue for mgmt framework * Fix exec path issue with telemetry * s5232[device] PSU detection and default led state support * Processing of first boot in rc.local should not have premature exit Signed-off-by: Howard Persh <Howard_Persh@dell.com> * docker mount options added for platform, system features * bumped up sonic-mgmt-framework commit id to pick 23rd July 2019 changes * Added mount options for telemetry docker to get access for system and platform info. * Update commit for sonic-utilities * [dell]: Corrected dport map and renamed config files for S5232F * Fix telemetry submodule commit * added support for sonic-cli console * [Dell S5232F, Z9264F] Harden FPGA driver kernel module For Dell S5232F and Z9264F platforms, be more strict when checking state in ISR of FPGA driver, to harden against spurious interrupts. 
Signed-off-by: Howard Persh <Howard_Persh@dell.com> * update mgmt-framework submodule to 27th Aug commit. * remove changes not related to mgmt-framework and sonic-telemetry * Revert "Replace src/telemetry as submodule to sonic-telemetry" This reverts commit `11c3192975`. * Revert "Replace src/telemetry as submodule to sonic-telemetry" This reverts commit `11c3192975`. * make submodule changes and remove a change not related to PR * more changes * Update .gitmodules * Update Dockerfile.j2 * Update .gitmodules * Update .gitmodules * Update .gitmodules reverting experimental change * Removed syspoll for release_1.0 Signed-off-by: Jeff Yin <29264773+jeff-yin@users.noreply.github.com> * Update docker-sonic-mgmt-framework.mk * Update sonic-mgmt-framework.mk * Update sonic-mgmt-framework.mk * Update docker-sonic-mgmt-framework.mk * Update docker-sonic-mgmt-framework.mk * Revert "Processing of first boot in rc.local should not have premature exit" This reverts commit `e99a91ffc2`. * Remove old telemetry directory * Update docker-sonic-mgmt-framework.mk * Resolving merge conflict with Azure * Reverting the wrong merge * Use CVL_SCHEMA_PATH instead of changing directory for telemetry startup * Add missing export * Add python mmh3 to slave dockerfile * Remove sonic-mgmt-framework build dep for telemetry, fix dialout startup issues * Provided flag to disable compiling mgmt-framework * Update sonic-utilities to point to latest commit id * Point sonic-utilities to Azure accepted SHA * Updating mgmt framework to right sha * Add sonic-telemetry submodule * Update the mgmt-framework commit id Co-authored-by: jghalam <joe.ghalam@gmail.com> Co-authored-by: Partha Dutta <51353699+dutta-partha@users.noreply.github.com> Co-authored-by: srideepDell <srideep_devireddy@dell.com> Co-authored-by: nirenjan <nirenjan@users.noreply.github.com> Co-authored-by: Sachin Holla <51310506+sachinholla@users.noreply.github.com> Co-authored-by: Eric Seifert <seiferteric@gmail.com> Co-authored-by: Howard Persh 
<hpersh@yahoo.com> Co-authored-by: Jeff Yin <29264773+jeff-yin@users.noreply.github.com> Co-authored-by: Arunsundar Kannan <31632515+arunsundark@users.noreply.github.com> Co-authored-by: rvasanthm <51932293+rvasanthm@users.noreply.github.com> Co-authored-by: Ashok Daparthi-Dell <Ashok_Daparthi@Dell.com> Co-authored-by: anand-kumar-subramanian <51383315+anand-kumar-subramanian@users.noreply.github.com>	2019-12-23 21:47:16 -08:00
pavel-shirshov	1848fb262b	[fast-reboot]: Save fast-reboot state into the db (#3741 ) Put a fast-reboot flag into the db using the EXPIRE feature. Other parts of SONiC use this flag to start in fast-reboot mode. If a config is reloaded, the state in the db is removed.	2019-12-04 14:10:19 -08:00
pra-moh	65f7da87a7	[telemetry.sh] Fix string null check with special characters by adding quotes (#3810 ) * adding quotes for string comparison with special characters * Update dockers/docker-sonic-telemetry/telemetry.sh Co-Authored-By: Joe LeVeque <jleveque@users.noreply.github.com> * Update dockers/docker-sonic-telemetry/telemetry.sh Co-Authored-By: Joe LeVeque <jleveque@users.noreply.github.com>	2019-11-23 12:30:56 -08:00
yozhao101	df11b2b9f1	[Services] Restart Telemetry service upon unexpected critical process exit. (#3768 ) Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2019-11-18 16:56:44 -08:00
Jipan Yang	9a8202a39d	[database]: Update redis to 5.0.3 (#3066 ) Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>	2019-07-03 22:16:09 -07:00
Stepan Blyshchak	81cf33231f	[build]: Improve dockerfile instructions (#3048 ) - create a dockerfile-marcros.j2 file with all common operations written as j2 macro - use single dockerfile instruction for COPY and RUN commands when possible to improve build time - reorganize dockerfile instructions to make more cache friendly (in case someday we will remove --no-cache to build docker images) Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>	2019-06-22 11:26:23 -07:00
Jipan Yang	7b81d4ddd6	[dockers]: Upgrade database and telemetry docker to stretch build (#2541 ) * Upgrade database and telemetry docker to stretch build Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com> * Remove SONIC_STRETCH_DEBS list add for redis and telemetry Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>	2019-02-08 22:05:38 -08:00
lguohan	f3ca7c422f	[rsyslog]: use # to separate container name and program name in syslog message (#1918 ) Previously, / was used to separate container name and program name. However, in rsyslogd: "Precisely, the programname is terminated by either (whichever occurs first): end of tag, nonprintable character, ':', '[', '/'. The above definition has been taken from the FreeBSD syslogd sources." Signed-off-by: Guohan Lu <gulv@microsoft.com>	2018-08-12 22:23:58 -07:00
Qi Luo	7ba08e5bf6	Prefix docker container name to syslog syslogtag (program name) (#1810 )	2018-06-25 10:48:42 -07:00
Jipan Yang	f74de8914b	[telemetry]: SONiC system telemetry Support (#1526 ) * SONiC system telemetry Support Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com> * Update package name from telemetry to sonic-telemetry Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>	2018-03-27 13:39:04 -07:00

34 Commits