sonic-buildimage

Author	SHA1	Message	Date
Lawrence Lee	77378b4364	[mux]: Call write_standby from host only Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-11-10 18:54:33 -08:00
Lawrence Lee	25712c712e	[mux]: Make write_standby available on host Signed-off-by: Lawrence Lee <lawlee@microsoft.com> [write_standby]: Cleanup and fix build Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-11-10 18:54:33 -08:00
Lawrence Lee	84cd0e9471	[mux]: Initialize all mux ports as standby Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-11-10 18:54:33 -08:00
Tamer Ahmed	b8f70f8986	Merged PR 3845699: [linkmgrd]: Introduce MUX cable linkmgrd Linkmgrd monitors link status, mux status, and link state. If the link becomes unhealthy, linkmgrd will trigger mux switchover on a standby ToR, ensuring uninterrupted service to servers/blades. This PR is the initial implementation of linkmgrd. Also, the docker-mux container holds packages related to maintaining and managing the mux cable. It currently runs the linkmgrd binary, which monitors and switches the mux if needed. This PR also introduces mux-container and starts linkmgrd at startup when the build is configured with INCLUDE_MUX=y Edit: linkmgrd PR will follow. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com> Related work items: #2315, #3146150	2021-11-10 18:54:33 -08:00
tjchadaga	9a1b1bc44e	Fix for additional intf flap during fast-reboot (#9166)	2021-11-09 23:20:06 +00:00
Lawrence Lee	8ada006302	[swss]: Start ndppd after vlanmgrd (#9155 ) Why I did it During swss container startup, if ndppd starts up before/with vlanmgrd, ndppd will be pinned at nearly 100% CPU usage. How I did it Only start ndppd after vlanmgrd is running. Also, call ndppd directly instead of through bash for improved logging and to prevent orphaned processes. Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-11-05 00:39:10 +00:00
Saikrishna Arcot	bb1bc59a22	docker-dhcp-relay: Fix waiting for interfaces to get set up (#9034 ) Fix the check used to wait for interfaces to come up. The group name in the supervisor config files has changed from isc-dhcp-relay to dhcp-relay. Also, in the wait script, wait 10 additional seconds after the vlans, port channels, and any interfaces are up. This is because dhcrelay listens on all interfaces (in addition to port channels and vlans), and to ensure that it stays in a clean state during runtime, wait some extra time to make sure that those interfaces are created as well. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2021-10-22 17:14:22 +00:00
kellyyeh	d4a6a009cf	Change radv interval to 3min (#8891) (cherry picked from commit `0e175e6d6c`)	2021-10-01 23:00:17 -07:00
kellyyeh	a4b6788b4b	Replace isc-dhcp with DHCPv6 Relay in dhcp_relay docker (#8884)	2021-10-01 19:55:03 -07:00
kellyyeh	47ba7a9091	[dhcp_relay] DHCP relay support for IPv6 (#7772) (#8871)	2021-09-30 01:33:02 -07:00
Christian Svensson	5dce093464	[mgmt-framework]: Fix typo in mgmt_vars.j2 (#8475 ) Signed-off-by: Christian Svensson <blue@cmd.nu>	2021-08-25 04:11:16 +00:00
Kostiantyn Yarovyi	387ae82c5d	[Pcied] run by python 3 Why I did it Pcied was running under Python 2. How I did it Dropped python2 support and added python3 support for pcied in the file docker-pmon.supervisord.conf.j2 How to verify it docker exec pmon supervisorctl status	2021-08-23 03:34:48 +00:00
xumia	b1c2659044	Support to build armhf/arm64 platforms on arm based system (#7731) (#8458) Why I did it Support building armhf/arm64 platforms on an arm based system without the qemu simulator. When building armhf/arm64 on an arm based system, it is not necessary to use the qemu simulator. How I did it When building armhf on an armhf system, or arm64 on an arm64 system, the qemu simulator is not used by default. When building armhf on arm64 with armhf docker enabled, images are built without the simulator automatically. It is based on how the docker service is run. Docker base image change: For amd64, change from debian:to amd64/debian: For arm64, change from multiarch/debian-debootstrap:arm64- to arm64v8/debian: For armhf, change from multiarch/debian-debootstrap:armhf- to arm32v7/debian: See https://github.com/docker-library/official-images#architectures-other-than-amd64 The mapping relations: arm32v6 --- armel arm32v7 --- armhf arm64v8 --- arm64 Docker image armhf deprecated info: https://hub.docker.com/r/armhf/debian, using arm32v7 instead.	2021-08-13 19:33:08 +08:00
richardyu	36ab000557	PTF adds unittest-xml-reporting (#8417 ) Co-authored-by: richardyu-ms <richard.yu@microsoft.com>	2021-08-12 07:09:58 +00:00
Sujin Kang	ae7fa32691	[pmon]: Enable Autorestart of the daemons in PMON for unexpected exit (#8358) Enable Autorestart of the daemons in PMON for unexpected exit. Remove the daemon list from the critical_process file, which prevents PMON from restarting when an individual daemon crashes.	2021-08-07 22:43:38 -07:00
Blueve	d2f2a07c7c	[ARM] Fix issue where the ping6 tool is missing from orchagent docker (#8345) Signed-off-by: Jing Kan jika@microsoft.com	2021-08-05 15:25:53 +00:00
VenkatCisco	3aed7eab8f	[pmon]: add python3-jsonschema pmon (#8018 ) jsonschema is an implementation of JSON Schema for Python . Signed-off-by: Venkat Garigipati <venkatg@cisco.com>	2021-08-05 15:23:06 +00:00
novikauanton	08dc00f817	[iccpd][docker] fix initial startup configuration (#7982) #### Why I did it The process of config generation (sonic-cfggen) fails, but the services continue to run with invalid config #### How I did it * add exit with error on errors in start.sh script (because supervisord relies on start.sh return code). * fix jinja template. Jinja uses common python expressions under the hood and the `has_key` method was removed from dict in py3, so check with the `in` operator, as it is supported by both py2 and py3. #### How to verify it * compile sonic with iccp enabled. * add mclag config to CONFIG_DB. ``` 'MC_LAG\|1' => { "local_ip": "10.0.0.2", "peer_ip": "10.0.0.3", "peer_link": "Ethernet8", "mclag_interface": "Ethernet12" } ``` * unmask, enable and start swss and iccpd services in sonic. * log into the iccpd container and check the config file `/etc/iccpd/iccpd.conf` * expected config: ``` mclag_id:1 local_ip:10.0.0.2 peer_ip:10.0.0.3 peer_link:Ethernet8 mclag_interface:Ethernet12 system_mac:YOUR_SYSTEM_MAC ``` #### Description for the changelog Fixed initial iccpd startup configuration.	2021-08-05 15:21:33 +00:00
Vivek Reddy	67202cc2bb	autorestart inside restapi docker is disabled (#8006 ) Fix issue with critical process in the restapi docker restarting immediately after getting killed Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>	2021-07-27 05:14:28 +00:00
Guohan Lu	bed4c26b09	Revert "Add ethtool to docker-platform-monitor (#8017 )" This reverts commit `d66425dd76`.	2021-07-07 23:37:28 -07:00
VenkatCisco	d66425dd76	Add ethtool to docker-platform-monitor (#8017 ) #### Why I did it ethtool can be used to query and change settings such as speed, auto- negotiation and checksum offload on many network devices, especially Ethernet devices. #### How I did it add package extension to docker-platform-monitor/Dockerfile.j2	2021-07-07 09:40:11 +00:00
VenkatCisco	36d7dfbea3	Add libpci3 pkg to docker-platform-monitor (#8016 ) #### Why I did it The libpci library provides portable access to configuration registers of devices connected to the PCI bus. #### How I did it update dockers/docker-platform-monitor/Dockerfile.j2	2021-07-07 09:40:06 +00:00
thomas.cappleman@metaswitch.com	1d3e7ab161	[build]: Fix sonic-cfggen contextlib err (#7996 ) A recent version of contextlib2 (https://pypi.org/project/contextlib2/21.6.0/#history) has broken Python2 compatibility, so the version picked up by netaddr when using Python2 must be specified, or else builds fail Co-authored-by: Tom Zhu <tom.zhu@metaswitch.com>	2021-06-28 17:18:45 -07:00
Andriy Yurkiv	2fe91ae30f	Set default values only on the first start (#7735 )	2021-06-16 12:38:30 +00:00
bingwang-ms	c1b380df73	[docker-teamd]: Increase teammgrd timeout to allow graceful shutdown. (#7662 ) (#7842 ) The PR is a cherry-pick of #7662. Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>	2021-06-10 12:49:18 -07:00
yozhao101	fb2c995f53	[202012][Monit] Deprecate the feature of monitoring the critical processes by Monit (#7823 ) Signed-off-by: Yong Zhao yozhao@microsoft.com Why I did it Currently we leveraged the Supervisor to monitor the running status of critical processes in each container and it is more reliable and flexible than doing the monitoring by Monit. So we removed the functionality of monitoring the critical processes by Monit. How I did it I removed the script process_checker and corresponding Monit configuration entries of critical processes. How to verify it I verified this on the device str-7260cx3-acs-1.	2021-06-09 09:04:22 -07:00
Myron Sosyak	e7009513da	[docker-database] Fix Python3 issue (#7700) #### Why I did it To avoid the following error ``` Traceback (most recent call last): File "/usr/local/bin/flush_unused_database", line 10, in <module> if 'PONG' in output: TypeError: a bytes-like object is required, not 'str' ``` The `communicate` method returns strings if the streams were opened in text mode; otherwise, it returns bytes. In our case the `text` arg of Popen is not true, which means that `communicate` returns bytes #### How I did it Set `text=True` to get strings instead of bytes #### How to verify it run `/usr/local/bin/flush_unused_database` inside database container	2021-06-02 02:39:31 +00:00
bingwang-ms	eb8c05c306	Fix lldpmgrd syntax issue (#7742 ) Signed-off-by: bingwang <bingwang@microsoft.com>	2021-06-02 02:39:31 +00:00
Lawrence Lee	6a0e9078d4	[docker-orchagent]: Increase ndppd kernel poll interval (#7456) Why I did it ndppd by default reads /proc/net/ipv6_route every 30 seconds. Since T1s advertise so many routes to ToRs, this file is extremely large, and reading it causes ndppd's CPU usage to spike every 30 seconds How I did it Increase the delay for reading this file to the maximum possible value (max integer value), which will result in CPU spikes every ~24 days instead of every 30 seconds How to verify it Start ndppd with the new config file, confirm that no CPU spikes are seen except at startup Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-06-02 02:38:54 +00:00
yozhao101	3af05fdffe	[Monit] Restart telemetry container if memory usage is beyond the threshold (#7645 ) Signed-off-by: Yong Zhao yozhao@microsoft.com Why I did it This PR aims to monitor the memory usage of streaming telemetry container and restart streaming telemetry container if memory usage is larger than the pre-defined threshold. How I did it I borrowed the system tool Monit to run a script memory_checker which will periodically check the memory usage of streaming telemetry container. If the memory usage of telemetry container is larger than the pre-defined threshold for 10 times during 20 cycles, then an alerting message will be written into syslog and at the same time Monit will run the script restart_service to restart the streaming telemetry container. How to verify it I verified this implementation on device str-7260cx3-acs-1.	2021-05-31 04:38:18 +00:00
bingwang-ms	c5d27750f2	Fix supervisor-proc-exit-listener startup issue in restapi (#7681 ) * Fix supervisor-proc-exit-listener startup issue in restapi Signed-off-by: bingwang <bingwang@microsoft.com>	2021-05-27 22:29:42 +00:00
LuiSzee	bc5b367d37	[radv] fix bug for radv can't startup if DEVICE_METADATA.localhost.type is NULL (#7651 ) Co-authored-by: Shi Lei <shil@centecnetworks.com>	2021-05-26 02:39:02 +00:00
Myron Sosyak	4ec13fd556	Fix python version (#7658 ) #### Why I did it To avoid the following logs ``` Mar 15 15:52:04.599302 igk-dut-04 INFO database#/supervisord: flushdb /bin/bash: /usr/local/bin/flush_unused_database: /usr/bin/python: bad interpreter: No such file or directory Mar 15 15:52:04.599947 igk-dut-04 INFO database#supervisord 2021-03-15 15:52:04,599 INFO exited: flushdb (exit status 126; not expected) ``` #### How I did it Fix shebang #### How to verify it Check the logs	2021-05-24 22:25:47 +00:00
xumia	2973a63f1d	Fix the type issue in rvtysh (#7648 ) Why I did it Change the type issue in the command rvtysh change PARA/para to PARAM/param	2021-05-24 22:25:47 +00:00
sudhanshukumar22	0a5551aabf	docker-lldp:intermittent DB errors will result in Client termination (#6119 ) This PR allows listen to hostname changes and mgmt ip changes.	2021-05-24 22:21:29 +00:00
abdosi	dbded1f48e	Changes in FRR templates for multi-asic (#6901) 1. Made the command next-hop-self force only applicable on back-end asic bgp. This is done so that BGPL iBGP session running on backend can send e-BGP learnt nexthop. Back end asic FRR is able to recursively resolve the eBGP nexthop in its routing table since it knows about all the connected routes advertised from front end asic. 2. Made all front-end asic bgp use global loopback ip (Loopback0) as router id and back end asic bgp use Loopback4096 as router-id and originator id for Route-Reflector. This is done so that routes learnt by external peer do not see Loopback4096 as router id in show ip bgp <route-prefix> output. 3. To handle above change need to pass Loopback4096 from BGP manager for jinja2 template generation. This was missing and this change/fix is needed for this also https://github.com/Azure/sonic-buildimage/blob/master/dockers/docker-fpm-frr/frr/bgpd/templates/dynamic/instance.conf.j2#L27 4. Enhancement to add multi-asic specific bgpd template generation unit test cases.	2021-05-24 21:59:57 +00:00
abdosi	8f6b3456ab	[multi-asic] BBR support on internal-peers for multi-asic platforms. (#6848) Enable BBR config allowas-in 1 for internal peers Why I did: To advertise BBR routes learnt via e-BGP peer in one asic/namespace to another iBGP asic/namespace via Route Reflector.	2021-05-24 21:57:10 +00:00
VenkatCisco	91b4ce649e	[pmon]: add psmisc to bring fuser that identifies processes that are using files or sockets (#7509) fuser support is required since the new cisco hardware watchdog plugin uses it to check whether anyone else is using the /dev/watchdogX resource. The actual validation happens in the platform code, but the package is required for pmon container. Currently the /dev/watchdogX is being used by cisco platform-monitor service. Cisco chassis level watchdog plugin uses "fuser" to claim the watchdog release from platform-monitor service.	2021-05-10 16:00:43 -07:00
Junchao-Mellanox	6e12c40f40	[Mellanox] Support new sensor conf file for MSN4700 A1/A0 (#7535) #### Why I did it MSN4700 A1/A0 use a different sensor chip but keep the existing platform name x86_64-mlnx_msn4700-r0; this is a workaround to replace the sensor conf on MSN4700 A1/A0 #### How I did it Use a shell script to get the sensor conf path and copy that file to /etc/sensors.d/sensors.conf	2021-05-10 09:21:42 -07:00
trzhang-msft	d76206bae4	dhcpmon: support dual tor in docker template (#7470 )	2021-05-05 09:34:42 -07:00
judyjoseph	8cea931cad	Fixes for errors seen in staging devices (#7171 ) With the latest 201911 image, the following error was seen on staging devices with TSB command ( for both single asic, multi asic ). Though this err message doesn't affect the TSB functionality, it is good to fix. admin@STG01-0101-0102-01T1:~$ TSB BGP0 : % Could not find route-map entry TO_TIER0_V4 20 line 1: Failure to communicate[13] to zebra, line: no route-map TO_TIER0_V4 permit 20 % Could not find route-map entry TO_TIER0_V4 30 line 2: Failure to communicate[13] to zebra, line: no route-map TO_TIER0_V4 deny 30 In addition, in this PR I am fixing the message displayed to user when there are no BGP neighbors configured on that BGP instance. In multi-asic device there could be case where there are no BGP neighbors configured on a particular ASIC.	2021-05-03 13:19:29 -07:00
judyjoseph	7ae4a990e7	[docker-fpm-frr]: TSA/B/C changes for multi-asic (#6510 ) - Introduced TS common file in docker as well and moved common functions. - TSA/B/C scripts run only in BGP instances for front end ASICs. In addition skip enforcing it on route maps used between internal BGP sessions. admin@str--acs-1:~$ sudo /usr/bin/TSA System Mode: Normal -> Maintenance and in case of Multi-ASIC admin@str--acs-1:~$ sudo /usr/bin/TSA BGP0 : System Mode: Normal -> Maintenance BGP1 : System Mode: Normal -> Maintenance BGP2 : System Mode: Normal -> Maintenance	2021-05-03 13:19:17 -07:00
guxianghong	a0fde3a626	[arm] support compile sonic arm image on arm server (#7285 ) - Support compile sonic arm image on arm server. If arm image compiling is executed on arm server instead of using qemu mode on x86 server, compile time can be saved significantly. - Add kernel argument systemd.unified_cgroup_hierarchy=0 for upgrade systemd to version 247, according to #7228 - rename multiarch docker to sonic-slave-${distro}-march-${arch} Co-authored-by: Xianghong Gu <xgu@centecnetworks.com> Co-authored-by: Shi Lei <shil@centecnetworks.com>	2021-05-02 08:11:56 -07:00
xumia	1b05982727	Support readonly vtysh for sudoers (#7383) Why I did it Support a readonly version of the command vtysh How I did it Check that the command starts with "show", and verify in the script that it contains only a single command.	2021-04-29 10:08:55 -07:00
kakkotetsu	e6bbb3c344	[restapi] fix python version during restapi startup (#7056 ) changed from python3 to python in supervisord.conf.	2021-04-22 14:36:09 -07:00
Vivek Reddy Karri	731401fe4f	Reverts the commit which reverts "Backport ethtool to support QSFP-DD (#5725 )" This reverts commit `a86cdd87cf`.	2021-04-15 19:15:58 -07:00
Stephen Sun	3cee45c298	[monit] Avoid monit error log by removing "-l" from monit_swss\|buffermgrd (#7236 ) Avoid the following error messages while dynamic buffer calculation is enabled ``` ERR monit[491]: 'swss\|buffermgrd' status failed (1) -- '/usr/bin/buffermgrd -l' is not running in host ``` Change /usr/bin/buffermgrd -l to /usr/bin/buffermgrd. The buffermgrd is started by -l for traditional model or -a for dynamic model. So we need to use the common section of both. Signed-off-by: Stephen Sun <stephens@nvidia.com>	2021-04-08 18:39:10 +00:00
Prince Sunny	e08dc12acf	[IPinIP] Add Loopback2 interface, change dscp mode to uniform (#7234 ) Co-authored-by: Ubuntu <prsunny>	2021-04-08 18:38:59 +00:00
Guohan Lu	a86cdd87cf	Revert "Backport ethtool to support QSFP-DD (#5725 )" This reverts commit `50e4cc1579`.	2021-04-01 13:11:15 -07:00
Joe LeVeque	dd9be59cd1	[202012][dockers][supervisor] Increase event buffer size for process exit listener; Set all event buffer sizes to 1024 (#7203 ) #### Why I did it Backport of https://github.com/Azure/sonic-buildimage/pull/7083 to the 202012 branch. To prevent error [messages](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802) like the following from being logged: ``` Mar 17 02:33:48.523153 vlab-01 INFO swss#supervisord 2021-03-17 02:33:48,518 ERRO pool supervisor-proc-exit-listener event buffer overflowed, discarding event 46 ``` This is basically an addendum to https://github.com/Azure/sonic-buildimage/pull/5247, which increased the event buffer size for dependent-startup. While supervisor-proc-exit-listener doesn't subscribe to as many events as dependent-startup, there is still a chance some containers (like swss, as in the example above) have enough processes running to cause an overflow of the default buffer size of 10. This is especially important for preventing erroneous log_analyzer failures in the sonic-mgmt repo regression tests, which have started occasionally causing PR check builds to fail. Example [here](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802). I set all supervisor-proc-exit-listener event buffer sizes to 1024, and also updated all dependent-startup event buffer sizes to 1024, as well, to keep things simple, unified, and allow headroom so that we will not need to adjust these values frequently, if at all.	2021-04-01 12:52:19 -07:00
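
The bytes-versus-text behaviour described in commit `e7009513da` can be reproduced with a minimal sketch. The real `flush_unused_database` script is not shown in this log, so the child command below is a hypothetical stand-in that only prints `PONG`; the helper name `ping_output` is also an assumption made for illustration.

```python
import subprocess
import sys


def ping_output(text_mode: bool):
    """Run a stand-in command that prints PONG and return its stdout."""
    # Hypothetical stand-in for the real health-check command used by the
    # script; it simply prints "PONG" so the example is self-contained.
    proc = subprocess.Popen(
        [sys.executable, "-c", "print('PONG')"],
        stdout=subprocess.PIPE,
        text=text_mode,  # False -> communicate() returns bytes, True -> str
    )
    output, _ = proc.communicate()
    return output


raw = ping_output(text_mode=False)
decoded = ping_output(text_mode=True)

# With bytes output, testing for a str substring raises TypeError,
# matching the traceback quoted in the commit message.
try:
    "PONG" in raw
    bytes_check_failed = False
except TypeError:
    bytes_check_failed = True

print(type(raw).__name__, type(decoded).__name__, bytes_check_failed)
print("PONG" in decoded)
```

Passing `text=True` (available since Python 3.7) is the change the commit describes; comparing against `b'PONG'` would be another way to avoid the error, but that is not what the commit did.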


829 Commits
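
The "~24 days" figure in commit `6a0e9078d4` can be checked with a short calculation. This is only a sketch: it assumes the ndppd interval is a signed 32-bit count of milliseconds, which the commit message does not state (it says only "max integer value"), and the constant and function names are illustrative.

```python
# Assumption: the poll interval is a signed 32-bit integer in milliseconds.
INT32_MAX = 2**31 - 1
MS_PER_DAY = 24 * 60 * 60 * 1000

# The default cited in the commit: one read every 30 seconds.
DEFAULT_INTERVAL_MS = 30 * 1000


def interval_days(interval_ms: int) -> float:
    """Convert a poll interval in milliseconds to days."""
    return interval_ms / MS_PER_DAY


print(f"default: {DEFAULT_INTERVAL_MS / 1000:.0f} s between reads")
print(f"maximum: {interval_days(INT32_MAX):.2f} days between reads")
```

Under that assumption the maximum works out to just under 25 days, which is consistent with the commit's "every ~24 days".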