sonic-buildimage

Author	SHA1	Message	Date
mssonicbld	6891aa915a	[ci/build]: Upgrade SONiC package versions (#13017 )	2022-12-11 22:19:51 +08:00
mssonicbld	e428afae01	[ci/build]: Upgrade SONiC package versions (#13015 )	2022-12-10 22:16:58 +08:00
lixiaoyuner	b0c9013ea1	Add k8s master feature (#11637 ) (#12984 ) Signed-off-by: Yun Li <yunli1@microsoft.com> * Add k8s master feature * Update kubernetes version mistake and make variable passing clear * Add CRI-dockerd package * Update version variable passing logic * Upgrade the worker kubernetes version * Install xml file parse tool	2022-12-09 10:43:54 +08:00
Stepan Blyshchak	7ed1cd0d68	[services] kill container on stop in warm/fast mode (#10510 ) - Why I did it To optimize stop on warm boot. - How I did it Added kill for containers	2022-12-08 17:19:16 +00:00
Michael Li	41858170d8	Limit reload BCM SDK kmods on syncd start to PikeZ platform (#12971 ) Why I did it Limiting #12804 changes to PikeZ platform only (Arista-720DT-48S). Note that this is a short term workaround for this platform until SDK investigation on SDK init failure on docker syncd restart due to DMA issues is resolved. How I did it Retrieve platform name from /host/machine.conf and only reload SDK kmods on Arista-720DT-48S platform. Signed-off-by: Michael Li <michael.li@broadcom.com>	2022-12-08 17:18:00 +00:00
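The platform gate described in the commit above can be sketched roughly as follows. This is an illustrative Python sketch only, not the actual syncd start script (which is a shell script). The key names `aboot_platform` and `onie_platform`, the platform string in `RELOAD_PLATFORMS`, and both helper names are assumptions made for the example.

```python
def read_machine_conf(text):
    """Parse KEY=VALUE lines, the format assumed here for /host/machine.conf."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not a KEY=VALUE pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


# Hypothetical platform identifier standing in for the Arista-720DT-48S (PikeZ).
RELOAD_PLATFORMS = {"x86_64-arista_720dt_48s"}


def should_reload_kmods(conf):
    """Return True only when the parsed platform is in the allow-list."""
    platform = conf.get("aboot_platform") or conf.get("onie_platform")
    return platform in RELOAD_PLATFORMS
```

Keeping the allow-list in one place makes it easy to drop the workaround once the SDK issue is resolved.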
Ying Xie	7da66c2943	Revert "Revert "Reload BCM SDK kmods on syncd start to handle syncd restart issues (#12804 )"" This reverts commit `7e910aecad`.	2022-12-08 17:17:41 +00:00
Stepan Blyshchak	699800bdf1	[swss.sh] optimize macsec feature state query (#12946 ) - Why I did it There's a slowdown in bootup related to the execution of a show command during startup of swss service. show is a pretty heavy command and takes long time to execute ~2 sec. - How I did it I replaced show with sonic-db-cli which takes a ms to run. - How to verify it Boot the switch and verify swss is active. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2022-12-08 04:32:54 +08:00
mssonicbld	7152e84277	Make client identity by AME cert (#11946 ) (#12908 )	2022-12-02 13:13:26 +08:00
Ying Xie	7e910aecad	Revert "Reload BCM SDK kmods on syncd start to handle syncd restart issues (#12804 )" This reverts commit `132c6e934a`.	2022-12-01 19:47:33 +00:00
Michael Li	132c6e934a	Reload BCM SDK kmods on syncd start to handle syncd restart issues (#12804 ) Why I did it There is an issue on the Arista PikeZ platform (using T3.X2: BCM56274) while running SONiC. If the 'syncd' container in SONiC is restarted, the expected behaviour is that syncd will automatically restart/recover; however it does not and always fails at create_switch due to BCM SDK kmod DMA operation cancellation getting stuck. Sep 16 22:19:44.855125 pkz208 ERR syncd#syncd: [none] SAI_API_SWITCH:platform_process_command:428 Platform command "init soc" failed, rc = -1. Sep 16 22:19:44.855206 pkz208 INFO syncd#supervisord: syncd CMIC_CMC0_PKTDMA_CH4_DESC_COUNT_REQ:0x33#015 Sep 16 22:19:44.855264 pkz208 CRIT syncd#syncd: [none] SAI_API_SWITCH:platformInit:1909 initialization command "init soc" failed, rc = -1 (Internal error). Sep 16 22:19:44.855403 pkz208 CRIT syncd#syncd: [none] SAI_API_SWITCH:sai_driver_init:642 Error initializing driver, rc = -1. ... Sep 16 22:19:44.855891 pkz208 CRIT syncd#syncd: [none] SAI_API_SWITCH:brcm_sai_create_switch:1173 initializing SDK failed with error Operation failed (0xfffffff5). Reloading the BCM SDK kmods allows the switch init to continue properly. How I did it If BCM SDK kmods are loaded, unload and load them again on syncd docker start script. How to verify it Steps to reproduce: In SONiC, run 'docker ps' to see current running containers; 'syncd' should be present. Run 'docker stop syncd' Wait ~1 minute. Run 'docker ps' to see that syncd is missing. Check logs to see messages similar to the above. Signed-off-by: Michael Li <michael.li@broadcom.com>	2022-12-01 01:36:18 +00:00
abdosi	81fe1d9c1a	Added Support to runtime render bgp and teamd feature state and lldp has_asic_scope flag (#11796 ) (#12856 ) Added Support to runtime render bgp and teamd feature state and lldp has_asic_scope flag	2022-11-29 13:47:37 -08:00
bingwang-ms	4f7a0b4705	Apply separated DSCP_TO_TC_MAP and TC_TO_QUEUE_MAP to uplink ports on dualtor (#12730 ) Why I did it This PR applies separate DSCP_TO_TC_MAP and TC_TO_QUEUE_MAP tables to uplink ports on dualtor. Traffic with DSCP 2 and DSCP 6 from T1 is treated as lossless traffic: DSCP 2 maps to TC 2 and queue 2, and DSCP 6 maps to TC 6 and queue 6. Traffic with DSCP 2 or DSCP 6 from downlink is still treated as lossy traffic, as before. How I did it Define DSCP_TO_TC_MAP|AZURE_UPLINK and TC_TO_QUEUE_MAP|AZURE_UPLINK. How to verify it Verified by UT. Verified by copying the new template to a testbed and rendering a config_db.json.	2022-11-28 18:51:04 +00:00
Lorne Long	5a4efe211c	[Build] Use apt-get to predictably support dependency ordered configuration of lazy packages (#12164 ) Why I did it The current lazy installer relies on a filename sort for both unpack and configuration steps. When systemd services are configured [started] by multiple packages the order is by filename not by the declared package dependencies. This can cause the start order of services to differ between first-boot and subsequent boots. Declared systemd service dependencies further exacerbate the issue (e.g. blocking the first-boot script). The current installer leaves packages un-configured if the package dependency order does not match the filename order. This also fixes a trivial bug in [Build]: Support to use symbol links for lazy installation targets to reduce the image size #10923 where externally downloaded dependencies are duplicated across lazy package device directories. How I did it Changed the staging and first-boot scripts to use apt-get: dpkg -i /host/image-$SONIC_VERSION/platform/$platform/.deb becomes apt-get -y install /host/image-$SONIC_VERSION/platform/$platform/.deb when dependencies are detected during image staging. How to verify it Apt-get critical rules Add a Depends= to the control information of a package. Grep the syslog for rc.local between images and observe the configuration order of packages change.	2022-11-28 18:48:36 +00:00
abdosi	88bb83e859	[chassis-packet] fix the issue of internal ip arp not getting resolved. (#12127 ) Fix the issue where arp_update will not ping some of the ip's even though they are in failed state since grep of that ip on ip neigh show command does not do exact word match and can return multiple match.	2022-11-28 18:48:36 +00:00
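The matching problem described in the commit above can be illustrated with a short sketch. It is written in Python for illustration only (arp_update itself is a shell script), the sample neighbor lines are made up, and both function names are hypothetical.

```python
def match_substring(ip, neigh_lines):
    """Mimic a plain grep: any line containing the text matches."""
    return [line for line in neigh_lines if ip in line]


def match_exact(ip, neigh_lines):
    """Match only lines whose first field is exactly the given address."""
    return [line for line in neigh_lines if line.split()[:1] == [ip]]


# Made-up sample in the general shape of `ip neigh show` output.
SAMPLE = [
    "10.0.0.1 dev eth0 FAILED",
    "10.0.0.10 dev eth0 lladdr aa:bb:cc:dd:ee:10 REACHABLE",
    "10.0.0.11 dev eth0 lladdr aa:bb:cc:dd:ee:11 STALE",
]
```

With this sample, `match_substring("10.0.0.1", SAMPLE)` returns all three lines, while `match_exact("10.0.0.1", SAMPLE)` returns only the first, so the state of the intended neighbor can be read unambiguously.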
arlakshm	b86b3b0d7d	[202205][chassis] update the asic_status.py to read from CHASSIS_FABRIC_ASIC_INFO_TABLE (#12780 ) Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>	2022-11-26 20:27:30 -08:00
mssonicbld	66ba3285ac	[ci/build]: Upgrade SONiC package versions (#12830 )	2022-11-25 21:59:36 +08:00
mssonicbld	4424937611	[ci/build]: Upgrade SONiC package versions (#12812 )	2022-11-23 21:35:21 +08:00
mssonicbld	2d2305091f	[ci/build]: Upgrade SONiC package versions (#12772 )	2022-11-20 22:50:40 +08:00
mssonicbld	13b8078555	[ci/build]: Upgrade SONiC package versions (#12759 )	2022-11-19 04:22:29 +08:00
mssonicbld	f4bace99f1	[ci/build]: Upgrade SONiC package versions (#12726 )	2022-11-17 02:52:28 +08:00
mssonicbld	1be9baa1c0	[ci/build]: Upgrade SONiC package versions (#12691 )	2022-11-13 22:33:45 +08:00
mssonicbld	2b641e0505	[ci/build]: Upgrade SONiC package versions (#12656 )	2022-11-11 23:34:54 +08:00
Jing Kan	b2d3e2cf2e	[dhcp_relay] Enable DHCP Relay for BmcMgmtToRRouter in init_cfg (#12648 ) Why I did it DHCP relay feature needs to be enabled for BmcMgmtToRRouter by default How I did it Update device type list	2022-11-10 18:16:15 +00:00
Sudharsan Dhamal Gopalarathnam	1ea37e2723	[logrotate]Fix logrotate firstaction script to reflect correct size (#12599 ) - Why I did it Fix logrotate firstaction script to reflect correct size. The size was modified to change dynamically based on disk size. However this variable was not updated #9504 - How I did it Updated the variable based on disk size - How to verify it Verify in the generated rsyslog file if the variable is correctly generated from jinja template	2022-11-10 18:15:10 +00:00
bingwang-ms	d824846928	Add lossy scheduler for queue 7 (#12596 ) * Add lossy scheduler for queue 7	2022-11-10 18:14:55 +00:00
Devesh Pathak	c7ce62154b	Clear /etc/resolv.conf before building image (#12592 ) Why I did it nameserver and domain entries from build system fsroot gets into sonic image. How I did it Clear /etc/resolv.conf before building image How to verify it Built image with it and verified with install that /etc/resolv.conf is empty	2022-11-10 18:14:10 +00:00
Lawrence Lee	f60e22a5c3	[arp_update]: Fix hardcoded vlan (#12566 ) Typo in prior PR #11919 hardcodes Vlan name. Change command to use the $vlan variable instead Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2022-11-10 18:12:02 +00:00
judyjoseph	ab713dcfb6	Use the macsec_enabled flag in platform to enable macsec feature state (#11998 ) * Use the macsec_enabled flag in platform to enable macsec feature state * Add macsec supported metadata in DEVICE_RUNTIME_METADATA	2022-11-10 18:08:42 +00:00
mssonicbld	584aaa7058	[ci/build]: Upgrade SONiC package versions (#12612 )	2022-11-06 22:25:30 +08:00
mssonicbld	98c3e24770	[ci/build]: Upgrade SONiC package versions (#12606 )	2022-11-05 00:36:02 +08:00
mssonicbld	1463af1227	[ci/build]: Upgrade SONiC package versions (#12584 )	2022-11-03 00:12:56 +08:00
mssonicbld	fe62175aa6	[ci/build]: Upgrade SONiC package versions (#12571 )	2022-11-02 01:18:10 +08:00
mssonicbld	ae681eabb8	[ci/build]: Upgrade SONiC package versions (#12556 )	2022-11-01 03:49:12 +08:00
mssonicbld	483257d88c	[ci/build]: Upgrade SONiC package versions (#12543 )	2022-10-28 23:15:39 +08:00
Samuel Angebault	8e44292d74	[202205][Arista] Fix cmdline generation during warm-reboot from 201811/201911 (#12371 ) * [202012][Arista] Fix cmdline generation during warm-reboot from 201811/201911 (#11161) Issue fixed: when performing a warm-reboot or fast-reboot from 201811 or 201911 to 202012 the kernel command line contains duplicate information. This issue is related to a change that was made to make 202012 boot0 file more futureproof. A cold reboot brings everything back into a clean slate though not always desirable. Changes done: Added some logic to properly detect the end of the Aboot cmdline when cmdline-aboot-end delimiter is not set (clean case) Added some logic to regenerate the Aboot cmdline when cmdline-aboot-end is set but duplicate parameters exists before (dirty case). Reorganized some code to handle duplicate parameter handling in the allowlist. * Fix cmdline generation due to sonic_fips	2022-10-27 10:14:26 -07:00
Samuel Angebault	b1c0d8d5e4	Add emmc quirks to boot0 (#9989 ) (#12373 ) Why I did it Fix some unreliability seen on emmc device with some AMD CPUs How I did it Added a kernel parameter to add quirks to It depends on a sonic-linux-kernel change to work properly but will be a no-op without it. Description for the changelog Add emmc quirks for Upperlake	2022-10-27 07:09:03 -07:00
Devesh Pathak	17c213a264	Fix to improve hostname handling (#12064 ) * Fix to improve hostname handling If config_db.json is missing the hostname entry, hostname-config.sh ends up deleting the existing entry too, and the hostname changes to the default 'localhost' * default hostname to 'sonic' if missing in config file	2022-10-25 21:52:42 +00:00
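The fallback described in the commit above can be sketched as below. This is a Python illustration, not the contents of hostname-config.sh. It assumes the hostname is stored under `DEVICE_METADATA` / `localhost` / `hostname` in config_db.json, and the function name is hypothetical.

```python
import json

DEFAULT_HOSTNAME = "sonic"


def hostname_from_config(config_text):
    """Return the configured hostname, or the default when it is missing or empty."""
    config = json.loads(config_text)
    hostname = (
        config.get("DEVICE_METADATA", {})
        .get("localhost", {})
        .get("hostname")
    )
    return hostname if hostname else DEFAULT_HOSTNAME
```

Returning a fixed default rather than clearing the entry avoids falling back to 'localhost' when the config file has no hostname.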
Samuel Angebault	94c8107f5e	Fix extraction of platform.tar.gz for firsttime (#11935 )	2022-10-25 20:43:32 +00:00
cytsao1	8930d70972	[pmon] Add smartmontools to pmon docker (#11837 ) * Add smartmontools to pmon docker * Set smartmontools to install version 7.2-1 in pmon to match host; clean up smartmontools build files * Add comments on smartmontools version for both host and pmon	2022-10-25 20:41:26 +00:00
xumia	db2128564b	[202205] Change submodule path from Azure to sonic-net (#12308 ) Why I did it Change the path of sonic submodules that point to "Azure" to point to "sonic-net" How I did it Replace "Azure" with "sonic-net" on all relevant paths of sonic submodules	2022-10-24 13:13:14 +08:00
mssonicbld	abc92c6248	[ci/build]: Upgrade SONiC package versions (#12452 )	2022-10-20 03:23:45 +08:00
mssonicbld	5d2db5068c	[ci/build]: Upgrade SONiC package versions (#12437 )	2022-10-18 22:19:35 +08:00
mssonicbld	cfc9af71ef	[ci/build]: Upgrade SONiC package versions (#12418 )	2022-10-16 22:24:10 +08:00
mssonicbld	b4e6a06d1a	[ci/build]: Upgrade SONiC package versions (#12409 )	2022-10-14 23:51:03 +08:00
Ying Xie	a1365b44c3	[BGP] starting BGP service after swss (#12381 ) Why I did it BGP service has always been starting after interface-config. However, recently we discovered an issue where some BGP sessions are unable to establish due to BGP daemon not able to read the interface IP. This issue was clearly observed after upgrading to FRR 8.2.2. See more details in #12380. How I did it Delaying starting BGP seems to be a workaround for this issue. However, caution is that this delay might impact warm reboot timing and other timing sequences. This workaround is reducing the probability of hitting the issue by close to 100X. However, this workaround is not bulletproof as test shows. It is still preferable to have a proper FRR fix and revert this change in the future. How to verify it Continuously issuing config reload and check BGP session status afterwards. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2022-10-13 16:34:10 +00:00
mssonicbld	3435a8a305	[ci/build]: Upgrade SONiC package versions (#12372 )	2022-10-13 02:58:26 +08:00
mssonicbld	1b5d61246a	[ci/build]: Upgrade SONiC package versions (#12324 )	2022-10-09 21:44:14 +08:00
Stepan Blyshchak	06f8b1f98a	[auto-ts] add memory check (#10433 ) (#12291 ) #### Why I did it To support automatic techsupport invocation in case memory usage is too high. #### How I did it Implemented according to https://github.com/Azure/SONiC/pull/939 #### How to verify it UT, manual test on the switch. DEPENDS on https://github.com/Azure/sonic-utilities/pull/2116	2022-10-06 08:06:46 -07:00
Prince George	fab37239dd	Disable bracketed-paste mode off by default (#12285 ) * Disable bracketed-paste mode off by default * address review comment	2022-10-06 14:58:46 +00:00
Saikrishna Arcot	ac19e2a8ba	[docker-wait-any]: Exit worker thread if main thread is expected to exit (#12255 ) There's an odd crash that intermittently happens after the teamd container exits, and a signal is raised to the main thread to exit. This thread (watching teamd) continues execution because it's in a `while True`. The subsequent wait call on the teamd container very likely returns immediately, and it calls `is_warm_restart_enabled` and `is_fast_reboot_enabled`. In either of these cases, sometimes, there is a crash in the transition from C code to Python code (after the function gets executed). Python sees that this thread got a signal to exit, because the main thread is exiting, and tells pthread to exit the thread. However, during the stack unwinding, _something_ is telling the unwinder to call `std::terminate`. The reason is unknown. This then results in a python3 SIGABRT, and systemd then doesn't call the stop script to actually stop the container (possibly because the main process exited with a SIGABRT, so it's a hard crash). This means that the container doesn't actually get stopped or restarted, resulting in an inconsistent state afterwards. The workaround appears to be that if we know the main thread needs to exit, just return here, and don't continue execution. This at least tries to avoid it from getting into the problematic code path. However, it's still feasible to get a SIGABRT, depending on thread/process timings (i.e. teamd exits, signals the main thread to exit, and then syncd exits, and syncd calls one of the two C functions, potentially hitting the issue). Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2022-10-06 14:57:53 +00:00
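The workaround described in the commit above, returning from the worker as soon as the main thread is known to be exiting, can be sketched as follows. This is a simplified illustration under stated assumptions, not the docker-wait-any source: `stop_requested`, `watch_container`, and the injected `wait_for_exit` and `is_restart_expected` callables are hypothetical stand-ins for the real wait call and the warm/fast restart checks.

```python
import threading

# Set once some worker has told the main thread to exit (hypothetical name).
stop_requested = threading.Event()


def watch_container(name, wait_for_exit, is_restart_expected, log):
    """Watch one container; stop early if the main thread is already exiting."""
    while True:
        wait_for_exit(name)  # blocks until the container exits
        if stop_requested.is_set():
            # Main thread is expected to exit: return instead of calling
            # further checks, which is the code path the commit avoids.
            log.append(f"{name}: main thread exiting, worker returns")
            return
        if is_restart_expected(name):
            continue  # container is restarting on purpose; keep waiting
        log.append(f"{name}: exited, signalling main thread")
        stop_requested.set()
        return
```

As the commit notes, a check like this narrows the window but does not close it: a second worker can still pass the check just before the event is set.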
mssonicbld	204cf58221	[ci/build]: Upgrade SONiC package versions (#12278 )	2022-10-05 20:38:20 +08:00
Ying Xie	76f7d7fa53	Revert "[auto-ts] add memory check (#10433 )" This reverts commit `a2cd0f5d4c`.	2022-10-04 21:53:45 +00:00
mssonicbld	1a08069d40	[ci/build]: Upgrade SONiC package versions (#12268 )	2022-10-04 21:09:24 +08:00
Stepan Blyshchak	a2cd0f5d4c	[auto-ts] add memory check (#10433 ) #### Why I did it To support automatic techsupport invocation in case memory usage is too high. #### How I did it Implemented according to https://github.com/Azure/SONiC/pull/939 #### How to verify it UT, manual test on the switch. DEPENDS on https://github.com/Azure/sonic-utilities/pull/2116	2022-10-03 18:58:38 +00:00
mssonicbld	89643d4717	[ci/build]: Upgrade SONiC package versions (#12245 )	2022-10-02 21:13:07 +08:00
mssonicbld	a7d088c47c	[ci/build]: Upgrade SONiC package versions (#12191 )	2022-09-28 23:25:55 +08:00
mssonicbld	1c5abca0a6	[ci/build]: Upgrade SONiC package versions (#12187 )	2022-09-27 08:41:31 +08:00
mssonicbld	99f9c53d19	[ci/build]: Upgrade SONiC package versions (#12142 )	2022-09-25 21:57:18 +08:00
Volodymyr Boiko	3d620370f7	[bgp][service] Start bgp service after interfaces-config service (#11827 ) - Why I did it interfaces-config service restarts networking service, during the restart loopback interface address is being removed and reassigned back, leaving loopback without an ipv4 address for a while. On SONiC startup and config reload interfaces-config and bgp services start in parallel and sometimes fpmsyncd in bgp attempts bind to loopback while it does not have an address, fails with the log Exception "Cannot assign requested address" had been thrown in daemon and exits with rc 0. root@sonic:/# supervisorctl status fpmsyncd EXITED Jul 20 05:04 AM zebra RUNNING pid 35, uptime 6:15:05 zsocket EXITED Jul 20 05:04 AM docker logs bgp INFO exited: fpmsyncd (exit status 0; expected) With fpmsyncd dead, configured routes do not appear in the database. - How I did it Added ordering dependency on interfaces-config service into bgp.config - How to verify it Itself the issue reproduces quite rarely, but one can gain the time interval between networking down and networking up in interfaces-config.sh like this: diff --git a/files/image_config/interfaces/interfaces-config.sh b/files/image_config/interfaces/interfaces-config.sh index f6aa4147a..87caceeff 100755 --- a/files/image_config/interfaces/interfaces-config.sh +++ b/files/image_config/interfaces/interfaces-config.sh @@ -63,7 +63,11 @@ done # Read sysctl conf files again sysctl -p /etc/sysctl.d/90-dhcp6-systcl.conf -systemctl restart networking +# systemctl restart networking + +systemctl start networking +sleep 10 +systemctl stop networking # Clean-up created files rm -f /tmp/ztp_input.json /tmp/ztp_port_data.json with this change the issue reproduces on every config reload. Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>	2022-09-21 21:15:08 +00:00
Maxime Lorrillere	458b12b4af	[Chassis][Voq]Configure midplane network on supervisor (#11725 ) Multi-asic Docker instances are created behind Docker's default bridge which doesn't allow talking to other Docker instances that are in the host network (like database-chassis). On linecards, we configure midplane interfaces to let per-asic docker containers talk to CHASSIS_DB on the supervisor through internal chassis network. On the supervisor we don't need to use chassis internal network, but we still need a similar setup in order to allow fabric containers to talk to database-chassis	2022-09-21 21:12:40 +00:00
mssonicbld	77b469d7c8	[ci/build]: Upgrade SONiC package versions (#12121 )	2022-09-20 21:24:25 +08:00
Oleksandr Ivantsiv	c9ba827773	[202205] [services] Update "WantedBy=" section for tacacs-config.timer. (#11893 ) (#12080 ) Manually cherry-picking #11893 - Why I did it The timer execution may fail if triggered during a config reload (when the sonic.target is stopped). This might happen in a rare situation if config reload is executed after reboot in a small time slot (for 0 to 30 seconds) before the tacacs-config timer is triggered: systemctl status tacacs-config.timer tacacs-config.timer - Delays tacacs apply until SONiC has started Loaded: loaded (/lib/systemd/system/tacacs-config.timer; enabled-runtime; vendor preset: enabled) Active: failed (Result: resources) since Mon 2022-08-29 15:53:03 IDT; 1min 28s ago Trigger: n/a Triggers: tacacs-config.service Aug 29 15:47:53 r-boxer-sw01 systemd[1]: Started Delays tacacs apply until SONiC has started. Aug 29 15:53:03 r-boxer-sw01 systemd[1]: tacacs-config.timer: Failed to queue unit startup job: Transaction for tacacs-config.service/start is destructive (mgmt-framework.timer has 's> Aug 29 15:53:03 r-boxer-sw01 systemd[1]: tacacs-config.timer: Failed with result 'resources'. - How I did it To ensure that timer execution will be resumed after a config reload the WantedBy section of the systemd service is updated to describe relation to sonic.target. - How to verify it Reboot the system After reboot monitor tacacs-config.timer status. 30 seconds before timer activation run "config reload -y" command. Check system status. Signed-off-by: Oleksandr Ivantsiv <oivantsiv@nvidia.com>	2022-09-19 09:20:10 +03:00
mssonicbld	f361c029c5	[ci/build]: Upgrade SONiC package versions (#11980 )	2022-09-19 12:31:16 +08:00
Aryeh Feigin	b8c6e2a45d	Use warm-boot infrastructure for fast-boot (#12026 )	2022-09-14 21:23:34 +03:00
Saikrishna Arcot	f1243bad1b	Pin version of bazelisk to v1.13.0 (#12027 ) * Pin version of bazelisk to v1.13.0 This tries to avoid builds failures due to the latest version of bazelisk changing and causing hash mismatches. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2022-09-08 21:15:35 -07:00
Ying Xie	ee40402ab7	Revert "[build] Fix version of bazelisk which is lost accidentally (#12012 )" This reverts commit `36c5787daf`.	2022-09-09 04:14:59 +00:00
Liu Shilong	36c5787daf	[build] Fix version of bazelisk which is lost accidentally (#12012 ) Why I did it The bazelisk package with hash value 1227b24db77557d552701f6add122edc was deleted from the GitHub release. The reproducible build cached only the hash value; the package file itself was not cached, because the two are handled in different pipelines. Use the latest package hash instead.	2022-09-09 07:24:44 +08:00
Ze Gan	0a54c46a0d	[docker-macsec]: Add dependencies of MACsec (#11770 ) Why I did it If the SWSS services was restarted, the MACsec service should also be restarted. Otherwise the data in wpa_supplicant and orchagent will not be consistent. How I did it Add dependency in docker-macsec.mk. How to verify it Manually check by 'sudo service swss restart'. The MACsec container should be started after swss, the syslog will look like Sep 8 14:36:29.562953 sonic INFO swss.sh[9661]: Starting existing swss container with HWSKU Force10-S6000 Sep 8 14:36:30.024399 sonic DEBUG container: container_start: BEGIN ... Sep 8 14:36:33.391706 sonic INFO systemd[1]: Starting macsec container... Sep 8 14:36:33.392925 sonic INFO systemd[1]: Starting Management Framework container... Signed-off-by: Ze Gan <ganze718@gmail.com>	2022-09-08 15:50:06 +00:00
Ying Xie	b4bf4aca3f	[mux] skip mux operations during warm shutdown (#11937 ) * [mux] skip mux operations during warm shutdown - Enhance write_standby.py script to skip actions during warm shutdown. - Expand the support to BGP service. - MuX support was added by a previous PR. - don't skip action during warm recovery Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2022-09-08 15:48:56 +00:00
Lawrence Lee	12e6b89d80	[arp_update]: Set failed IPv6 neighbors to incomplete (#11919 ) After pinging any failed IPv6 neighbor entries, set the remaining failed/incomplete entries to a permanent INCOMPLETE state. This manual setting to INCOMPLETE prevents these entries from automatically transitioning to FAILED state, and since they are now incomplete any subsequent NA messages for these neighbors is able to resolve the entry in the cache. Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2022-09-08 15:48:05 +00:00
Stepan Blyshchak	8431d3ab36	[docker-wait-any] immediately start to wait (#11595 ) It could happen that a container has already crashed but docker-wait-any will wait forever till it starts. It should, however, immediately exit to make the service restart. #### Why I did it It is observed in some circumstances that the auto-restart mechanism does not work. Specifically for ```swss.service```, ```orchagent``` had crashed before ```docker-wait-any``` started in ```swss.sh```. This led ```docker-wait-any``` to wait forever for ```swss``` to be in ```"Running"``` state and it results in: ``` CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 1abef1ecebff bcbca2b74df6 "/usr/local/bin/supe…" 22 hours ago Up 22 hours what-just-happened 3c924d405cd5 docker-lldp:latest "/usr/bin/docker-lld…" 22 hours ago Up 22 hours lldp eb2b12a98c13 docker-router-advertiser:latest "/usr/bin/docker-ini…" 22 hours ago Up 22 hours radv d6aac4a46974 docker-sonic-mgmt-framework:latest "/usr/local/bin/supe…" 22 hours ago Up 22 hours mgmt-framework d880fd07aab9 docker-platform-monitor:latest "/usr/bin/docker_ini…" 22 hours ago Up 22 hours pmon 75f9e22d4fdd docker-snmp:latest "/usr/local/bin/supe…" 22 hours ago Up 22 hours snmp 76d570a4bd1c docker-sonic-telemetry:latest "/usr/local/bin/supe…" 22 hours ago Up 22 hours telemetry ee49f50344b3 docker-syncd-mlnx:latest "/usr/local/bin/supe…" 22 hours ago Up 22 hours syncd 1f0b0bab3687 docker-teamd:latest "/usr/local/bin/supe…" 22 hours ago Up 22 hours teamd 917aeeaf9722 docker-orchagent:latest "/usr/bin/docker-ini…" 22 hours ago Exited (0) 22 hours ago swss 81a4d3e820e8 docker-fpm-frr:latest "/usr/bin/docker_ini…" 22 hours ago Up 22 hours bgp f6eee8be282c docker-database:latest "/usr/local/bin/dock…" 22 hours ago Up 22 hours database ``` The check for ```"Running"``` state is not needed because for cold boot case we do ```start_peer_and_dependent_services``` and for warm boot case the loop will retry to wait for container if this container is doing warm boot: `d01a91a569/files/image_config/misc/docker-wait-any (L56)` #### How I did it Removed the check for ```"Running"```. #### How to verify it Kill swss before ```docker-wait-any``` is reached and verify auto restart will restart swss service.	2022-09-08 15:47:27 +00:00
mssonicbld	dc987ebd2c	[ci/build]: Upgrade SONiC package versions (#11951 )	2022-09-05 14:42:32 +08:00
mssonicbld	613d3431d1	[ci/build]: Upgrade SONiC package versions (#11913 ) Upgrade SONiC Versions	2022-09-01 15:47:48 +08:00
abdosi	72852cdd02	Address Review Comment to define SONIC_GLOBAL_DB_CLI in gbsyncd.sh (#11857 ) As part of PR #11754 Change was added to use variable SONIC_DB_NS_CLI for namespace but that will not work since ./files/scripts/syncd_common.sh uses SONIC_DB_CLI. So revert back to use SONIC_DB_CLI and define new variable for SONIC_GLOBAL_DB_CLI for global/host db cli access Also fixed DB_CLI not working for namespace.	2022-09-01 00:12:56 +00:00
Longxiang Lyu	d7f049ebf0	[mux] Exit to write `standby` state to `active-active` ports (#11821 ) [mux] Exit to write standby state to `active-active` ports Signed-off-by: Longxiang Lyu <lolv@microsoft.com>	2022-09-01 00:11:09 +00:00
andywongarista	0adfd724e6	[202205][Arista] Add initial support for 720DT-48S (#10656 ) (#11860 ) Added initial set of config files to allow for booting and partial traffic testing in SONiC on the 720DT-48S. How to verify it - Switch boots - show interfaces status shows links up on interfaces Ethernet24-51 - Traffic flows with no errors on interfaces Ethernet24-51	2022-08-30 12:39:26 +08:00
Stepan Blyshchak	c60d78dd1f	[syncd.sh] 'sxdkernel start' => 'sxdkernel restart' (#11718 ) Change `sxdkernel start` to `sxdkernel restart`. If `syncd` service crashes in `ExecStartPre` systemd will not call `ExecStop` and thus will not call `sxdkernel stop`. Use of `sxdkernel restart` is more robust in terms of guarantees to restore the system after unexpected crashes. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2022-08-27 16:16:17 +00:00
anamehra	a2bed2ae4a	container_checker on supervisor should check containers based on asic presence (#11442 ) Why I did it On a supervisor card in a chassis, syncd/teamd/swss/lldp etc dockers are created for each Switch Fabric card. However, not all chassis would have all the switch fabric cards present. In this case, only dockers for Switch Fabrics present would be created. The monit 'container_checker' fails in this scenario as it is expecting dockers for all Switch Fabrics (based on NUM_ASIC defined in asic.conf file).	2022-08-26 20:50:24 +00:00
Saikrishna Arcot	91e9db005a	[202205]: Update package versions (#11801 ) This was done manually, to try to get past a build error due to changing package versions in Debian. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2022-08-21 15:23:44 -07:00
abdosi	0355caf20b	Added support to add gbsyncd in Feature Table of Host Config DB (#11754 ) Why I did: On multi-asic platforms, gbsyncd is not added to the Feature Table of the Host Config DB. Without this, container_checker complains that unneeded gbsyncd containers are running. How I did: Update both the Host and Namespace config DBs when the gbsyncd docker is starting. How I verify: Verified on multi-asic platforms.	2022-08-19 15:22:12 +00:00
Nikola Dancejic	f63dc738f9	[swss] Adding conditional for bgp when on multi ASIC platform (#11691 ) bgp should be a per-asic service, and runs for each namespace on multi-asic platforms. However, putting bgp in MULTI_INST_DEPENDENT causes swss to be restarted as well as bgp. this is causing issues after #11000 Issue: #11653 This fix: removes bgp from dependents list adds a conditional that either adds bgp, or bgp@$DEV to separate between single and multi-asic platforms	2022-08-17 17:10:29 +00:00
Hua Liu	6a2c540cba	[swsscommon] Add c++ version sonic-db-cli from sonic-swss-common (#10825 ) (#11713 ) Cherry pick PR https://github.com/sonic-net/sonic-buildimage/pull/10825 to 202205 branch #### Why I did it Fix sonic-db-cli high CPU usage on SONiC startup issue: https://github.com/sonic-net/sonic-buildimage/issues/10218 ETA of this issue will be 2022/05/31 #### How I did it Re-write sonic-cli with c++ in sonic-swss-common: https://github.com/sonic-net/sonic-swss-common/pull/607 Modify swss-common rules and slave.mk to install c++ version sonic-db-cli. #### How to verify it Pass all E2E test scenario. #### Description for the changelog Build and install c++ version sonic-db-cli from swss-common.	2022-08-17 15:35:00 +08:00
mssonicbld	5c306cc2e5	[ci/build]: Upgrade SONiC package versions (#11679 )	2022-08-15 05:50:59 +00:00
Lawrence Lee	15c80b207c	[arp_update]: Resolve failed neighbors on dualtor (#11615 ) In arp_update, check for FAILED or INCOMPLETE kernel neighbor entries and manually ping them to try and resolve the neighbor Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2022-08-11 16:19:25 +00:00
Stepan Blyshchak	3201dc93f6	[swss.sh/syncd.sh] Trap only on EXIT (#11590 ) When using trap on SIGTERM the script will not react to the SIGTERM signal sent while a child is executing. I.e, the following script does not react on SIGTERM sent to it if it is waiting for sleep to finish: ``` trap "echo Handled SIGTERM" 0 2 3 15 echo "Before sleep" sleep inf echo "After sleep" ``` Instead, trap only on EXIT which covers also a scenario with exit on SIGINT, SIGTERM. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2022-08-11 16:18:00 +00:00
Ying Xie	094745f06f	[write_standby] update write_standby.py script (#11650 ) Why I did it The initial value has to be present for the state machines to work. In the active-standby dual-tor scenario, or any hardware mux scenario, the value will eventually be updated after a delay. However, in the active-active dual-tor scenario, there is no other mechanism to initialize the value and get the state machines started, so this script has to write something at startup time. For active-active dualtor, 'active' is the preferred initial value; the state machine will switch the state to standby soon if the link prober finds the link is not in a good state. How I did it Update the script to always provide initial values. How to verify it Tested on active-active dual-tor testbed. Signed-off-by: Ying Xie ying.xie@microsoft.com	2022-08-09 23:02:09 +00:00
Sudharsan Dhamal Gopalarathnam	871a1c51d8	[vs]Preventing ebtables cfg to be applied on vs (#11585 ) * Prevent ebtables rules from being applied on the KVM image. The ebtables rules in SONiC are added so that ARP and L2 forwarding are blocked in the Linux kernel, since the hardware takes care of the actual L2 forwarding. However, this is not the case with KVM, where Linux needs to forward even L2 packets.	2022-08-08 20:45:28 +00:00
bingwang-ms	fda1290926	Support different `DSCP_TO_TC_MAP` for T1 in dualtor deployment (#11569 ) * Support different DSCP_TO_TC_MAP for T1 in dualtor deployment	2022-08-08 20:44:32 +00:00
Stepan Blyshchak	29d29b9491	[swss.sh] clear counters cache folder on swss cold/fast reload (#11244 ) - Why I did it To fix #9817. Clear the cache directory in swss.sh except on warm start. Also adapted the finalize-warmboot script to use the cache directory. - How I did it A change in sonic-utilities makes all cache files be saved into /tmp/cache. On swss restart this cache has to be removed if swss starts in cold or fast mode. The related cache restoration in the warmboot finalizer script is also updated to use the new location. - How to verify it Run together with Azure/sonic-utilities#2232. Verify the counters cache is removed on config reload, cold/fast reboots, and swss restart. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2022-08-08 20:42:54 +00:00
Nikola Dancejic	32fb4c7772	[swss] Adding bgp container as dependent of swss (#11000 ) What I did: Added bgp as a dependent of swss Why I did it: The bgp container was not restarting on swss crash. When swss crashes, linkmgrd doesn't initiate a switchover because it cannot access the default route from orchagent. Bringing down bgp with swss will isolate the ToR, causing linkmgrd to initiate a switchover to the peer ToR, avoiding significant packet loss. How I did it: Added bgp to DEPENDENT Signed-off-by: Nikola Dancejic <ndancejic@microsoft.com>	2022-08-08 20:40:35 +00:00
mssonicbld	f30e85358e	[ci/build]: Upgrade SONiC package versions (#11438 ) Upgrade SONiC Versions	2022-08-07 11:29:11 +08:00
Jing Zhang	a71d5db05e	Update WARM START FINALIZER to wait for linkmgrd to reconcile (#11477 ) Stemming from sonic-net/sonic-linkmgrd#76, this PR updates the warm restart finalizer to wait for linkmgrd to be reconciled. sign-off: Jing Zhang zhangjing@microsoft.com Why I did it To make sure the finalizer saves the config after linkmgrd's reconciliation. How I did it Add linkmgrd to the reconciliation wait list of the warmboot finalizer. How to verify it Verified on a lab device; linkmgrd reconciled as expected.	2022-07-28 20:42:07 +00:00
Lior Avramov	ff3ad9ddd1	[memory_checker] Do not check memory usage of containers if docker daemon is not running (#11476 ) Fix in Monit memory_checker plugin. Skip fetching running containers if docker engine is down (can happen in deinit). This PR fixes issue #11472. Signed-off-by: liora liora@nvidia.com Why I did it In the case where Monit runs during deinit flow, memory_checker plugin is fetching the running containers without checking if Docker service is still running. I added this check. How I did it Use systemctl is-active to check if Docker engine is still running. How to verify it Use systemctl to stop docker engine and reload Monit, no errors in log and relevant print appears in log. Which release branch to backport (provide reason below if selected) The fix is required in 202205 and 202012 since the PR that introduced the issue was cherry picked to those branches (#11129).	2022-07-28 20:37:22 +00:00
abdosi	eb56dc8b90	Enable ARP Update Script for Packet based chassis. (#11465 ) What I did: Made the following changes for packet-based chassis: 1. Run arp_update on LCs to resolve static route nexthops over backend port-channel interfaces. 2. On the Supervisor, make sure arp_update exits gracefully.	2022-07-28 20:36:54 +00:00
tjchadaga	0c7f0aa9b7	Add load_minigraph option to include traffic-shift-away during config migration (#11403 )	2022-07-28 20:34:39 +00:00
Stephen Sun	b4d8ee3fec	[Mellanox] Support Mellanox-SN4600C-C64 as T1 switch in dual-ToR scenario (#11261 ) - Why I did it Support Mellanox-SN4600C-C64 as T1 switch in dual-ToR scenario. This is to port #11032 and #11299 from 202012 to master. Support additional queue and PG in buffer templates, including both traditional and dynamic model. Support mapping DSCP 2/6 to lossless traffic in the QoS template. Add macros to generate additional lossless PG in the dynamic model. Adjust the order in which the generic/dedicated (with additional lossless queues) macros are checked and called to generate buffer tables in the common template buffers_config.j2. Buffer tables are rendered using macros. Both generic and dedicated macros are defined on our platform. Currently, the generic one is called as long as it is defined, which causes the generic one to always be called on our platform. To avoid this, the dedicated macro is checked and called first, and then the generic ones. Support MAP_PFC_PRIORITY_TO_PRIORITY_GROUP on ports with additional lossless queues. On Mellanox-SN4600C-C64, buffer configuration for t1 is calculated as: 40 * 100G downlink ports with 4 lossless PGs/queues, 1 lossy PG, and 3 lossy queues; 16 * 100G uplink ports with 2 lossless PGs/queues, 1 lossy PG, and 5 lossy queues. Signed-off-by: Stephen Sun <stephens@nvidia.com>	2022-07-28 20:30:00 +00:00
tjchadaga	fc93871881	Changes to persist TSA/B state across reloads (#11257 )	2022-07-28 20:29:45 +00:00
andywongarista	f377636747	Add gbsyncd container for broncos (#11154 ) * Add docker-gbsyncd-broncos support * Address review comments * Add socket to gbsyncd * Upgrade gbsyncd-broncos to bullseye	2022-07-28 20:27:21 +00:00
bingwang-ms	f7cc66ad4c	Add flag to control the generation of `PORT_QOS_MAP\|global` entry (#11448 ) Why I did it This PR adds a flag to control whether to generate the PORT_QOS_MAP\|global entry. For some HWSKUs, such as BackEndToRRouter and BackEndLeafRouter, there is no DSCP_TO_TC_MAP defined. Hence, if the PORT_QOS_MAP\|global entry is generated, OA will report errors because the DSCP_TO_TC_MAP map AZURE cannot be found: Jul 14 00:24:40.286767 str2-7050qx-32s-acs-03 ERR swss#orchagent: :- saiObjectTypeQuery: invalid object id oid:0x7fddb43605d0 Jul 14 00:24:40.286767 str2-7050qx-32s-acs-03 ERR swss#orchagent: :- meta_generic_validation_objlist: SAI_SWITCH_ATTR_QOS_DSCP_TO_TC_MAP:SAI_ATTR_VALUE_TYPE_OBJECT_ID object on list [0] oid 0x7fddb43605d0 is not valid, returned null object id Jul 14 00:24:40.286767 str2-7050qx-32s-acs-03 ERR swss#orchagent: :- applyDscpToTcMapToSwitch: Failed to apply DSCP_TO_TC QoS map to switch rv:-5 Jul 14 00:24:40.286767 str2-7050qx-32s-acs-03 ERR swss#orchagent: :- doTask: Failed to process QOS task, drop it This PR addresses the issue. How I did it Add a flag require_global_dscp_to_tc_map to control whether to generate the PORT_QOS_MAP\|global entry. The default value for require_global_dscp_to_tc_map is true. If the device type is storage backend, the value is changed to false, and the PORT_QOS_MAP\|global entry is not generated. How to verify it Update the current test_qos_dscp_remapping_render_template to cover storage backend.	2022-07-17 03:20:20 +00:00
mssonicbld	63a3631d98	[ci/build]: Upgrade SONiC package versions (#11425 ) Upgrade SONiC Versions	2022-07-13 07:08:33 +08:00

1145 Commits