sonic-buildimage

Author	SHA1	Message	Date
ganglv	2c7d53e5fb	Share docker image and use telemetry container for 202305 (#17255 ) Why I did it Need to share docker image for telemetry and gnmi, and only use telemetry container for 202305 branch Work item tracking Microsoft ADO (number only): How I did it Add a new docker image, base-gnmi, build sonic-gnmi and sonic-telemetry on this docker image. Enable telemetry container. How to verify it Run end to end test for telemetry and gnmi.	2023-11-24 11:22:48 +08:00
vdahiya12	066065f1cd	[pmon] update gRPC version to 1.57.0 (#16257 ) (#17219 ) * [pmon] update gRPC version to 1.57.0 (#16257) Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com> * fix conflict Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com> --------- Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>	2023-11-23 21:03:07 +08:00
prabhataravind	aa8a5403b8	[image_config]: Update DHCP rate-limit (#17132 ) Change DHCP rate limit in SONiC copp configuration to 100 PPS as this is necessary to ensure that DHCP flood does not cause LACP/BGP flaps in all scenarios This is an extension to the change in image_config: copp: Enable rate limiting for bgp, lacp, dhcp, lldp, macsec and udld #14859 and sonic-mgmt change in [tests/copp]: Update copp mgmt tests to support new rate-limits sonic-mgmt#8199 Why I did it 300 PPS is not sufficient to prevent LACP/BGP flaps in all cases. 100 PPS seems to provide better resiliency against DHCP traffic flood to CPU. Microsoft ADO 25776614: Send DHCP broadcast packets to DUT and verify that they are trapped to CPU at 100 PPS. Signed-off-by: Prabhat Aravind <paravind@microsoft.com>	2023-11-23 12:33:56 +08:00
ganglv	733a902a70	Revert "[202305] Share image for gnmi and telemetry (#17137 )" (#17261 ) This reverts commit `f2a495f7e5`.	2023-11-22 23:51:34 +08:00
mssonicbld	1337d295a3	[chassisd]: Add alternate to the bridge interface created on chassis supervisor. (#16505 ) (#17223 )	2023-11-19 14:42:00 +08:00
ganglv	f2a495f7e5	[202305] Share image for gnmi and telemetry (#17137 ) Why I did it Share docker image to support gnmi container and telemetry container backport #16863 Work item tracking Microsoft ADO 25423918: How I did it Create telemetry image from gnmi docker image. Enable gnmi container and disable telemetry container by default. How to verify it Run end to end test.	2023-11-15 11:28:21 +08:00
mssonicbld	78cc6cfa22	[copp]: Enable rate limiting for bgp, lacp, dhcp, lldp, macsec and udld (#14859 ) (#17111 )	2023-11-07 20:52:08 +08:00
Vadym Hlushko	28ecd068d4	[202305][buffers] Add 'create_only_config_db_buffers.json' file for the Mellanox devices (not MSFT SKU) (#17006 )
Why I did it
Add the create_only_config_db_buffers attribute to DEVICE_METADATA|localhost. If "create_only_config_db_buffers" exists and is equal to "true", the buffers are created according to the config_db configuration (for example the BUFFER_QUEUE|* table); otherwise the maximum available buffers (which are read from SAI) are created, regardless of the CONFIG_DB buffers configuration.
Work item tracking
Microsoft ADO (number only):
How I did it
Add the create_only_config_db_buffers.json files for Mellanox devices (not MSFT SKUs), and inject their content into CONFIG_DB during the swss docker container start.
How to verify it
Manual verification:
- Install the image with this PR included on a non-MSFT SKU switch.
- Check the show queue counters output and verify that only the buffers configured in CONFIG_DB are created:
root@sonic:/home/admin# show queue counters
     Port    TxQ    Counter/pkts    Counter/bytes    Drop/pkts    Drop/bytes
---------  -----  --------------  ---------------  -----------  ------------
Ethernet0    UC0               0                0            0           N/A
Ethernet0    UC1               0                0            0           N/A
Ethernet0    UC2               0                0            0           N/A
Ethernet0    UC3               0                0            0           N/A
Ethernet0    UC4               0                0            0           N/A
Ethernet0    UC5               0                0            0           N/A
Ethernet0    UC6               0                0            0           N/A
- Open /usr/share/sonic/device/$DEVICE/$SKU/create_only_config_db_buffers.json and change it to: "create_only_config_db_buffers": "false"
- Do config reload.
- Check the show queue counters output and verify that all available buffers are created:
root@sonic:/home/admin# show queue counters
     Port    TxQ    Counter/pkts    Counter/bytes    Drop/pkts    Drop/bytes
---------  -----  --------------  ---------------  -----------  ------------
Ethernet0    UC0               0                0            0           N/A
Ethernet0    UC1               0                0            0           N/A
Ethernet0    UC2               0                0            0           N/A
Ethernet0    UC3               0                0            0           N/A
Ethernet0    UC4               0                0            0           N/A
Ethernet0    UC5               0                0            0           N/A
Ethernet0    UC6               0                0            0           N/A
Ethernet0    UC7              60            15346            0           N/A
Ethernet0    MC8             N/A              N/A          N/A           N/A
Ethernet0    MC9             N/A              N/A          N/A           N/A
Ethernet0   MC10             N/A              N/A          N/A           N/A
Ethernet0   MC11             N/A              N/A          N/A           N/A
Ethernet0   MC12             N/A              N/A          N/A           N/A
Ethernet0   MC13             N/A              N/A          N/A           N/A
Ethernet0   MC14             N/A              N/A          N/A           N/A
Ethernet0   MC15             N/A              N/A          N/A           N/A
	2023-11-03 14:27:17 +08:00
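The selection rule described in the commit above can be sketched in a few lines. This is only an illustration, not the code from the PR: the function name `use_only_config_db_buffers` and the nested dictionary standing in for CONFIG_DB are assumptions made for the example.

```python
def use_only_config_db_buffers(config_db: dict) -> bool:
    """Return True when buffers should be created strictly from CONFIG_DB.

    Hypothetical helper: config_db is a plain dict standing in for CONFIG_DB,
    keyed by table name and then by entry key.
    """
    localhost = config_db.get("DEVICE_METADATA", {}).get("localhost", {})
    # Only the literal string "true" enables the behaviour. A missing
    # attribute or any other value means all available buffers are created.
    return localhost.get("create_only_config_db_buffers") == "true"
```

With this rule, a device that ships no `create_only_config_db_buffers.json` keeps the previous behaviour of creating the maximum available buffers.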
mssonicbld	fbf30ec6a8	[tacacs]: Fix tcpdump report error when tacacs enabled (#16372 ) (#17077 )	2023-11-03 04:31:18 +08:00
mssonicbld	feaa855346	Add special rsyslog filter for MSN2700 platform (#16684 ) (#17078 )	2023-11-03 03:05:44 +08:00
Samuel Angebault	274e929f11	Reduce SONiC image filesystem size (#16948 ) Why I did it Running SONiC releases past 202012 has become really challenging on systems with small storage devices (4GB). Some of these devices can also be limited by only having 4GB of RAM, which complicates mitigations. The main contributor to these issues is the SONiC image growth. Being able to reduce it by some decent amount should allow these systems to run SONiC longer. It would also reduce some impacts related to space savings mitigations. Work item tracking Microsoft ADO (number only): How I did it Add a build option to reduce the image size. The image reduction process affects the builds in 2 ways: it changes some packages that are installed in the rootfs, and it applies a rootfs reduction script. The script itself performs a few steps: (1) remove file duplication by leveraging hardlinks, both under /usr/share/sonic, since the symlinks under the device folder are lost during the build, and under /var/lib/docker, since the files there will only be mounted ro; (2) remove some extra files (man, docs, licenses, ...); (3) apply some image-specific space reduction (only for aboot images currently). The script can later be improved, but for now it reduces the rootfs size by ~30%. How to verify it Compare the size of an image with this option enabled and with this option disabled. Expect the fully extracted content to be ~30% smaller. Which release branch to backport (provide reason below if selected) This is a backport of #16729 Description for the changelog Add build option to reduce final image size	2023-10-24 21:08:38 +08:00
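The hardlink step mentioned in the commit above can be illustrated with a small Python sketch. It is not the reduction script from the PR. The names `file_digest` and `dedupe_with_hardlinks` are hypothetical, and the sketch assumes that files with identical content can safely share one inode, which is reasonable only for read-only trees.

```python
import hashlib
import os


def file_digest(path: str) -> str:
    """Return the SHA-256 hex digest of a file's content."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def dedupe_with_hardlinks(root: str) -> int:
    """Replace duplicate regular files under root with hardlinks.

    Returns the number of files that were replaced.
    """
    first_seen = {}  # (size, digest) -> path of the first copy found
    replaced = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            key = (os.path.getsize(path), file_digest(path))
            original = first_seen.setdefault(key, path)
            # Nothing to do for the first copy or for existing hardlinks.
            if original == path or os.path.samefile(original, path):
                continue
            os.unlink(path)
            os.link(original, path)
            replaced += 1
    return replaced
```

The sketch ignores ownership and permission differences between duplicates, which a real build step would have to take into account.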
mssonicbld	bf605cf771	[ci/build]: Upgrade SONiC package versions (#16964 )	2023-10-21 23:04:00 +08:00
Longxiang Lyu	dd20597e4d	[snmp] Check intfmgrd running before start (#16588 ) Add pre start check to ensure intfmgrd is running. The check will run for 20 seconds at most. Signed-off-by: Longxiang Lyu <lolv@microsoft.com>	2023-10-21 12:32:42 +08:00
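A pre-start check that gives up after a fixed time, as described in the commit above, follows a common poll-with-deadline pattern. The sketch below shows that pattern only. The function name `wait_until` and the one-second polling interval are assumptions, and the commit message does not say how the real check detects that intfmgrd is running.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 20.0, interval: float = 1.0) -> bool:
    """Poll check() until it returns True or timeout seconds have elapsed.

    Returns True if the condition was met, False if the deadline passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller would pass a predicate that reports whether the dependency is up and decide what to do when the function returns False.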
Aman Singhal	f265c79541	[cisco]: Enable Kdump config by default for cisco-8000 (#16224 ) Why I did it Enabling kdump by default for cisco-8000 by setting crashkernel cmdline arg in device installer.conf. After bootup, sonic-kdump-config wipes crashkernel arg from /host/grub/grub.cfg, and resets USE_KDUMP in /etc/default/kdump-tools, so kdump will not be enabled on subsequent reboot. How I did it Setting kdump enable config as part of init_cfg.json for cisco-8000 platforms. How to verify it Install SONiC image with kdump enabled by default (device/hwsku/installer.conf), then reboot. Kdump config should persist on subsequent reboots and kdump loaded during bootup Signed-off-by: Aman Singhal <amans@cisco.com>	2023-10-18 00:37:30 +08:00
Samuel Angebault	dbea038e96	Disable CPU C-States other than C1 (#16703 ) Why I did it Networking devices need to be responsive. Such responsiveness is harmed when the CPU changes state. There is a latency penalty when a CPU is idle (e.g. C2) and needs to exit this state to come back to the C1 state. To prevent this from happening, the CPU should be forced to remain in the C1 state. How I did it Generalize the cstate forcing to C1 to all Arista products. This is done by adding processor.max_cstate=1 to the kernel cmdline for all CPUs. Additionally, Intel CPUs also need intel_idle.max_cstate=0 to fall back to the acpi_idle driver. How to verify it Check that processor.max_cstate=1 is present on the cmdline for AMD CPUs. Check that both processor.max_cstate=1 and intel_idle.max_cstate=0 are present on the cmdline for Intel CPUs.	2023-10-17 20:49:07 +08:00
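The verification steps in the commit above amount to looking for specific arguments on the kernel command line. A minimal sketch of that check follows. The function name `cstate_args_present` and the `intel_cpu` flag are hypothetical, and the caller is assumed to know the CPU vendor.

```python
def cstate_args_present(cmdline: str, intel_cpu: bool) -> bool:
    """Check a kernel command line for the C-state arguments named in the commit.

    cmdline is the whitespace-separated kernel command line as a string.
    """
    args = set(cmdline.split())
    required = {"processor.max_cstate=1"}
    if intel_cpu:
        # Intel CPUs additionally need intel_idle disabled.
        required.add("intel_idle.max_cstate=0")
    return required <= args
```

On a running Linux system the string to check can be read from `/proc/cmdline`.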
mssonicbld	e80b956502	[ci/build]: Upgrade SONiC package versions (#15617 )	2023-10-17 20:48:25 +08:00
Saikrishna Arcot	39cdee57e1	[baseimage]: Update openssh to 1:8.4p1-5+deb11u2 (#16826 ) Openssh in Debian Bullseye has been updated to 1:8.4p1-5+deb11u2 to fix CVE-2023-38408. Since we're building openssh with some patches, we need to update our version as well. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2023-10-17 16:34:18 +08:00
mssonicbld	185a63bc7f	[fast-reboot] Fix regression: set FAST_REBOOT state_db flag to support fast-reboot from older images (#16733 ) (#16753 )	2023-09-29 05:29:20 +08:00
vganesan-nokia	52d5980c0c	[swss] Chassis db clean up optimization and bug fixes (#16454 ) (#16644 ) * [swss] Chassis db clean up optimization and bug fixes This commit includes the following changes: - Fix for regression failure due to error in finding CHASSIS_APP_DB in pizzabox (#PR 16451) - After attempting to delete the system neighbor entries from chassis db, and before starting to clear the system interface entries, wait for some time only if some system neighbors were deleted. If no system neighbor entries were deleted for the asic coming up, there is no need to wait. - Similar changes for system lag delete. Before deleting the system lag, wait for some time only if some system lag members were deleted. If no system lag members were deleted, there is no need to wait. - Flush the SYSTEM_NEIGH_TABLE from the local STATE_DB. While the asic is coming up, when system neigh entries are deleted from chassis app db (as part of chassis db clean up), there is no orchs/process running to process the delete messages from chassis redis. Because of this, stale system neigh entries are present in the local STATE_DB. The stale entries result in the creation of orphan (no corresponding data path/asic db entry) kernel neigh entries during STATE_DB:SYSTEM_NEIGH_TABLE entries processing by nbrmgr (after the swss service comes up). This is avoided by flushing the SYSTEM_NEIGH_TABLE from the local STATE_DB when the service comes up. Signed-off-by: vedganes <veda.ganesan@nokia.com> * [swss] Chassis db clean up bug fixes review comment fix - 1 Debug logs added for deletion of other tables (SYSTEM_INTERFACE and SYSTEM_LAG_TABLE) Signed-off-by: vedganes <veda.ganesan@nokia.com> --------- Signed-off-by: vedganes <veda.ganesan@nokia.com> (cherry picked from commit `b13b41fc22`)	2023-09-22 10:58:27 +08:00
mssonicbld	e7f49c9bce	Fix potentially not having any loopback address on lo interface (#16490 ) (#16628 ) In #15080, there was a command added to re-add 127.0.0.1/8 to the lo interface when the networking configuration is being brought down. However, the trigger for that command is `down`, which, looking at ifupdown2 configuration files, runs immediately after 127.0.0.1/16 is removed. This means there may be a period of time where there are no loopback addresses assigned to the lo interface, and redis commands will fail. Fix this by changing this to pre-down, which should run well before 127.0.0.1/16 is removed, and should always leave lo with a loopback address. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com> Co-authored-by: Saikrishna Arcot <sarcot@microsoft.com>	2023-09-21 20:40:21 +08:00
Alpesh Patel	6b48346ff5	qos template change for backend compute-ai deployment (#16150 ) #### Why I did it To enable qos config for a certain backend deployment mode, for resource-type "Compute-AI". This deployment has the following requirement: - Config below enabled if DEVICE_TYPE as one of backend_device_types - Config below enabled if ResourceType is 'Compute-AI' - 2 lossless TCs' (2, 3) - 2 lossy TCs' (0,1) - DSCP to TC map uses 4 DSCP code points and maps to the TCs' as follows: "DSCP_TO_TC_MAP": { "AZURE": { "48" : "0", "46" : "1", "3" : "3", "4" : "4" } } - WRED profile has green {min/max/mark%} as {2M/10M/5%} This required template change <as in the PR> in addition to the vendor qos.json.j2 file (not included here). ### How I did it #### How to verify it - with the above change and the vendor config change, generated the qos.json file and verified that the objective stated in "Why I did it" was met - verified no error ### Description for the changelog Update qos_config.j2 for Compute-AI deployment on one of backend device type roles	2023-09-21 18:34:11 +08:00
Prince George	d5a96f69f1	[platform]: Disable interrupt for intel i2c-i801 driver (#16309 ) On S6100 we are seeing almost 100K interrupts per second on Intel's i801 SMBUS controller, which affects system performance. We now disable the i801 driver interrupt and instead enable polling. Microsoft ADO (number only): 24910530 How I did it Disable the interrupt by passing the interrupt disable feature argument to the i2c-i801 driver. How to verify it This fix is NOT applicable for ARM based platforms. Applicable only for Intel based platforms: - On SN2700 it's already disabled in Mellanox hw-mgmt - Celestica DX010 and E1031 - Dell S6100 verified the interrupts are no longer incrementing. - Arista 7260CX3 Signed-off-by: Prince George <prgeor@microsoft.com>	2023-09-21 16:33:37 +08:00
StormLiangMS	2b381b1fd4	Revert "revert [syslog] Add remote syslog configuration (cherry-pick to 202305) (#15897 ) (#16179 )" (#16549 ) This reverts commit `164fa102c0`.	2023-09-14 20:52:14 +08:00
Kebo Liu	fe7eeed051	[202305][Mellanox] Update SDK/FW/SAI to 4.6.1020/2012.1020/SAIBuild2305.25.0.3 (#16096 ) (#16298 ) * [Mellanox] Update SDK/FW/SAI to 4.6.1020/2012.1020/SAIBuild2305.25.0.3 (#16096) SONiC changes: 1. Support Spectrum4 ASIC FW binary building. 2. Support new SDK sx-obj-desc lib building since the new SAI needs it. 3. Remove SX_SCEW debian package from Mellanox SDK build since we are no longer using it (we use libxml2 instead). 4. Update SAI, SDK, FW to version 4.6.1020/2012.1020/SAIBuild2305.25.0.3 SDK/FW bug fixes 1. In SPC-1 platforms: Fastboot mode is not operational for Split port with Force mode in 50G speed; SFP modules are kept in disabled state after setting LPM (low power mode) on/off for at least 3 minutes. 2. When performing fast boot from an old SDK version (currently installed) to a newer one (target version), and the system was initially loaded with a new SDK version (past version), and the system has not been wiped, under specific conditions, the fast boot would use the past version's data and may fail. SDK/FW Features 1. On SN2700 all ports can support y cable by credo SAI bug Fixes 1. When creating an ACL rule with SAI_ACL_ENTRY_ATTR_FIELD_SRC_IP/SAI_ACL_ENTRY_ATTR_FIELD_DST_IP enabled, and then disabling the field by setting enable=false, a match on L3_type=IPv4 will remain programmed for the rule. Issue resolved after the fix. 2. Allow the max scale of virtual routers to be configured for SPC-1, SPC-2, SPC-3 when fastboot is enabled 3. Remove default hash key of SRC_MAC, DST_MAC and ETH_TYPE SAI features 1. Port init profile - How I did it Update SDK/FW/SAI make files - How to verify it Run full sonic-mgmt regression on Mellanox platform Signed-off-by: Kebo Liu <kebol@nvidia.com> Conflicts: platform/mellanox/mlnx-sai.mk * Fix issue: unprintable character is rendered when handling comments in j2 Use "{#-" and "-#}" to mark comments in jinja template Signed-off-by: Stephen Sun <stephens@nvidia.com> --------- Signed-off-by: Stephen Sun <stephens@nvidia.com> Co-authored-by: Stephen Sun <stephens@nvidia.com>	2023-09-10 22:28:46 +08:00
mssonicbld	ebe24a134c	[chassis] Chassis DB cleanup when asic comes up (#16213 ) (#16417 )	2023-09-03 23:52:39 +08:00
mssonicbld	40a5cea84c	Assign the higher metric value for Ipv6 default route learnt via RA message (#16367 ) (#16429 )	2023-09-03 22:16:46 +08:00
mssonicbld	d62ae374a9	chassis-packet: Update arp_update script for FAILED and STALE check (#16311 ) (#16423 )	2023-09-03 21:24:17 +08:00
Junchao-Mellanox	cead17cb55	Fix issue: systemctl daemon-reload would sporadically cause udev handler fail (#15253 )
#### Why I did it
A workaround to backport the fix for a systemd issue.
The systemd issue: https://github.com/systemd/systemd/issues/24668
The systemd PR to fix the issue: https://github.com/systemd/systemd/pull/24673/files
The formal solution would be to upgrade systemd to a version that contains the fix. But systemd is a very basic service, and upgrading it requires extensive testing.
#### How I did it
Copy the correct systemd-udevd.service file at build time
#### Tested branch (Please provide the tested image version)
- [x] 202211
```
SONiC Software Version: SONiC.fix-udev.3-b65c7bdec_Internal
SONiC OS Version: 11
Distribution: Debian 11.7
Kernel: 5.10.0-18-2-amd64
Build commit: b65c7bdec
Build date: Mon Jun 19 10:54:50 UTC 2023
Built by: sw-r2d2-bot@r-build-sonic-ci02-241

Platform: x86_64-mlnx_msn4700-r0
HwSKU: ACS-MSN4700
ASIC: mellanox
ASIC Count: 1
Serial Number: MT2022X08597
Model Number: MSN4700-WS2FO
Hardware Revision: A1
Uptime: 08:10:11 up 1 min, 1 user, load average: 1.81, 0.67, 0.24
Date: Sun 25 Jun 2023 08:10:11

Docker images:
REPOSITORY                   TAG                            IMAGE ID      SIZE
docker-fpm-frr               fix-udev.3-b65c7bdec_Internal  a7b911e7cb6f  346MB
docker-fpm-frr               latest                         a7b911e7cb6f  346MB
docker-platform-monitor      fix-udev.3-b65c7bdec_Internal  94c5178cf80b  731MB
docker-platform-monitor      latest                         94c5178cf80b  731MB
docker-orchagent             fix-udev.3-b65c7bdec_Internal  46b393e0ace8  328MB
docker-orchagent             latest                         46b393e0ace8  328MB
docker-syncd-mlnx            fix-udev.3-b65c7bdec_Internal  1f5c6c23e33a  734MB
docker-syncd-mlnx            latest                         1f5c6c23e33a  734MB
docker-sflow                 fix-udev.3-b65c7bdec_Internal  7e45992c8c59  317MB
docker-sflow                 latest                         7e45992c8c59  317MB
docker-teamd                 fix-udev.3-b65c7bdec_Internal  e4d905592cda  316MB
docker-teamd                 latest                         e4d905592cda  316MB
docker-nat                   fix-udev.3-b65c7bdec_Internal  7fe799367580  319MB
docker-nat                   latest                         7fe799367580  319MB
docker-macsec                latest                         d702a5554171  318MB
docker-snmp                  fix-udev.3-b65c7bdec_Internal  3bce8fcf71cd  338MB
docker-snmp                  latest                         3bce8fcf71cd  338MB
docker-sonic-telemetry       fix-udev.3-b65c7bdec_Internal  f13949cbc817  597MB
docker-sonic-telemetry       latest                         f13949cbc817  597MB
docker-dhcp-relay            latest                         153d9072805d  306MB
docker-router-advertiser     fix-udev.3-b65c7bdec_Internal  aed642b9a6bc  299MB
docker-router-advertiser     latest                         aed642b9a6bc  299MB
docker-sonic-p4rt            fix-udev.3-b65c7bdec_Internal  a3cae5ca65a7  870MB
docker-sonic-p4rt            latest                         a3cae5ca65a7  870MB
docker-mux                   fix-udev.3-b65c7bdec_Internal  b81f0401b9a8  347MB
docker-mux                   latest                         b81f0401b9a8  347MB
docker-eventd                fix-udev.3-b65c7bdec_Internal  c5917d0e801f  298MB
docker-eventd                latest                         c5917d0e801f  298MB
docker-lldp                  fix-udev.3-b65c7bdec_Internal  fd5dc14a7976  341MB
docker-lldp                  latest                         fd5dc14a7976  341MB
docker-database              fix-udev.3-b65c7bdec_Internal  438c2715a1dd  299MB
docker-database              latest                         438c2715a1dd  299MB
docker-sonic-mgmt-framework  fix-udev.3-b65c7bdec_Internal  5c50b115fbcd  414MB
docker-sonic-mgmt-framework  latest
```
	2023-09-03 18:32:54 +08:00
Vadym Hlushko	b7dfc5b280	[memory_checker] Add a specific log message in a case when the docker service is not running. (#16018 )
#### Why I did it
To fix the logic introduced by [[memory_checker] Do not check memory usage of containers which are not created #11129](https://github.com/sonic-net/sonic-buildimage/pull/11129).
There could be a scenario before the reboot, where
1. The `docker service` has stopped
2. In a very short period of time, the monit service performs the `root@sonic:/home/admin# monit status container_memory_telemetry`
In such a scenario, the `memory_checker` script will log an error to the syslog:
```
ERR memory_checker: Failed to retrieve the running container list from docker daemon! Error message is: 'Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))'
```
But this is actually correct behavior, because when the docker service is stopped, the Unix socket is destroyed, and that is why we see the `FileNotFoundError(2, 'No such file or directory')` exception in the syslog.
#### How I did it
Changed the log severity to warning and changed the return value.
#### How to verify it
It is really hard to catch the exact moment described in the `Why I did it` section. In order to check the logic:
1. Change the Unix socket path to a non-existent one in the [/usr/bin/memory_checker](`47742dfc2c/files/image_config/monit/memory_checker (L139)`) file on the switch.
2. Execute the `root@sonic:/home/admin# monit restart container_memory_telemetry`
3. Check the syslog for such messages:
```
WARNING memory_checker: Failed to retrieve the running container list from docker daemon! Error message is: 'Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))'
INFO memory_checker: [memory_checker] Exits without checking memory usage since container 'telemetry' is not running!
```
	2023-09-03 18:32:43 +08:00
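The change described in the commit above, treating an unreachable docker daemon as an expected condition rather than an error, can be sketched as follows. This is not the `memory_checker` source. The function `get_running_containers`, its `list_containers` argument, and the choice to return `None` are all assumptions, since the commit message does not state the actual return value.

```python
import logging

logger = logging.getLogger("memory_checker_sketch")


def get_running_containers(list_containers):
    """Return the running container names, or None if the daemon is unreachable.

    list_containers is a hypothetical callable standing in for the query to
    the docker daemon. It is expected to raise when the Unix socket is gone.
    """
    try:
        return list(list_containers())
    except (FileNotFoundError, ConnectionError) as err:
        # A stopped docker service removes its socket, so this is logged as
        # a warning instead of an error.
        logger.warning(
            "Failed to retrieve the running container list from docker daemon! "
            "Error message is: '%s'", err)
        return None
```

A caller can then treat `None` as "skip the memory check" instead of reporting a failure.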
xumia	288ebd5dd3	Support FIPS DB configuration (#15632 ) Why I did it Support FIPS DB configuration Design Doc: sonic-net/SONiC#1372 Work item tracking Microsoft ADO (number only): 24411148 How I did it Add the FIPS Yang model to make FIPS configurable in ConfigDB. How to verify it See TestPlan: sonic-net/sonic-mgmt#9092 Build the image and run the tests: sonic-net/sonic-mgmt#9091	2023-09-03 16:33:25 +08:00
StormLiangMS	7b8906600c	add sonic release for 202305 (#16364 )	2023-09-03 09:23:39 +08:00
andywongarista	f0823e6dd0	[Arista] Add support for DCS-7060DX5-32 (#14793 ) (#16176 ) * Add asic support for blackhawkth4dd * Add bfd feature to BlackhawkTh4Dd * Add platform data for blackhawkth4 * Add Qos settings for Blackhawk-TH4 * Add pg and queue settings for Blackhawk-TH4 * Add buffers_defaults_t0.j2 * Add blackhawkth4 to boot0 * Update 7060dx5 config.bcm * Fix build error --------- Co-authored-by: Boyang Yu <byu@arista.com> Co-authored-by: David Meggy <davidm@arista.com>	2023-09-03 09:21:33 +08:00
mssonicbld	adfc486456	Run db_migrator for non first-time reboots (#16116 ) (#16306 )	2023-08-29 05:36:36 +08:00
Vaibhav Hemant Dixit	0b83639068	Fix CONFIG_DB_INITIALIZED flag check logic and set/reset flag for warmboot (#15685 ) (#16217 )
Cherry-pick of #15685
MSFT ADO: 24274591
Why I did it
Two changes:
1 Fix a day1 issue, where the check to wait until CONFIG_DB_INITIALIZED is set is incorrect. There are multiple places where the same incorrect logic is used. The current logic (until [[ $($SONIC_DB_CLI CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]];) always passes, irrespective of the result of the GET operation.
    root@str2-7060cx-32s-29:~# sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"
    1
    root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]]; do echo "entered here"; done
    root@str2-7060cx-32s-29:~#
    root@str2-7060cx-32s-29:~#
    root@str2-7060cx-32s-29:~# sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"
    0
    root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]]; do echo "entered here"; done
    root@str2-7060cx-32s-29:~#
Fix this logic by checking for the value of the flag to be "1".
    root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") -eq 1 ]]; do echo "entered here"; done
    entered here
    entered here
    entered here
This gap in logic was highlighted when another fix was merged: #14933. The issue being fixed here caused warmboot-finalizer to not wait until config-db is initialized.
2 Set and unset CONFIG_DB_INITIALIZED for the warm-reboot case. Currently, during warm shutdown, the value of CONFIG_DB_INITIALIZED is stored in the redis db backup. It is restored when the dump is loaded during warm-recovery. So the value of CONFIG_DB_INITIALIZED does not depend on the config db's state; instead it remains what it was before the reboot. Fix this by setting CONFIG_DB_INITIALIZED to 0 when the DB is loaded, and setting it to 1 after db_migrator is done.
Work item tracking
Microsoft ADO (number only):
How I did it
How to verify it
	2023-08-24 16:58:24 +08:00
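The bug in the commit above comes from bash treating `[[ string ]]` as true for any non-empty string, including "0". The Python sketch below mimics both checks so the difference can be seen side by side. The function names are hypothetical, and `flag_is_set` only approximates the arithmetic `-eq 1` comparison.

```python
def bash_nonempty_test(output: str) -> bool:
    """Mimic bash `[[ $output ]]`: true for any non-empty string, even "0"."""
    return output != ""


def flag_is_set(output: str) -> bool:
    """Mimic the fixed check: true only when the flag value is 1."""
    return output.strip() == "1"
```

With the old check, a GET that returns "0" still ends the `until` loop at once, which is why warmboot-finalizer did not wait. With the fixed check the loop keeps running until the value becomes "1".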
StormLiangMS	164fa102c0	revert [syslog] Add remote syslog configuration (cherry-pick to 202305) (#15897 ) (#16179 )	2023-08-19 16:01:29 +08:00
Vaibhav Hemant Dixit	2969d84e58	Revert "Revert "Fix for fast/cold-boot: call db_migrator only after old config is loaded (#14933 )" (#15464 )" (#15684 ) This reverts commit `9649a44470`.	2023-08-15 04:32:38 +08:00
Yevhen Fastiuk	4602d30a73	[syslog] Add remote syslog configuration (cherry-pick to 202305) (#15897 ) cherry-pick: #14513 depends: https://github.com/sonic-net/sonic-utilities/pull/2939 * Add an ability to configure remote syslog servers * Add an initial configuration for remote syslog * Extend YANG module and add unit tests #### Why I did it Adding the following functionality to rsyslog feature: * Configure remote syslog servers: protocol, filter, severity level * Update global syslog configuration: severity level, message format #### How I did it added parameters to syslog server and global configuration. #### How to verify it create syslog server using CLI/adding to Redis-DB verify server is added to file /etc/rsyslog.conf and server is functional. #### Description for the changelog extend rsyslog capabilities, added server and global configuration parameters. #### Link to config_db schema for YANG module changes [sonic-syslog.yang](https://github.com/sonic-net/sonic-buildimage/blob/master/src/sonic-yang-models/yang-models/sonic-syslog.yang)	2023-08-14 13:12:33 -07:00
mssonicbld	ec73d0f3ff	[chassis]: removed dependency for bgp and swss for chassis supervisor (#15734 ) (#16135 ) Fixes #15667 and #13293 Work item tracking Microsoft ADO 24472854: How I did it On chassis supervisor bgp feature is disabled in hostcfgd. The dependency between swss and bgp causes the bgp containers to start even though the feature is disabled. How to verify it Tests on chassis supervisor and LC Co-authored-by: Arvindsrinivasan Lakshmi Narasimhan <55814491+arlakshm@users.noreply.github.com>	2023-08-14 22:39:24 +08:00
Longxiang Lyu	6e49fa5fd2	[monit][dualtor] Periodically check mux neighbors consistency (#15769 ) Signed-off-by: Longxiang Lyu <lolv@microsoft.com>	2023-08-08 18:33:29 +08:00
mssonicbld	4ca01a7715	[syncd.sh] Clear semaphore before updating firmware (#15818 ) (#16067 )	2023-08-07 18:20:15 +08:00
vmittal-msft	5ee18ece65	Update WRED profile on system ports (#15612 ) * Update WRED profile on system ports	2023-08-07 14:33:42 +08:00
mssonicbld	33a10b479a	[nvidia] make sure shared storage with syncd is cleared on restarts (#14547 ) (#16046 ) Why I did it Sharing the storage of syncd with other proprietary application extensions allows them to communicate with syncd in different ways. If one container wants to pass some information to syncd, then shared storage can be used. However, today the shared storage isn't cleaned on restarts, making it possible for syncd to read out-of-date information generated in the past. NOTE: No plans to use it for standard SONiC dockers and we are working on removing the SDK dependency from PMON docker How I did it Implemented a new service to clean the shared storage. How to verify it Do reboot/fast-reboot/warm-reboot/config-reload/systemctl restart swss and verify /tmp/ is cleaned after each restart in syncd container. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com> Co-authored-by: Stepan Blyshchak <38952541+stepanblyschak@users.noreply.github.com>	2023-08-07 09:27:43 +08:00
Junchao-Mellanox	bf37c3162c	Fix issue: set delayed attribute to true for platform monitor service (#15816 ) There is a redundant line in init_cfg.json.j2. It causes the pmon service to always have "delayed=False". However, PMON has a timer now, so this change fixes it.	2023-08-07 00:34:12 +08:00
mssonicbld	6004054711	[arp_update]: Fix IPv6 neighbor race condition (#15583 ) (#15877 )	2023-07-19 20:06:12 +08:00
lixiaoyuner	c59f55f6a3	Move k8s script to docker-config-engine (#14788 ) (#15768 ) Why I did it To reduce the container's dependency on the host system Work item tracking Microsoft ADO (number only): 17713469 How I did it Move the k8s container startup script into the config engine container, rather than mounting it from the host. How to verify it Check the file path (/usr/share/sonic/scripts/container_startup.py) inside the config engine container. Signed-off-by: Yun Li <yunli1@microsoft.com> Co-authored-by: Qi Luo <qiluo-msft@users.noreply.github.com>	2023-07-17 23:21:01 +08:00
mssonicbld	0b1f834e22	update rsyslog log size conf (#15821 ) (#15837 )	2023-07-14 20:34:22 +08:00
mssonicbld	bb3eff6ab4	Revert "Fix for fast/cold-boot: call db_migrator only after old config is loaded (#14933 )" (#15464 ) (#15618 )	2023-06-29 22:35:47 +08:00
Stepan Blyshchak	e2e5b77f16	[mlnx-ffb.sh] Update issu-version location (#14925 ) #### Why I did it ISSU version check fails due to inability to mount squashfs from 202211 on 201911 #### How I did it Put ISSU version file under platform directory #### How to verify it Warm-upgrade matrix: - 201911 (with https://github.com/sonic-net/sonic-buildimage/pull/14928) to master - 201911 (with https://github.com/sonic-net/sonic-buildimage/pull/14928) to 202211 - 202012 (with https://github.com/sonic-net/sonic-buildimage/pull/14927) to master - 202205 (with this change cherry-picked) to master	2023-06-15 15:14:52 -07:00
Saikrishna Arcot	f84dfd2345	Re-add 127.0.0.1/8 when bringing down the interfaces (#15080 ) * Re-add 127.0.0.1/8 when bringing down the interfaces With #5353, 127.0.0.1/16 was added to the lo interface, and then 127.0.0.1/8 was removed. However, when bringing down the lo interface, like during a config reload, 127.0.0.1/16 gets removed, but 127.0.0.1/8 isn't added back to the interface. This means that there's a period of time where 127.0.0.1 is not available at all, and services that need to connect to 127.0.0.1 (such as for redis DB) will fail. To fix this, when going down, add 127.0.0.1/8. Add this address before the existing configuration gets removed, so that 127.0.0.1 is available at all times. Note that running `ifdown lo` doesn't actually bring down the loopback interface; the interface always stays "physically" up. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2023-06-13 18:45:39 -07:00
Hua Liu	05f1a5a31e	Add watchdog mechanism to swss service and generate alert when swss have issue. (#15429 ) Add a watchdog mechanism to the swss service and generate an alert when swss has an issue. Work item tracking Microsoft ADO (number only): 16578912 What I did Add an orchagent watchdog to monitor and alert on orchagent stuck issues. Why I did it Currently the SONiC monit system only checks whether the orchagent process exists. If the orchagent process gets stuck and stops processing, monit cannot detect and report it. How I verified it Passed all UT. Manually verified that process_monitoring/test_critical_process_monitoring.py passes. Added new UT https://github.com/sonic-net/sonic-mgmt/pull/8306 to check that the watchdog works correctly. Manual test: after pausing orchagent with 'kill -STOP <pid>', check that a warning message appears in the log: Apr 28 23:36:41.504923 vlab-01 ERR swss#supervisor-proc-watchdog-listener: Process 'orchagent' is stuck in namespace 'host' (1.0 minutes). Details if related Heartbeat message PR: https://github.com/sonic-net/sonic-swss/pull/2737 UT PR: https://github.com/sonic-net/sonic-mgmt/pull/8306	2023-06-12 17:53:54 -07:00

1227 Commits