sonic-buildimage

Author	SHA1	Message	Date
mssonicbld	de65640633	[ci/build]: Upgrade SONiC package versions (#15715)	2023-07-05 18:37:13 +08:00
mssonicbld	7ef59d556b	[ci/build]: Upgrade SONiC package versions (#15706)	2023-07-03 19:18:54 +08:00
mssonicbld	aa5164ef09	[ci/build]: Upgrade SONiC package versions (#15647)	2023-07-01 18:39:31 +08:00
Lawrence Lee	b4a3711a95	[arp_update]: Fix IPv6 neighbor race condition (#15583) * [arp_update]: Fix IPv6 neighbor race condition on dualtors Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2023-06-30 14:06:25 -07:00
Stepan Blyshchak	1ebdcda9e3	[nvidia] make sure shared storage with syncd is cleared on restarts (#14547) Why I did it Sharing the storage of syncd with other proprietary application extensions allows them to communicate with syncd in different ways. If one container wants to pass some information to syncd, then shared storage can be used. However, today the shared storage isn't cleaned on restarts, making it possible for syncd to read out-of-date information generated in the past. NOTE: There are no plans to use it for standard SONiC dockers, and we are working on removing the SDK dependency from the PMON docker. How I did it Implemented a new service to clean the shared storage. How to verify it Do reboot/fast-reboot/warm-reboot/config-reload/systemctl restart swss and verify /tmp/ is cleaned after each restart in the syncd container. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2023-06-28 15:26:49 -07:00
siqbal1986	bf5b72a356	Vnet monitor table cleanup (#15399) * Added VNET_MONITOR_TABLE and BFD_SESSION_TABLE to the list of tables to be cleaned up after swss restart. * Added VNET_ROUTE* tables to the cleanup. This should cover VNET_ROUTE_TUNNEL_TABLE as well.	2023-06-27 12:53:56 -07:00
mssonicbld	aa11acdddd	[ci/build]: Upgrade SONiC package versions	2023-06-26 20:55:55 +08:00
Junchao-Mellanox	b07957bdad	Fix issue: systemctl daemon-reload would sporadically cause the udev handler to fail (#15253) #### Why I did it A workaround to backport the fix for a systemd issue. The systemd issue: https://github.com/systemd/systemd/issues/24668 The systemd PR to fix the issue: https://github.com/systemd/systemd/pull/24673/files The formal solution is to upgrade systemd to a version that contains the fix. However, systemd is a very basic service, and upgrading it requires extensive testing. #### How I did it Copy the correct systemd-udevd.service file at build time #### Tested branch (Please provide the tested image version) - [x] 202211 ``` SONiC Software Version: SONiC.fix-udev.3-b65c7bdec_Internal SONiC OS Version: 11 Distribution: Debian 11.7 Kernel: 5.10.0-18-2-amd64 Build commit: `b65c7bdec` Build date: Mon Jun 19 10:54:50 UTC 2023 Built by: sw-r2d2-bot@r-build-sonic-ci02-241 Platform: x86_64-mlnx_msn4700-r0 HwSKU: ACS-MSN4700 ASIC: mellanox ASIC Count: 1 Serial Number: MT2022X08597 Model Number: MSN4700-WS2FO Hardware Revision: A1 Uptime: 08:10:11 up 1 min, 1 user, load average: 1.81, 0.67, 0.24 Date: Sun 25 Jun 2023 08:10:11 Docker images: REPOSITORY TAG IMAGE ID SIZE docker-fpm-frr fix-udev.3-b65c7bdec_Internal a7b911e7cb6f 346MB docker-fpm-frr latest a7b911e7cb6f 346MB docker-platform-monitor fix-udev.3-b65c7bdec_Internal 94c5178cf80b 731MB docker-platform-monitor latest 94c5178cf80b 731MB docker-orchagent fix-udev.3-b65c7bdec_Internal 46b393e0ace8 328MB docker-orchagent latest 46b393e0ace8 328MB docker-syncd-mlnx fix-udev.3-b65c7bdec_Internal 1f5c6c23e33a 734MB docker-syncd-mlnx latest 1f5c6c23e33a 734MB docker-sflow fix-udev.3-b65c7bdec_Internal 7e45992c8c59 317MB docker-sflow latest 7e45992c8c59 317MB docker-teamd fix-udev.3-b65c7bdec_Internal e4d905592cda 316MB docker-teamd latest e4d905592cda 316MB docker-nat fix-udev.3-b65c7bdec_Internal 7fe799367580 319MB docker-nat latest 7fe799367580 319MB docker-macsec latest d702a5554171 318MB docker-snmp fix-udev.3-b65c7bdec_Internal 3bce8fcf71cd 338MB docker-snmp latest 3bce8fcf71cd 338MB docker-sonic-telemetry fix-udev.3-b65c7bdec_Internal f13949cbc817 597MB docker-sonic-telemetry latest f13949cbc817 597MB docker-dhcp-relay latest 153d9072805d 306MB docker-router-advertiser fix-udev.3-b65c7bdec_Internal aed642b9a6bc 299MB docker-router-advertiser latest aed642b9a6bc 299MB docker-sonic-p4rt fix-udev.3-b65c7bdec_Internal a3cae5ca65a7 870MB docker-sonic-p4rt latest a3cae5ca65a7 870MB docker-mux fix-udev.3-b65c7bdec_Internal b81f0401b9a8 347MB docker-mux latest b81f0401b9a8 347MB docker-eventd fix-udev.3-b65c7bdec_Internal c5917d0e801f 298MB docker-eventd latest c5917d0e801f 298MB docker-lldp fix-udev.3-b65c7bdec_Internal fd5dc14a7976 341MB docker-lldp latest fd5dc14a7976 341MB docker-database fix-udev.3-b65c7bdec_Internal 438c2715a1dd 299MB docker-database latest 438c2715a1dd 299MB docker-sonic-mgmt-framework fix-udev.3-b65c7bdec_Internal 5c50b115fbcd 414MB docker-sonic-mgmt-framework latest ```	2023-06-25 16:58:14 -07:00
Oleksandr Ivantsiv	475fe27c0b	[dns] Add support for static DNS configuration. (#14549) - Why I did it Add support for static DNS configuration according to the sonic-net/SONiC#1262 HLD. - How I did it Add a new resolv-config.service that is responsible for transferring configuration from Config DB into the /etc/resolv.conf file, which is consumed by various subsystems in Linux to resolve domain names into IP addresses. - How to verify it Run the image compilation. Each component related to the static DNS feature is covered by unit tests. Run sonic-mgmt tests. The static DNS feature will be covered by system tests. Install the image and run manual tests.	2023-06-22 19:12:30 +03:00
Vaibhav Hemant Dixit	9649a44470	Revert "Fix for fast/cold-boot: call db_migrator only after old config is loaded (#14933)" (#15464) This reverts commit `02b17839c3`. Reverts #14933 The earlier commit caused a race condition that particularly broke cross-branch warm upgrade. The issue happens when db_migrator is still migrating the DB and the finalizer is checking the DB for the list of components to reconcile. If migration is not complete, the finalizer gets an empty list to wait for. Due to this, the finalizer concludes warmboot (deletes the system-wide warmboot flag) and causes all the services to do a cold restart. ADO: 24274591	2023-06-16 13:58:38 -07:00
Stepan Blyshchak	e2e5b77f16	[mlnx-ffb.sh] Update issu-version location (#14925) #### Why I did it The ISSU version check fails because squashfs from 202211 cannot be mounted on 201911 #### How I did it Put the ISSU version file under the platform directory #### How to verify it Warm-upgrade matrix: - 201911 (with https://github.com/sonic-net/sonic-buildimage/pull/14928) to master - 201911 (with https://github.com/sonic-net/sonic-buildimage/pull/14928) to 202211 - 202012 (with https://github.com/sonic-net/sonic-buildimage/pull/14927) to master - 202205 (with this change cherry-picked) to master	2023-06-15 15:14:52 -07:00
Saikrishna Arcot	f84dfd2345	Re-add 127.0.0.1/8 when bringing down the interfaces (#15080) * Re-add 127.0.0.1/8 when bringing down the interfaces With #5353, 127.0.0.1/16 was added to the lo interface, and then 127.0.0.1/8 was removed. However, when bringing down the lo interface, like during a config reload, 127.0.0.1/16 gets removed, but 127.0.0.1/8 isn't added back to the interface. This means that there's a period of time where 127.0.0.1 is not available at all, and services that need to connect to 127.0.0.1 (such as for redis DB) will fail. To fix this, when going down, add 127.0.0.1/8. Add this address before the existing configuration gets removed, so that 127.0.0.1 is available at all times. Note that running `ifdown lo` doesn't actually bring down the loopback interface; the interface always stays "physically" up. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2023-06-13 18:45:39 -07:00
Hua Liu	05f1a5a31e	Add watchdog mechanism to swss service and generate alert when swss have issue. (#15429) Add watchdog mechanism to swss service and generate alert when swss have issue. Work item tracking Microsoft ADO (number only): 16578912 What I did Add an orchagent watchdog to monitor and alert on orchagent stuck issues. Why I did it Currently, the SONiC monit system only monitors whether the orchagent process exists. If the orchagent process gets stuck and stops processing, monit can't detect or report it. How I verified it Pass all UT. Manually verified that process_monitoring/test_critical_process_monitoring.py passes. Add new UT https://github.com/sonic-net/sonic-mgmt/pull/8306 to check that the watchdog works correctly. Manually test: after pausing orchagent with 'kill -STOP <pid>', check that a warning message exists in the log: Apr 28 23:36:41.504923 vlab-01 ERR swss#supervisor-proc-watchdog-listener: Process 'orchagent' is stuck in namespace 'host' (1.0 minutes). Details if related Heartbeat message PR: https://github.com/sonic-net/sonic-swss/pull/2737 UT PR: https://github.com/sonic-net/sonic-mgmt/pull/8306	2023-06-12 17:53:54 -07:00
Alpesh Patel	633fff8c10	enable ethernet backplane port support in port config for packet mode T2 devices (#14533) For T2 systems using packet mode, the backplane interfaces (Ethernet-BP#) and the fabric card ethernet interfaces are not visible as neighbor interfaces. In packet mode, these interfaces need qos and buffer config as well. This fix addresses that issue and adds the backplane interfaces to the PORTS_ACTIVE list	2023-06-12 14:02:22 -07:00
mssonicbld	cb9d9e57a6	[ci/build]: Upgrade SONiC package versions (#15431) Upgrade SONiC Versions	2023-06-12 22:27:29 +08:00
mssonicbld	a45595158b	[ci/build]: Upgrade SONiC package versions (#15345)	2023-06-10 20:38:13 +08:00
Liping Xu	78c41a1e58	allow docker_inram to kernel cmd list (#15374) Why I did it After docker_inram is enabled, the docker folder's default max size is 1.5G. It's not big enough for some tests which need to install additional docker images or install extra packages. Work item tracking Microsoft ADO 24199761: How I did it add docker_inram into cmdline_allowlist How to verify it sudo sh -c 'echo "docker_inram_size=3000M" >> kernel-cmdline-append' sudo reboot and check the docker folder size	2023-06-10 14:19:44 +08:00
Sudharsan Dhamal Gopalarathnam	162856ad9a	[sflow]Delay starting sflow service until ports are created (#15333) * [sflow]Delay starting sflow service until ports are created * Removing sflow from sonic.target dependency since it will be managed by hostcfgd	2023-06-09 16:28:15 -07:00
Ye Jianquan	cec9d7b83a	Revert "Add watchdog mechanism to swss service and generate alert when swss have issue. (#14686)" (#15390) This reverts commit `44427a2f6b`. The Docker image was not updated during PR validation, which caused PR check failures. Force merge this revert. Once the cache is updated after this PR is merged, the issue should be fixed.	2023-06-09 09:10:35 +08:00
Yevhen Fastiuk	8a6d45227e	[Clock] Add timezone config YANG model (#14651) * Add the ability to configure timezone Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com> * Add YANG model for timezone Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com> * Add timezone reference Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com> --------- Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>	2023-06-07 10:39:24 -07:00
Hua Liu	44427a2f6b	Add watchdog mechanism to swss service and generate alert when swss have issue. (#14686) This PR depends on https://github.com/sonic-net/sonic-swss/pull/2737 being merged first. What I did Add an orchagent watchdog to monitor and alert on orchagent stuck issues. Why I did it Currently, the SONiC monit system only monitors whether the orchagent process exists. If the orchagent process gets stuck and stops processing, monit can't detect or report it. How I verified it Pass all UT. Add new UT https://github.com/sonic-net/sonic-mgmt/pull/8306 to check that the watchdog works correctly. Manually test: after pausing orchagent with 'kill -STOP <pid>', check that a warning message exists in the log: Apr 28 23:36:41.504923 vlab-01 ERR swss#supervisor-proc-watchdog-listener: Process 'orchagent' is stuck in namespace 'host' (1.0 minutes). Details if related Heartbeat message PR: https://github.com/sonic-net/sonic-swss/pull/2737 UT PR: https://github.com/sonic-net/sonic-mgmt/pull/8306	2023-06-05 22:21:17 -07:00
siqbal1986	381cfe4485	Added VNET_MONITOR_TABLE,BFD_SESSION_TABLE,VNET_ROUTE_TUNNEL_TABLE to the list (#14992) * The 3 tables in state DB need to be cleaned up after SWSS restart to keep the state consistent.	2023-06-05 13:18:50 -07:00
mssonicbld	4335690de7	[ci/build]: Upgrade SONiC package versions	2023-06-05 20:51:47 +08:00
Arvindsrinivasan Lakshmi Narasimhan	3f4b959d3f	[chassis] add libffi-dev for sonic-utilities (#15218) In the PR sonic-net/sonic-utilities#2850, the paramiko package is installed in sonic-utilities to support remote access to linecards. libffi-dev needs to be installed to be able to compile for the armhf image. Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>	2023-06-03 14:36:50 -07:00
mssonicbld	f80e182c22	[ci/build]: Upgrade SONiC package versions (#15325)	2023-06-03 19:45:07 +08:00
mssonicbld	c044e6e34e	[ci/build]: Upgrade SONiC package versions (#15307)	2023-06-02 21:40:29 +08:00
Vaibhav Hemant Dixit	02b17839c3	Fix for fast/cold-boot: call db_migrator only after old config is loaded (#14933) Why I did it Fix the issue where db_migrator is called before the DB is loaded with config. This leads db_migrator to find nothing and incorrectly migrate every missing config, which is not expected. Migration should happen after the old config is loaded, and only new schema changes need migration. Since the DB does not have anything when the migrator is called, db_migrator fails when some APIs return None. The reason for the incorrect call is that the database service starts db_migrator as part of its startup sequence, while the config-setup service loads data from old-config/minigraph. Since config-setup has Requires=database.service, it starts only after the database service is started, and the database service is started only when db_migrator is completed. Fixed by: Check whether this is a first-time boot by checking the pending_config_migration flag. If pending_config_migration is enabled, do not call db_migrator as part of database service startup. Let the database service start, which triggers the config-setup service to start. Then call db_migrator after the config-setup service loads old-config/minigraph	2023-05-30 10:16:21 -07:00
vmittal-msft	ecb4db58a9	Update PG headroom settings for ports based on port speed/cable length (#14908) * Update PG headroom settings for ports based on port speed/cable length * Updated XOFF settings to use chip-level numbers rather than core-level numbers * Updated PG headroom based on uplink/downlink side * fix for sonic-config-gen tests * More fixes for unit test cases * more test fixes * Merged multiple functions into one	2023-05-19 08:19:27 -07:00
Pavan-Nokia	c5d0507224	[arm64][Nokia-7215-A1]Add support for Nokia-7215-A1 platform (#13795) Add new Nokia build target and establish an arm64 build: Platform: arm64-nokia_ixs7215_52xb-r0 HwSKU: Nokia-7215-A1 ASIC: marvell Port Config: 48x1G + 4x10G How I did it - Change make files for saiserver and syncd to use the Bullseye kernel - Change Marvell SAI version to 1.11.0-1 - Add Prestera make files to build kernel, Flattened Device Tree blob and ramdisk for arm64 platforms - Provide device and platform related files for new platform support (arm64-nokia_ixs7215_52xb-r0).	2023-05-18 14:24:05 -07:00
Samuel Angebault	fa95ebcaae	Add optional zram compression for docker_inram Some devices running SONiC have a small storage device (mainly 2G and 4G). The growth of the SONiC image over time has made it impossible to install 2 images on a single device. Some mitigations have been implemented in the past for some devices, but there is a need to do more. One such mitigation is `docker_inram`, which creates a `tmpfs` and extracts `dockerfs.tar.gz` into it. This all happens in the SONiC initramfs, and the installation process ensures that `dockerfs.tar.gz` is not extracted on the flash but kept as is. This mitigation trades more RAM usage for a smaller disk footprint. However, it creates new issues for devices with 4G of system memory, since the extracted `dockerfs.tar.gz` nears 1.6G. Considering Debian upgrades (with dual base images) and the continuous stream of features, this is only going to get bigger. This change introduces an alternative to the `tmpfs` by allowing a system to extract the `dockerfs.tar.gz` inside a `zram` device, thus bringing compression into play at the expense of performance. Introduce 2 new optional kernel parameters to be consumed by the SONiC initramfs. - `docker_inram_size`, which represents the max physical size of the `zram` or `tmpfs` volume (defaults to DOCKER_RAMFS_SIZE) - `docker_inram_algo`, which is the method to use to extract the `dockerfs.tar.gz` (defaults to `tmpfs`); other values are considered to be compression algorithms for `zram` (e.g. `zstd`, `lzo-rle`, `lz4`) Refactored the logic to mount the docker fs in the SONiC initramfs under the `union-mount` script. Moved the code into a function to make it cleaner and separated the inram volume creation from the docker extraction. On Arista platforms with a flash smaller than or equal to 4GB, set `docker_inram_algo` to `zstd`, which produces the best compression ratio at the expense of slower write performance, with read performance similar to other `zram` compression algorithms.	2023-05-18 14:21:52 -07:00
Samuel Angebault	467994c024	[Arista] Fix boot0 code for docker_inram Enable docker_inram for all systems with 4GB or less of flash. This is mandatory to allow these systems to store 2 SONiC images. This change also fixes the missing docker_inram attribute when installing a new image from SONiC. Because the SWI image can ship with additional kernel parameters, such as `sonic_fips=`, this led to a conflict. To prevent the conflict, the extra kernel parameters from the SWI are now stored in the file `kernel-cmdline-append`, which isn't used anywhere.	2023-05-18 14:21:52 -07:00
Anish Narsian	05a85b57b8	[arp_update] Resolve neighbors from config_db (#15006) * Resolve NEIGH table entries present in CONFIG_DB. Without this change, ARP/NDP entries that are configured via CONFIG_DB and that we wish to resolve are not resolved.	2023-05-17 10:42:03 -07:00
mssonicbld	3d1ae46f90	[ci/build]: Upgrade SONiC package versions	2023-05-15 18:32:43 +08:00
mssonicbld	31223fb9fe	[ci/build]: Upgrade SONiC package versions (#15057)	2023-05-13 18:30:20 +08:00
judyjoseph	efeae03ea3	Add override_config to load_minigraph in config-setup service (#14834) This PR overrides the minigraph config with the golden_config_db.json file if it is present in the backup location.	2023-05-10 11:54:33 -07:00
Zain Budhwani	a738c39328	Add fix to monit_regex.json for catching mem_usage and cpu_usage (#14954) Why I did it The current regex is not able to capture the logs; modify the regex to capture syslog messages. Work item tracking Microsoft ADO (number only): 13366345 How I did it Code change How to verify it sonic-mgmt test case	2023-05-08 11:48:17 -07:00
Ying Xie	72c52bc677	Revert "Clear DNS configuration received from DHCP during networking reconfiguration in Linux. (#13516)" (#14902) This reverts commit `c7ecd92c54`.	2023-05-01 17:12:38 -07:00
mssonicbld	80c5ab4a4a	[ci/build]: Upgrade SONiC package versions (#14896)	2023-05-01 18:10:48 +08:00
mssonicbld	0d709a3655	[ci/build]: Upgrade SONiC package versions (#14888)	2023-04-29 17:42:19 +08:00
Tejaswini Chadaga	ca224863cb	Changes to support TSA from supervisor (#14691) Why I did it Support for SONiC chassis isolation using TSA and un-isolation using TSB from the supervisor module Work item tracking Microsoft ADO (number only): 17826134 How I did it When TSA is run on the supervisor, it triggers TSA on each of the linecards using the secure rexec infrastructure introduced in sonic-net/sonic-utilities#2701. The user password is requested to allow secure login to the linecards through ssh before TSA/TSB is executed on the linecards. TSA of the chassis withdraws routes from all the external BGP neighbors on each linecard in order to isolate the entire chassis. No route withdrawal is done from the internal BGP sessions between the linecards, to prevent transient drops during internal route deletion. With these changes, complete isolation of a single linecard using TSA will not be possible (a separate CLI/script option will be introduced at a later time to achieve this). Changes also include a no-stats option with TSC for quick retrieval of the current system isolation state. This PR also reverts changes in #11403 How to verify it These changes depend on sonic-net/sonic-utilities#2701 for testing. Run TSA from the supervisor module and ensure transition to Maintenance mode on each linecard. Verify that all routes are withdrawn from eBGP neighbors on all linecards. Run TSB from the supervisor module and ensure transition to Normal mode on each linecard. Verify that all routes are re-advertised to eBGP neighbors on all linecards. Run TSC no-stats from the supervisor and verify that just the system maintenance state is returned from all linecards	2023-04-28 16:28:06 +08:00
Stephen Sun	9e56fea091	Temporary WA for the issue that asic_table.json cannot be rendered (#13888) - Why I did it We suspect the issue #13791 is caused by the redis server being temporarily unavailable during system initialization, so for now we do not use -d in sonic-cfggen, to avoid accessing the redis server - How I did it Provide a string containing the required JSON data when calling sonic-cfggen - How to verify it Manually test it Signed-off-by: Stephen Sun <stephens@nvidia.com>	2023-04-24 17:02:35 +03:00
mssonicbld	5ad844f185	[ci/build]: Upgrade SONiC package versions	2023-04-24 18:33:06 +08:00
mssonicbld	81a557885b	[ci/build]: Upgrade SONiC package versions (#14799)	2023-04-22 17:47:40 +08:00
mssonicbld	d006219e2d	[ci/build]: Upgrade SONiC package versions (#14718)	2023-04-19 18:59:16 +08:00
Aryeh Feigin	039a9c998a	[Fast-boot] Clear teamd-timer when finalizing fast-reboot (#14583) Part of sonic-net/sonic-utilities#2760 Similar to #14295 - Why I did it To clear the teamd timer when fast-reboot is finalized, to prevent any further effect. - How I did it Deleted the teamd timer from config-db in the fast-reboot finalizer. The config save call is moved to after clearing teamd-timer, so it won't have any further effect as well. - How to verify it Verified manually that the entry was deleted after fast-reboot was finalized.	2023-04-18 09:15:42 +03:00
Stepan Blyshchak	d73c810e86	[image_config] add rasdaemon.timer (#14300) rasdaemon is a tool to log hardware errors. It takes 100% CPU during boot for a few seconds. It impacts fast/warm boot by delaying control plane restoration for 5 sec on some platforms. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2023-04-17 08:58:45 -07:00
mssonicbld	7f262d71da	[ci/build]: Upgrade SONiC package versions (#14685)	2023-04-17 19:58:43 +08:00
mssonicbld	49dbaeb649	[ci/build]: Upgrade SONiC package versions (#14672)	2023-04-15 18:21:50 +08:00
Sudharsan Dhamal Gopalarathnam	2804998766	[config reload]Config Reload Enhancement (#13969) #### Why I did it Implementing code changes for https://github.com/sonic-net/SONiC/pull/1203 #### How I did it Removed the timers and delayed target, since the delayed services now start based on an event-driven approach. Cleared the port table during config reload and cold reboot scenarios. Modified the yang model and init_cfg.json to change has_timer to delayed #### How to verify it Running regression	2023-04-12 11:20:03 -07:00
mssonicbld	f9eb849d75	[ci/build]: Upgrade SONiC package versions (#14620)	2023-04-12 20:05:30 +08:00
anamehra	f34360f101	chassis-packet: resolve the missing static routes (#14593) Why I did it Fixes #14179 chassis-packet: missing arp entries for static routes causing high orchagent cpu usage It is observed that some sonic-mgmt test cases call sonic-clear arp, which clears the static arp entries as well. Neither orchagent nor the arp_update process tries to resolve the missing arp entries after the clear. How I did it arp_update should resolve the missing arp/ndp static route entries. Added code to check for missing entries and, if any are found, try ping to resolve them. How to verify it After boot or config reload, check ipv4 and ipv6 neigh entries to make sure all static route entries are present. Manual validation: Use sonic-clear arp and sonic-clear ndp to clear all neighbor entries, run arp_update, and check for neigh entries. All entries should be present. Testing on T0 setup for route/test_static_route.py The test sets the STATIC_ROUTE entry in config db without ifname: sonic-db-cli CONFIG_DB hmset 'STATIC_ROUTE|2.2.2.0/24' nexthop 192.168.0.18,192.168.0.25,192.168.0.23 "STATIC_ROUTE": { "2.2.2.0/24": { "nexthop": "192.168.0.18,192.168.0.25,192.168.0.23" } }, Validate that arp_update gets the proper ARP_UPDATE_VARS using the arp_update_vars.j2 template from config db and does not crash: { "switch_type": "", "interface": "", "pc_interface" : "PortChannel101 PortChannel102 PortChannel103 PortChannel104 ", "vlan_sub_interface": "", "vlan" : "Vlan1000", "static_route_nexthops": "192.168.0.18 192.168.0.25 192.168.0.23 ", "static_route_ifnames": "" } Validate that the route/test_static_route.py testcase passes.	2023-04-12 15:07:42 +08:00
xumia	f1fd42558a	Support to add SONiC OS Version in device info (#14601) Why I did it Support adding the SONiC OS Version to device info. It will be used to display the version info in the SONiC command "show version". The version is used for FIPS certification. We do not do FIPS certification on a specific release, but on the SONiC OS Version. SONiC Software Version: SONiC.master-13812.218661-7d94c0c28 SONiC OS Version: 11 Distribution: Debian 11.6 Kernel: 5.10.0-18-2-amd64 How I did it	2023-04-12 09:20:08 +08:00
mssonicbld	4e5c8988b1	[ci/build]: Upgrade SONiC package versions (#14586)	2023-04-10 18:10:37 +08:00
Aryeh Feigin	41a9813018	Finalize fast-reboot in warmboot finalizer (#14238) - Why I did it To solve an issue with upgrade via fast-reboot including FW upgrade, which was introduced when fast-reboot moved onto the warm-reboot infrastructure. This also introduces fast-reboot finalizing logic to determine that fast-reboot is done. - How I did it Added logic to the finalize-warmboot script to handle fast-reboot as well. This makes sense because, with fast-reboot running over the warm-reboot infrastructure, this script will be invoked. The script will clear the fast-reboot entry from state-db, instead of the previous implementation that relied on a timer. The timer could expire in some scenarios before fast-reboot finished, causing a fallback to cold-reboot and possible crashes. This PR also updates all services/scripts reading the fast-reboot state-db entry to look for the updated value representing that fast-reboot is active. - How to verify it Run fast-reboot and check that the fast-reboot entry exists in state-db right after startup and is cleared when warm-reboot is finalized, not due to a timer.	2023-04-09 16:59:15 +03:00
mssonicbld	e32624d362	[ci/build]: Upgrade SONiC package versions (#14571)	2023-04-08 18:00:30 +08:00
Stephen Sun	152148fb81	Enhance the error message output mechanism (#14384) #### Why I did it Enhance the error message output mechanism during swss docker creation #### How I did it Capture the stderr output of `sonic-cfggen` and print it using `echo` to make sure the error message will be logged in syslog. #### How to verify it Manually test	2023-04-07 14:23:35 -07:00
Devesh Pathak	d74055e12c	Increase wait_for_tunnel() timeout to 90s (#14279) Why I did it Orchagent sometimes takes additional time to execute Tunnel tasks. This causes the write_standby script to error out, and mux state machines are not initialized. As a result, show mux status is missing some ports in its output. Mar 13 20:36:52.337051 m64-tor-0-yy41 INFO systemd[1]: Starting MUX Cable Container... Mar 13 20:37:52.480322 m64-tor-0-yy41 ERR write_standby: Timed out waiting for tunnel MuxTunnel0, mux state will not be written Mar 13 20:37:58.983412 m64-tor-0-yy41 NOTICE swss#orchagent: :- doTask: Tunnel(s) added to ASIC_DB. How I did it Increase timeout from 60s to 90s How to verify it Verified that the mux state machine is initialized and show mux status has all needed ports in it.	2023-04-07 11:30:58 +08:00
Ying Xie	737d0e57ad	[write standby] force DB connections to use unix socket to connect (#14524) Why I did it At service startup time, there is a chance that the networking service is being restarted by the interface-config service. When that happens, write_standby could fail to make DB connections because the loopback interface is being reconfigured. How I did it Force the db connector to use a unix socket to avoid the loopback reconfig timing window. How to verify it Ran the config reload test 20+ times with no issue encountered. Signed-off-by: Ying Xie <ying.xie@microsoft.com> * use unix socket instead Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2023-04-06 13:54:56 -07:00
Ye Jianquan	6c04ed987d	Revert "chassis-packet: resolve the missing static routes (#14230)" (#14544) This reverts commit `a8f8ea3b50`.	2023-04-06 10:36:10 -07:00
mssonicbld	41c46aedf6	[ci/build]: Upgrade SONiC package versions (#14528)	2023-04-05 18:36:57 +08:00
Ying Xie	d3f3ac6411	Delay mux/sflow/snmp timer after interface-config service (#14506) Why I did it All 3 of these services started after the swss service, which used to start after the interface-config service. But #13084 removed the time constraints for swss. After that, these 3 services have a chance of starting earlier while the interface-config service is restarting the networking service, which could cause db connect requests to fail. How I did it Delay the mux/sflow/snmp timers until after the interface-config service. How to verify it PR test. Config reload can repro the issue in 1-3 retries. With this change, config reload ran 30+ iterations without hitting the issue. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2023-04-04 16:23:00 -07:00
mssonicbld	884dfa5427	[ci/build]: Upgrade SONiC package versions (#14498)	2023-04-03 18:34:35 +08:00
mssonicbld	66d3586fd4	[ci/build]: Upgrade SONiC package versions (#14487)	2023-04-01 18:45:34 +08:00
anamehra	a8f8ea3b50	chassis-packet: resolve the missing static routes (#14230) arp_update should resolve the missing arp/ndp static route entries. Added code to check for missing entries and try ping to resolve the missing entry. Why I did it Fixes #14179 chassis-packet: missing arp entries for static routes causing high orchagent cpu usage It is observed that some sonic-mgmt test cases call sonic-clear arp, which clears the static arp entries as well. Neither orchagent nor the arp_update process tries to resolve the missing arp entries after the clear. How I did it arp_update should resolve the missing arp/ndp static route entries. Added code to check for missing entries and, if any are found, try ping to resolve them. How to verify it After boot or config reload, check ipv4 and ipv6 neigh entries to make sure all static route entries are present. Manual validation: Use sonic-clear arp and sonic-clear ndp to clear all neighbor entries, run arp_update, and check for neigh entries. All entries should be present. Signed-off-by: anamehra <anamehra@cisco.com>	2023-03-29 09:53:32 -07:00
mssonicbld	6e11833a6c	[ci/build]: Upgrade SONiC package versions (#14430)	2023-03-29 18:39:10 +08:00
Hua Liu	4c059d8eb5	Improve sudo cat command for RO user. (#14428) Improve sudo cat command for RO user. #### Why I did it RO users can use the sudo command to show non-syslog files. #### How I did it Improve sudo cat command for RO user. #### How to verify it Pass all UT. Manually checked that the fixed code works correctly. #### Description for the changelog Improve sudo cat command for RO user.	2023-03-27 17:08:14 -07:00
oleksandrx-kolomeiets	4da51b07ad	Set owner after restoring counters folder during warmboot (#13507) Why I did it After warm reboot, show environment prints the following error: failed to import plugin show.plugins.macsec: [Errno 13] Permission denied: '/tmp/cache/macsec' How I did it Set the owner back to admin after restoring the counters folder. How to verify it Run sudo warm-reboot, then ensure show environment does not print errors. Signed-off-by: Oleksandr Kolomeiets <oleksandrx.kolomeiets@intel.com>	2023-03-27 10:32:07 -07:00
mssonicbld	fb6e37819b	[ci/build]: Upgrade SONiC package versions (#14414 )	2023-03-25 19:21:56 +08:00
mssonicbld	20f1ab8203	[ci/build]: Upgrade SONiC package versions (#14383 )	2023-03-22 19:34:21 +08:00
mssonicbld	4429bdd091	[ci/build]: Upgrade SONiC package versions (#14354 )	2023-03-21 01:10:17 +08:00
xumia	7209666374	[Security] Fix some vulnerability issues in Python packages (#14269 ) Why I did it Fix some vulnerability issues in Python packages #14269 Pillow: [CVE-2021-27921] Wheel: [CVE-2022-40898] lxml: [CVE-2022-2309] How I did it	2023-03-20 14:15:45 +08:00
mssonicbld	1e8e993a94	[ci/build]: Upgrade SONiC package versions	2023-03-20 09:00:28 +08:00
mssonicbld	89ebd43c81	[ci/build]: Upgrade SONiC package versions (#14311 ) Upgrade SONiC Versions	2023-03-19 10:16:41 +08:00
Dev Ojha	de17f72d9a	[Buffer] Added cable length config to buffer config template for EdgeZoneAggregator (#14280 ) Why I did it SONiC currently does not identify 'EdgeZoneAggregator' neighbor. As a result, the buffer profile attached to those interfaces uses the default cable length which could cause ingress packet drops due to insufficient headroom. Hence, there is a need to update the buffer templates to identify such neighbors and assign the same cable length as used by the T1. How I did it Modified the buffer template to identify EdgeZoneAggregator as a neighbor device type and assign it the same cable length as a T1/leaf router. How to verify it Unit tests pass, and manually checked on a 7260 to see the changes take effect. Signed-off-by: dojha <devojha@microsoft.com>	2023-03-17 11:01:17 -07:00
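The template change in the entry above amounts to treating one more neighbor type like a T1/leaf router when looking up a cable length. The Python sketch below is a minimal illustration only. The role keys, the cable-length values, and the default are hypothetical placeholders, not the values in SONiC's buffer templates.

```python
# Hypothetical cable lengths keyed by "<switch role>_<neighbor role>"; the real
# values come from each platform's buffer template and are not shown in the commit.
CABLE_LENGTHS = {
    "torrouter_server": "5m",
    "torrouter_leafrouter": "40m",
}
DEFAULT_CABLE_LENGTH = "5m"

# Neighbor types that should reuse another role's cable length.
ROLE_ALIASES = {"edgezoneaggregator": "leafrouter"}


def cable_length(switch_role, neighbor_role):
    """Pick a cable length for a port from the switch and neighbor roles."""
    neighbor = neighbor_role.lower()
    neighbor = ROLE_ALIASES.get(neighbor, neighbor)
    key = f"{switch_role.lower()}_{neighbor}"
    # Unknown pairs fall back to the default, which is the behavior the commit fixes.
    return CABLE_LENGTHS.get(key, DEFAULT_CABLE_LENGTH)


print(cable_length("ToRRouter", "EdgeZoneAggregator"))
```

Aliasing the role keeps the table itself unchanged. Any future neighbor type that should behave like a T1 needs only one extra alias entry.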
mssonicbld	96817c4357	[ci/build]: Upgrade SONiC package versions (#14102 ) Upgrade SONiC Versions	2023-03-17 10:12:30 +08:00
Neetha John	f30fb6ec58	[storage_backend] Add backend acl service (#14229 ) Why I did it This PR addresses the issue mentioned above by loading the acl config as a service on a storage backend device How I did it The new acl service is a oneshot service which starts after swss and performs a few retries to make sure the SWITCH_CAPABILITY info is present before attempting to load the acl rules. The service is also bound to sonic targets, which ensures that it gets restarted during minigraph reload and config reload. How to verify it Built an image with the following changes and ran the following tests: Verified that acl is loaded successfully on a storage backend device after a switch boot up. Verified that acl is loaded successfully on a storage backend ToR after minigraph load and config reload. Verified that acl is not loaded if the device is not a storage backend ToR or the device does not have a DATAACL table. Signed-off-by: Neetha John <nejo@microsoft.com>	2023-03-16 14:18:28 -07:00
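The "retry until SWITCH_CAPABILITY is present" step in the entry above is a bounded polling loop. Below is a minimal Python sketch of such a loop. The retry count and delay are hypothetical defaults, since the commit does not state them. The `sleep` parameter is injectable so the loop can be exercised without waiting.

```python
import time


def wait_for(check, retries=10, delay=5.0, sleep=time.sleep):
    """Poll check() up to `retries` times, sleeping `delay` seconds between tries.

    Returns True as soon as check() succeeds, or False if it never does.
    """
    for attempt in range(retries):
        if check():
            return True
        if attempt < retries - 1:  # no point sleeping after the last attempt
            sleep(delay)
    return False
```

In the service, a hypothetical `switch_capability_present()` predicate would be passed as `check`. The ACL rules would be loaded only when `wait_for` returns True. Bounding the retries keeps a oneshot unit from hanging forever if swss never publishes the capability table.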
davidpil2002	8098bc4bf5	Add Secure Boot Support (#12692 ) - Why I did it Add Secure Boot support to SONiC OS. Secure Boot (SB) is a verification mechanism for ensuring that code launched by a computer's UEFI firmware is trusted. It is designed to protect a system against malicious code being loaded and executed early in the boot process, before the operating system has been loaded. - How I did it Added a signing process that signs the following components during the build: shim, grub, Linux kernel, and kernel modules. Signing happens only when the feature is enabled at build time, as explained in the HLD (the feature is disabled by default). - How to verify it Each boot component is self-verified when building the image; in addition, an existing end-to-end test in the sonic-mgmt repo checks that the boot succeeds when loading a secure system (details below). How to build a sonic image with the secure boot feature: (more description in HLD) Set the following build flags from rules/config: SECURE_UPGRADE_MODE="dev" SECURE_UPGRADE_DEV_SIGNING_KEY="/path/to/private/key.pem" SECURE_UPGRADE_DEV_SIGNING_CERT="/path/to/cert/key.pem" After setting those flags, build sonic-buildimage. Before installing the image, prepare the setup (switch device) as follows: check that the device supports UEFI, store the public keys in the UEFI DB, and enable the Secure Boot flag in UEFI. How to run a test that verifies the Secure Boot flow: The existing test "test_upgrade_path" under "sonic-mgmt/tests/upgrade_path/test_upgrade_path" is enough to validate a proper boot. You need to specify the following arguments: Base_image_list your_secure_image Target_image_list your_second_secure_image Upgrade_type cold Then run the test: it installs the base image given in the parameters, upgrades to the target image with a cold reboot, and validates that all services are up and working correctly	2023-03-14 14:55:22 +02:00
Stepan Blyshchak	f908dfe919	[Mellanox] Place FW binaries under platform directory instead of squashfs (#13837 ) Fixes #13568 Upgrading from an old image always requires mounting the squashfs to get the next image's FW binary. This can be avoided by putting the FW binary under the platform directory, which is easily accessible after installation: admin@r-spider-05:~$ ls /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa admin@r-spider-05:~$ ls -al /tmp/image-fw-new-loc.0-dirty-20230208.193534-fs/etc/mlnx/fw-SPC.mfa lrwxrwxrwx 1 root root 66 Feb 8 17:57 /tmp/image-fw-new-loc.0-dirty-20230208.193534-fs/etc/mlnx/fw-SPC.mfa -> /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa - Why I did it 202211 and above use a squashfs compression type that the 201911 kernel cannot handle. Therefore, this change avoids mounting the squashfs altogether. - How I did it Place the FW binary under /host/image-/platform/mlnx/; soft links in /etc/mlnx are created to avoid breaking existing scripts/automation. /etc/mlnx/fw-SPCX.mfa is a soft link that always points to the FW that should be used by the current image. mlnx-fw-upgrade.sh is updated to prefer the /host/image-/platform/mlnx location and fall back to /etc/mlnx in the squashfs if the new location does not exist. This is necessary for image downgrade. - How to verify it Upgrade from 201911 to master; downgrade from master to 201911; master -> master reboot; ONIE -> master boot (first FW burn)	2023-03-06 13:36:43 +02:00
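The "prefer the new location, fall back to /etc/mlnx" rule from the entry above can be illustrated in Python. The actual logic lives in mlnx-fw-upgrade.sh. The image-specific directory is left as a caller-supplied parameter, `new_fw_dir`, because the commit message elides the full path. The function name is hypothetical.

```python
import os


def select_fw_path(fw_name, new_fw_dir, legacy_dir="/etc/mlnx", exists=os.path.exists):
    """Prefer the FW binary under the image's platform directory; fall back to
    the legacy squashfs location (needed when downgrading to an older image)."""
    preferred = os.path.join(new_fw_dir, fw_name)
    if exists(preferred):
        return preferred
    return os.path.join(legacy_dir, fw_name)
```

Injecting `exists` makes the fallback easy to test without touching the filesystem. It also mirrors how the real script has to work against both old images (squashfs only) and new ones.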
mssonicbld	506f372533	[ci/build]: Upgrade SONiC package versions (#14072 ) Upgrade SONiC Versions	2023-03-05 11:29:38 +08:00
anamehra	4a93e4cfa4	Add support for platform syncd pre shutdown plugin (#13564 ) Why I did it Vendor platform may require running platform specific pre-shutdown routine before shutting down the syncd process which runs the SAI and vendor sdk instance. How I did it Added a platform script hook which will be executed if the plugin script is provided by the platform in device//plugins/	2023-03-03 15:53:33 -08:00
Sudharsan Dhamal Gopalarathnam	8883259673	[netlink] Increase netlink buffer size from 3MB to 16MB (#13965 ) #### Why I did it Following the PR https://github.com/sonic-net/sonic-swss-common/pull/739, increase the netlink buffer size in the Linux kernel. An error is seen in fdbsyncd, with netlink reporting "out of memory on reading a netlink socket". It is seen when the kernel sends 10k remote MACs to fdbsyncd. #### How I did it Increased the netlink buffer size from 3MB to 16MB. #### How to verify it Verified with 10k remote MACs by restarting the fdbsyncd process so that the kernel sends the bridge FDB dump to fdbsyncd. Verified that the netlink buffer error is not reported in the syslog.	2023-02-27 15:41:22 -08:00
mssonicbld	8d0d3e57ba	[ci/build]: Upgrade SONiC package versions (#13989 ) Upgrade SONiC Versions	2023-02-27 13:45:49 +08:00
mssonicbld	58592e6c49	[ci/build]: Upgrade SONiC package versions (#13526 ) The initial version files for the SONiC reproducible build	2023-02-25 08:16:38 +08:00
Samuel Angebault	b9dffcbaaf	[Arista] Disable SSD NCQ on Lodoga (#13964 ) Why I did it Fix similar issue seen on #13739 but only for DCS-7050CX3-32S How I did it Add a kernel parameter to tell libata to disable NCQ How to verify it The message ata2.00: FORCE: horkage modified (noncq) should appear on the dmesg. Test results using: fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4 with NCQ READ: bw=26.1MiB/s (27.4MB/s), 26.1MiB/s-26.1MiB/s (27.4MB/s-27.4MB/s), io=3136MiB (3288MB), run=120053-120053msec WRITE: bw=26.3MiB/s (27.6MB/s), 26.3MiB/s-26.3MiB/s (27.6MB/s-27.6MB/s), io=3161MiB (3315MB), run=120053-120053msec without NCQ READ: bw=22.0MiB/s (23.1MB/s), 22.0MiB/s-22.0MiB/s (23.1MB/s-23.1MB/s), io=2647MiB (2775MB), run=120069-120069msec WRITE: bw=22.2MiB/s (23.3MB/s), 22.2MiB/s-22.2MiB/s (23.3MB/s-23.3MB/s), io=2665MiB (2795MB), run=120069-120069msec	2023-02-24 10:08:04 -08:00
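A quick worked calculation shows the cost of disabling NCQ on the DCS-7050CX3-32S. It uses the fio bandwidth figures reported in the entry above (READ 26.1 to 22.0 MiB/s, WRITE 26.3 to 22.2 MiB/s). Only the numbers quoted in the commit are used.

```python
def pct_drop(with_ncq, without_ncq):
    """Relative throughput loss, in percent, from disabling NCQ."""
    return (with_ncq - without_ncq) / with_ncq * 100


# fio bandwidth in MiB/s for DCS-7050CX3-32S, taken from the commit message.
read_drop = pct_drop(26.1, 22.0)
write_drop = pct_drop(26.3, 22.2)
print(f"read: -{read_drop:.1f}%, write: -{write_drop:.1f}%")
```

This prints `read: -15.7%, write: -15.6%`. That is the bandwidth traded for avoiding the occasional CPU-to-SSD IO failures that motivated the change.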
DavidZagury	ee1b6b3751	Remove support for Mellanox SPC4 ASIC (#13932 ) - Why I did it FW for the Spectrum-4 ASIC is not yet available - How I did it Removed the Spectrum-4 ASIC firmware binaries from the Mellanox FW make files. Removed Spectrum-4 ASIC support from the firmware upgrade scripts. - How to verify it Run regression test	2023-02-23 08:25:34 +02:00
Andriy Yurkiv	5ad78abea0	[Dual-ToR] add default value for ACL rule for mellanox platform (#13547 ) - Why I did it Need to add the possibility to choose between dropping packets (using ACL) on ingress or egress in the Dual ToR scenario - How I did it Add new attribute "mux_tunnel_ingress_acl" to SYSTEM_DEFAULTS table - How to verify it Check that the new attribute exists in redis: admin@sonic:~$ redis-cli -n 4 127.0.0.1:6379[4]> HGETALL SYSTEM_DEFAULTS|mux_tunnel_ingress_acl 1) "state" 2) "false" Signed-off-by: Andriy Yurkiv <ayurkiv@nvidia.com>	2023-02-22 20:25:54 +02:00
Marty Y. Lok	2c22d9affc	[Chassis][multiasic] Fix the sonic-db-cli core files issue on multiasic platform after the c++ implementation of sonic-db-cli (#13207 ) Fixes #12047. After the C++ implementation of sonic-db-cli, the sonic-db-cli PING command tries to initialize the global database config for all instances when each database instance starts. If not all instances' database-config.json files are ready yet, it crashes and generates a core file. PR sonic-net/sonic-swss-common#701 only fixes the crash and the process abortion. Signed-off-by: mlok <marty.lok@nokia.com>	2023-02-21 11:23:22 -08:00
Saikrishna Arcot	56d732a0a0	Use tmpfs for /var/log on Arista 7050CX3-32S (#13805 ) This is to reduce writes to the SSD on the device. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2023-02-16 19:13:39 -08:00
Samuel Angebault	5ce1b8e4b7	[Arista] Disable ATA NCQ for a few products (#13739 ) Why I did it Some products might experience an occasional IO failure in the communication between CPU and SSD. Based on some research, it could be attributable to some devices not handling ATA NCQ (Native Command Queuing). This issue currently affects 4 products: DCS-7170-32C* DCS-7170-64C DCS-7060DX4-32 DCS-7260CX3-64 How I did it This change disables NCQ on the affected drive for a small set of products. How to verify it When the fix is applied, these 2 patterns can be found in the dmesg: ata1.00: FORCE: horkage modified (noncq) NCQ (not used) Test results using: fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4 with NCQ (ata1.00: 61865984 sectors, multi 1: LBA48 NCQ (depth 32), AA) READ: bw=33.9MiB/s (35.6MB/s), 33.9MiB/s-33.9MiB/s (35.6MB/s-35.6MB/s), io=4073MiB (4270MB), run=120078-120078msec WRITE: bw=34.1MiB/s (35.8MB/s), 34.1MiB/s-34.1MiB/s (35.8MB/s-35.8MB/s), io=4100MiB (4300MB), run=120078-120078msec without NCQ (ata1.00: 61865984 sectors, multi 1: LBA48 NCQ (not used)) READ: bw=31.7MiB/s (33.3MB/s), 31.7MiB/s-31.7MiB/s (33.3MB/s-33.3MB/s), io=3808MiB (3993MB), run=120083-120083msec WRITE: bw=31.9MiB/s (33.4MB/s), 31.9MiB/s-31.9MiB/s (33.4MB/s-33.4MB/s), io=3830MiB (4016MB), run=120083-120083msec	2023-02-15 10:31:59 -08:00
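The verification step in the entry above can be automated with a small Python check. It looks for the two dmesg patterns quoted in the commit message, and both must be present. The function name is hypothetical, and the regex generalizes the port number (`ata1.00`, `ata2.00`, ...) as an assumption.

```python
import re

# Patterns quoted in the commit message as evidence that NCQ is disabled.
FORCE_RE = re.compile(r"ata\d+\.\d+: FORCE: horkage modified \(noncq\)")
NOT_USED_RE = re.compile(r"NCQ \(not used\)")


def ncq_disabled(dmesg_text):
    """Return True only if both expected patterns appear in the dmesg output."""
    return bool(FORCE_RE.search(dmesg_text)) and bool(NOT_USED_RE.search(dmesg_text))
```

On a device, the input could come from `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`. Reading the kernel log may require root.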
Stepan Blyshchak	e5a294644c	[dockerd] Force usage of cgo DNS resolver (#13649 ) Go's runtime (and dockerd inherits this) uses own DNS resolver implementation by default on Linux. It has been observed that there are some DNS resolution issues when executing ```docker pull``` after first boot. Consider the following script: ``` admin@r-boxer-sw01:~$ while :; do date; cat /etc/resolv.conf; ping -c 1 harbor.mellanox.com; docker pull harbor.mellanox.com/sonic/cpu-report:1.0.0 ; sleep 1; done Fri 03 Feb 2023 10:06:22 AM UTC nameserver 10.211.0.124 nameserver 10.211.0.121 nameserver 10.7.77.135 search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data. 64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.99 ms --- harbor.mellanox.com ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 5.989/5.989/5.989/0.000 ms Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:57245->[::1]:53: read: connection refused Fri 03 Feb 2023 10:06:23 AM UTC nameserver 10.211.0.124 nameserver 10.211.0.121 nameserver 10.7.77.135 search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data. 64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.56 ms --- harbor.mellanox.com ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 5.561/5.561/5.561/0.000 ms Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:53299->[::1]:53: read: connection refused Fri 03 Feb 2023 10:06:24 AM UTC nameserver 10.211.0.124 nameserver 10.211.0.121 nameserver 10.7.77.135 search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data. 
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.78 ms --- harbor.mellanox.com ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 5.783/5.783/5.783/0.000 ms Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:55765->[::1]:53: read: connection refused Fri 03 Feb 2023 10:06:25 AM UTC nameserver 10.211.0.124 nameserver 10.211.0.121 nameserver 10.7.77.135 search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data. 64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=7.17 ms --- harbor.mellanox.com ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 7.171/7.171/7.171/0.000 ms Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:44877->[::1]:53: read: connection refused Fri 03 Feb 2023 10:06:26 AM UTC nameserver 10.211.0.124 nameserver 10.211.0.121 nameserver 10.7.77.135 search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data. 64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.66 ms --- harbor.mellanox.com ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 5.656/5.656/5.656/0.000 ms Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:54604->[::1]:53: read: connection refused Fri 03 Feb 2023 10:06:27 AM UTC nameserver 10.211.0.124 nameserver 10.211.0.121 nameserver 10.7.77.135 search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data. 
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=8.22 ms --- harbor.mellanox.com ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 8.223/8.223/8.223/0.000 ms 1.0.0: Pulling from sonic/cpu-report 004f1eed87df: Downloading [===================> ] 19.3MB/50.43MB 5d6f1e8117db: Download complete 48c2faf66abe: Download complete 234b70d0479d: Downloading [=========> ] 9.363MB/51.84MB 6fa07a00e2f0: Downloading [==> ] 9.51MB/192.4MB 04a31b4508b8: Waiting e11ae5168189: Waiting 8861a99744cb: Waiting d59580d95305: Waiting 12b1523494c1: Waiting d1a4b09e9dbc: Waiting 99f41c3f014f: Waiting ``` While /etc/resolv.conf has the correct content and ping (and any other utility that uses libc's DNS resolution implementation) works correctly docker is unable to resolve the hostname and falls back to default [::1]:53. This started to happen after PR https://github.com/sonic-net/sonic-buildimage/pull/13516 has been merged. As you can see from the log, dockerd is able to pick up the correct /etc/resolv.conf only after 5 sec since first try. This seems to be somehow related to the logic in Go's DNS resolver https://github.com/golang/go/blob/master/src/net/dnsclient_unix.go#L385. There have been issues like that reported in docker like: - https://github.com/docker/cli/issues/2299 - https://github.com/docker/cli/issues/2618 - https://github.com/moby/moby/issues/22398 Since this starts to happen after inclusion of resolvconf package by above mentioned PR and the fact I can't see any problem with that (ping, nslookup, etc. works) the choice is made to force dockerd to use cgo (libc) resolver. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2023-02-14 08:57:19 +02:00
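Go's `net` package documents the `GODEBUG=netdns=cgo` setting, which selects the libc (cgo) resolver instead of the pure-Go one. Setting it in the daemon's environment is one common way to force the cgo resolver for a Go binary such as dockerd. The Python sketch below renders such a systemd drop-in. The drop-in approach and file location are assumptions, not necessarily how this commit applies the setting. The cgo resolver also only takes effect if dockerd was built with cgo support.

```python
def docker_dropin(resolver="cgo"):
    """Render a systemd drop-in that sets GODEBUG=netdns=<resolver> for dockerd.

    netdns=cgo selects the libc resolver; netdns=go forces Go's built-in one.
    """
    if resolver not in ("cgo", "go"):
        raise ValueError(f"unsupported resolver: {resolver}")
    return "[Service]\n" f'Environment="GODEBUG=netdns={resolver}"\n'


print(docker_dropin(), end="")
```

Written to a file such as `/etc/systemd/system/docker.service.d/` plus a `.conf` name, it would need a `systemctl daemon-reload` and a docker restart to take effect.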
zhixzhu	f0f7639fa2	set cable length to 1m for backplane ports (#13572 ) Signed-off-by: Zhixin Zhu zhixzhu@cisco.com Why I did it The cable length for backplane ports needs to be specified. How I did it Added separate handling for the specific port names.	2023-02-10 19:01:49 -08:00
andywongarista	1894e0aafe	Increase PikeZ varlog size (#13550 ) Why I did it To address an error sometimes seen when running sonic-mgmt test_stress_routes.py::test_announce_withdraw_route on 720DT-48S How I did it Updated boot0 logic to set a platform-specific varlog size for 720DT-48S How to verify it Verified that the /var/log size increased and the error is no longer observed when running the test	2023-02-09 13:24:09 -08:00
Samuel Angebault	dd7948bf17	[Arista] Add emmc quirks in boot0 to improve reliability (#10013 ) Why I did it Fix some unreliability seen on emmc device with some AMD CPUs How I did it Added a kernel parameter to add quirks to It depends on a sonic-linux-kernel change to work properly but will be a no-op without it. The quirk added is SDHCI_QUIRK2_BROKEN_HS200 used to downgrade the link speed for the eMMC.	2023-02-09 10:46:09 -08:00
Stephen Sun	e3ff08833e	[Mellanox] Support DSCP remapping in dual ToR topo on T0 switch (#12605 ) - Why I did it Support DSCP remapping in dual ToR topo on T0 switch for SKU Mellanox-SN4600c-C64, Mellanox-SN4600c-D48C40, Mellanox-SN2700, Mellanox-SN2700-D48C8. - How I did it Regarding buffer settings, originally, there are two lossless PGs and queues 3, 4. In dual ToR scenario, the lossless traffic from the leaf switch to the uplink of the ToR switch can be bounced back. To avoid PFC deadlock, we need to map the bounce-back lossless traffic to different PGs and queues. Therefore, 2 additional lossless PGs and queues are allocated on uplink ports on ToR switches. On uplink ports, map DSCP 2/6 to TC 2/6 respectively On downlink ports, both DSCP 2/6 are still mapped to TC 1 Buffer adjusted according to the ports information: Mellanox-SN4600c-C64: 56 downlinks 50G + 8 uplinks 100G Mellanox-SN4600c-D48C40, Mellanox-SN2700, Mellanox-SN2700-D48C8: 24 downlinks 50G + 8 uplinks 100G - How to verify it Unit test. Signed-off-by: Stephen Sun <stephens@nvidia.com>	2023-02-07 16:21:59 +02:00
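The port-dependent DSCP mapping in the entry above can be sketched in Python. Uplinks keep DSCP 2/6 on TC 2/6, which are then served by the two additional lossless PGs/queues. Downlinks still map both to TC 1. The base map for the other DSCP values is a hypothetical placeholder; the real maps live in the SKU QoS templates.

```python
# Hypothetical base map for the DSCP values this commit does not touch
# (e.g. lossless DSCP 3/4 on TC 3/4); real maps live in the SKU's QoS templates.
BASE_DSCP_TO_TC = {3: 3, 4: 4}


def dscp_to_tc_map(uplink):
    """Build the DSCP-to-TC map for a dual-ToR T0 port."""
    mapping = dict(BASE_DSCP_TO_TC)
    if uplink:
        # Bounced-back lossless traffic gets its own TCs, served by the extra PGs/queues.
        mapping.update({2: 2, 6: 6})
    else:
        # Downlinks still map both DSCP 2 and 6 to TC 1.
        mapping.update({2: 1, 6: 1})
    return mapping
```

Keeping the bounced-back traffic in separate TCs on uplinks is what avoids the PFC deadlock described in the commit.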
Chun'ang Li	eea54717b8	Fix rsyslogd failing to start because rsyslog.conf is empty. (#13669 ) - Why I did it In to-sonic and multi-asic KVM tests, the pretest sometimes failed. The reason is that the rsyslogd process cannot start in the teamd container because rsyslog.conf is empty, which is caused by sonic-cfggen failing. - How I did it If sonic-cfggen -d fails, run it without -d, because the template file has default values. - How to verify it Built the image and tested it over 40 times; all runs passed the pretest. Signed-off-by: Chun'ang Li <chunangli@microsoft.com>	2023-02-06 16:38:04 +02:00
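The fallback described in the entry above is shown here as a Python sketch. It first renders the template with CONFIG_DB data (`sonic-cfggen -d -t <template>`). If that fails, it renders the template alone so its default values are used. The function name and template path are hypothetical, and this is not the script the commit changed. `run` is injectable for testing.

```python
import subprocess


def render_rsyslog_conf(template, run=subprocess.run):
    """Render the rsyslog template with CONFIG_DB data (-d); if that fails,
    render it again without -d so the template's default values are used."""
    with_db = ["sonic-cfggen", "-d", "-t", template]
    result = run(with_db, capture_output=True, text=True)
    if result.returncode != 0:
        # CONFIG_DB not usable: fall back to the template defaults only.
        result = run(["sonic-cfggen", "-t", template], capture_output=True, text=True)
    result.check_returncode()  # raise if even the fallback failed
    return result.stdout
```

Raising on a failed fallback keeps the empty-file failure mode from reappearing silently.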
Sudharsan Dhamal Gopalarathnam	1ff0c0b685	[Mellanox][sai_failure_dump]Added platform specific script to be invoked during SAI failure dump (#13533 ) - Why I did it Added platform specific script to be invoked during SAI failure dump. Added some generic changes to mount /var/log/sai_failure_dump as read write in the syncd docker - How I did it Added script in docker-syncd of mellanox and copied it to /usr/bin - How to verify it Manual UT and new sonic-mgmt tests	2023-02-05 16:45:49 +02:00
Saikrishna Arcot	ee1c32a802	Use tmpfs for /var/log for Arista 7260 (#13587 ) This is to reduce writes to disk, which can otherwise cause the SSD to wear out faster. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2023-02-02 09:07:33 -08:00
anamehra	26af468a99	Add support for platform topology configuration service (#12066 ) * Add support for platform topology configuration service This service invokes the platform plugin for platform specific topology configuration. The path for platform plugin script is: /usr/share/sonic/device/$PLATFORM/plugins/config-topology.sh If the platform plugin is not available, this service does nothing. Signed-off-by: anamehra <anamehra@cisco.com>	2023-02-01 12:53:45 -08:00
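The service behavior in the entry above can be sketched in Python. The service runs `/usr/share/sonic/device/$PLATFORM/plugins/config-topology.sh` when it exists and otherwise does nothing. The function name and the `args` parameter are hypothetical, because the commit does not say which arguments the plugin receives.

```python
import os
import subprocess

PLUGIN_TEMPLATE = "/usr/share/sonic/device/{platform}/plugins/config-topology.sh"


def run_topology_plugin(platform, args=(), exists=os.path.isfile, run=subprocess.run):
    """Run the platform's config-topology.sh if present; otherwise do nothing."""
    path = PLUGIN_TEMPLATE.format(platform=platform)
    if not exists(path):
        return None  # no plugin for this platform: the service is a no-op
    return run([path, *args], check=True)
```

Checking for the file rather than failing keeps the service safe to enable on every platform. Only platforms that ship the plugin are affected.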
Richard.Yu	a096363b48	[broadcom]: Set default SYNCD_SHM_SIZE for Broadcom XGS devices (#13297 ) After the upgrade to brcmsai 8.1, the SDK running environment (container) is recommended to have the following minimum memory sizes: TH4/TD4 (ltsw) uses 512MB, TH3 uses 300MB, Helix4/TD2/TD3/TH/TH2 use 256MB. Based on this requirement, adjust the default syncd shared memory size and set the memory size for specific ASICs in the platform_env.conf file for the different types of Broadcom ASICs. How I did it Add the platform_env.conf file if there is none for the broadcom platform (based on the platform_asic file). Add 'SYNCD_SHM_SIZE' and set the value: for ltsw (TD4/TH4) devices, set it to at least 512M (update platform_env.conf); for TD2/TH2/TH devices, set it to 256M; for TH3, set it to 300M. How to verify it Verify the image with the code fix. Check with UT. Check on lab devices. On a problematic device which cannot start successfully, run the command $ cat /proc/linux-kernel-bde Broadcom Device Enumerator (linux-kernel-bde) Module parameters: maxpayload=128 usemsi=0 dmasize=32M himem=(null) himemaddr=(null) DMA Memory (kernel): 33554432 bytes, 0 used, 33554432 free, local mmap No devices found $ docker rm -f syncd syncd $ sudo /usr/bin/syncd.sh start Cannot get Broadcom Chip Id. Skip set SYNCD_SHM_SIZE. 
Creating new syncd container with HWSKU Force10-S6000 a4862129a7fea04f00ed71a88715eac65a41cdae51c3158f9cdd7de3ccc3dd31 $ docker inspect syncd | grep -i shm "ShmSize": 67108864, "Tag": "fix_8.1_shm_issue.67873427-9f7ca60a0e", On a normal device $ docker inspect syncd | grep -i shm "ShmSize": 268435456, "Tag": "fix_8.1_shm_issue.67873427-9f7ca60a0e" change the config syncd_shm.ini to b85=128m $ docker rm -f syncd syncd $ sudo /usr/bin/syncd.sh start Creating new syncd container with HWSKU Force10-S6000 3209ffc1e5a7224b99640eb9a286c4c7aa66a2e6a322be32fb7fe2113bb9524c $ docker inspect syncd | grep -i shm "ShmSize": 134217728, "Tag": "fix_8.1_shm_issue.67873427-9f7ca60a0e", change the config under /usr/share/sonic/device/x86_64-dell_s6000_s1220-r0/Force10-S6000/platform_env.conf and run command $ cat /usr/share/sonic/device/x86_64-dell_s6000_s1220-r0/platform_env.conf SYNCD_SHM_SIZE=300m $ sudo /usr/bin/syncd.sh start Creating new syncd container with HWSKU Force10-S6000 897f6fcde1f669ad2caab7da4326079abd7e811bf73f018c6dacc24cf24bfda5 $ docker inspect syncd | grep -i shm "ShmSize": 314572800, "Tag": "fix_8.1_shm_issue.67873427-9f7ca60a0e", Signed-off-by: richardyu-ms <richard.yu@microsoft.com>	2023-01-30 20:23:03 -08:00
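The `ShmSize` values reported by `docker inspect` in the entry above are 67108864, 134217728, 268435456, and 314572800. They are exactly 64, 128, 256, and 300 MiB, which shows that size strings like `300m` use binary units; 64 MiB is Docker's default. The Python sketch below reproduces that conversion together with the per-family minimums listed in the commit. The family keys are illustrative labels, not SONiC's chip-ID keys (such as the `b85` entry in syncd_shm.ini).

```python
import re

# Minimum SYNCD_SHM_SIZE per Broadcom ASIC family, as listed in the commit.
# The keys are illustrative labels, not SONiC's actual chip-ID keys.
SHM_BY_FAMILY = {
    "ltsw": "512m",  # TH4/TD4
    "th3": "300m",
    "helix4": "256m",
    "td2": "256m",
    "td3": "256m",
    "th": "256m",
    "th2": "256m",
}
_UNITS = {"": 1, "b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def shm_size_bytes(size):
    """Convert a Docker-style size string such as '300m' to bytes (binary units)."""
    match = re.fullmatch(r"(\d+)([bkmg]?)", size.strip().lower())
    if match is None:
        raise ValueError(f"bad size: {size!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def syncd_shm_size(family):
    """Return the SYNCD_SHM_SIZE for an ASIC family, or None to keep Docker's default."""
    return SHM_BY_FAMILY.get(family)
```

Returning None for an unknown family mirrors the "Cannot get Broadcom Chip Id. Skip set SYNCD_SHM_SIZE." path, where the container falls back to Docker's 64 MiB default.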
Oleksandr Ivantsiv	c7ecd92c54	Clear DNS configuration received from DHCP during networking reconfiguration in Linux. (#13516 ) - Why I did it fixes #12907 When the management interface IP address configuration changes from dynamic to static, the DNS configuration (retrieved from the DHCP server) in /etc/resolv.conf is not cleared. This leaves the DNS configuration pointing to the wrong nameserver. To make the behavior correct, the DNS configuration received from DHCP should be cleared. - How I did it Use the resolvconf package for managing DNS configuration. It can track the source of DNS configuration and puts the configuration retrieved from DHCP servers into a separate file. This allows the DNS configuration retrieved from DHCP to be cleaned up during networking reconfiguration. - How to verify it Ensure that the management interface has no static configuration. Check that /etc/resolv.conf has DNS configuration. Configure a static IP address on the management interface. Verify that /etc/resolv.conf has no DNS configuration. Remove the static IP address from the management interface. Verify that /etc/resolv.conf has DNS configuration retrieved from the DHCP server.	2023-01-30 22:13:10 +02:00
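The sketch below is a toy Python model of why tracking the source of DNS configuration helps, as described in the entry above. Nameservers are kept per source and merged into resolv.conf, so switching the management interface to a static IP can drop only the DHCP-learned entries. This is not resolvconf's implementation. The `eth0.dhclient` key is an assumption modeled on resolvconf's per-interface record naming.

```python
def render_resolv_conf(sources):
    """Merge nameservers from every configuration source, preserving order
    and dropping duplicates."""
    seen, lines = set(), []
    for servers in sources.values():
        for ns in servers:
            if ns not in seen:
                seen.add(ns)
                lines.append(f"nameserver {ns}")
    return "\n".join(lines) + ("\n" if lines else "")


def on_static_mgmt_ip(sources, dhcp_key="eth0.dhclient"):
    """Switching eth0 to a static IP: forget only the DHCP-learned servers."""
    return {key: servers for key, servers in sources.items() if key != dhcp_key}
```

With only a DHCP source present, the static switch leaves resolv.conf empty, matching the verification step in the commit. Without per-source tracking, there would be no way to tell which lines to remove.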


1240 Commits