Why I did it
To fix ecmp hash polarization issue.
Work item tracking
Microsoft ADO (number only): 26085143
How I did it
Add sai_hash_seed_config_hash_offset_enable=1 in all config.bcm that Broadcom T1 uses.
HardwareSku
Force10-S6100-T1
Force10-S6100-ITPAC-T1
Force10-S6100
Celestica-DX010-C32
Arista-7260CX3-C64
Arista-7060CX-32S-Q32
Arista-7060CX-32S-C32-T1
Arista-7060CX-32S-C32
Arista-7050QX32S-Q32
Arista-7050QX-32S-S4Q31
Arista-7050-QX32
Arista-7050-QX-32SInclude Broadcom's fix by upgrading xgs SAI version to 8.4.35.0.
8.4.35.0: [CSP 00012324019] back-porting SONIC-75006 to SAI8.4
8.4.34.0:
[CSP 00012318293] back-porting SONIC-81534 to SAI8.4;
ECMP LB traffic polarization, configure hash_offset along with hash_seed attr
Run qual with only xgs SAI version upgraded to 8.4.35.0:
on TH2: https://elastictest.org/scheduler/testplan/6579b36ccfacd86e78e3e885?leftSideViewMode=detail&prop=status&order=ascending
on TH: https://elastictest.org/scheduler/testplan/657a75f8c1d3b51fc1d585b4?leftSideViewMode=detail&prop=status&order=ascending
How to verify it
use tests/ecmp/test_ecmp_sai_value.py to verify.
Why I did it
Update SAI version to SAIBuild2305.26.0.16
Update SDK/FW to 4.6.2134/2012.2134
Fixed issues:
Updated SN3700C to enable limit to 100G speed.
Recovering from Low power mode might ends with port down.
Work item tracking
Microsoft ADO (number only):
How I did it
Updating the versions in makefile
How to verify it
Confirm issues fixed and run sonic-mgmt tests
202305 image does not come up on chassis with SAI 7.1.111.1.
SAI 9.2.0.0 on 202305 image is verified to come up on Arista chassis. Initial testing is also done, no new failures compare to 202205 image, SAI 7.1.111.1.
Why I did it
Bring up 202305 image on chassis.
Work item tracking
Microsoft ADO (number only): 18189434
How I did it
How to verify it
Brought up SAI 9.2.0.0 on Arista chassis.
Ran pipeline on acl, bgp, arp, acms, cacl, copp, decap, fib, iface_namingmode.
A W/A to overcome delay of about 20 sec on login due to MFT bash autocompletion bug.
Should be reverted once a formal solution will be available in future MFT release.
Why I did it
To overcome SN2700 20 sec delay on login
Work item tracking
N/A
How I did it
Removed MFT bash autocompletion part
How to verify it
make configure PLATFORM=mellanox
make target/sonic-mellanox.bin
Why I did it
Fixed the issue - Some special IPv6 packets cannot be dropped by dataplane ACL rule
Work item tracking
Microsoft ADO (number only):
No
How I did it
How to verify it
Loaded SAI debian (in syncd docker) and re-run the failed cases.
Why I did it
FCS/CRC Errors will only be reported as RX_ERR.
Fix to avoid the mac port related errors.
Fix for sharedResSize testcase failure in QoS-SAI
Fix the issue related to voltage in 'show platform psustatus'.
Support WRED drop for lossy queues.
Fixed an issue where lossy traffic was getting dropped.
Enhancement of SAI logging for errors and interrupts
Work item tracking
Microsoft ADO (number only):
How I did it
Update Cisco platform to 202305.1.0.3
How to verify it
Why I did it
Update SAI to SAIBuild2305.26.0.9 for Mellanox platforms.
Fixed issues:
When working with SAI_DEFAULT_SWITCHING_MODE_STORE_FORWARD key/value enabled, trying to add a LAG member to a LAG which is created after warm boot initial configuration phase ended, will fail.
Creating BFD session for non default VRF fails (SAI_BFD_SESSION_ATTR_VIRTUAL_ROUTER != SAI_SWITCH_ATTR_DEFAULT_VIRTUAL_ROUTER_ID).
Work item tracking
Microsoft ADO (number only):
How I did it
Updated SAI version in "mlnx-sai.mk" Makefile.
How to verify it
Run "sonic-mgmt" regression testing.
Why I did it
Release Notes for Cisco 8111-32EH-O and 8102-64H
Fix for "Failed to get port by bridge port ID" error (MIGSMSFT-354)
Added CLI to enable trap events (MIGSMSFT-166)
Support to add critical message upon replace device SAI notification
Added support for input voltage/current/power info for PSUs
Added support for sff_mgr for deterministic bringup of SFF compliant modules
IOFPGA fix to support optics port in low power mode on 8101-32FH-O
Enable CMIS Manager for 8111-32EH-O
Added dump option to “show plat npu mac-state” CLI to dump MAC state info
Added media-based NPU serdes attributes for Credo 800G AEC Y-cables from media_settings.json
Auto FPD support for power CPLD on 8101 and 8111 platforms
Caveats:
Validation on 8101-32FH-O still pending. Will update release notes once completed.
Below 8800 platform specific fixes included but 8800 support not claimed in this code drop
Interop fix for BFD and Fair VOQ
Fix to update voq cgm profile during port speed change event
Create ECN profiles based on port speeds dynamically
Work item tracking
Microsoft ADO (number only):
How I did it
How to verify it
Why I did it
Update SAI version to SAIBuild2305.26.0.0
New features
FDB entries are now restored after warmboot to prevent temporary system flooding.
Update SDK/FW to 4.6.2102/2012.2102
Fixed Issues:
Some of the Warmboot related files which were created by SDK during switch create are now generated during pre shutdown flow
Work item tracking
Microsoft ADO (number only):
How I did it
Updating the versions in make file.
How to verify it
Running sonic-mgmt regression.
* Support lazy install of sdk drivers
This patch adds support for lazy install of Marvell prestera SDK
drivers for platform-nokia. Lazy install for drivers is added as
updated sdk driver needs to classify the drivers required for platform
during compile time. SDK drivers and platform files are now fetched
from a submodule(mrvl-prestera).
Additionaly, DTB required for sonic_fit creation during compile time
is sourced from sonic-linux-kernel.
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
* Add hugepage cmdline agrument
Updated sdk & driver requries hugepage to be reserved during kernel
boot. These kernel command line agrument are passed from installer.conf
in device folder.
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
* Update SAI deb to 1.12.0-3
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
---------
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
Why I did it
Drop for 8111-32EH-O:
Fix for clear_trap_configuration errors
Fix OREDERED ECMP NHG drop when route is added before members are added
Fix port handling of empty ecmp group to drop packets
Fix for link_notification_handle error
Auto FPD upgrade support
Work item tracking
Microsoft ADO (number only):
How I did it
update platform to 202305.1.0.1
Why I did it
In an effort to allow people to build a slim version of SONiC to fit on devices to small storage, there is a need to disable some unneeded features.
The docker-gbsyncd are only applicable to devices with external gearboxes and might not apply to devices that need a small image.
It is therefore desirable to have a knob to not include these gbsyncd containers.
Work item tracking
Microsoft ADO (number only):
How I did it
Add a new config INCLUDE_GBSYNCD which is enabled by default to retain the previous behavior.
Setting it to n will not include the platform/components/docker-gbsyncd-*.mk.
How to verify it
Set INCLUDE_GBSYNCD = n and witness that docker-gbsyncd images are not present in the final image.
Why I did it
Update the kernel to 5.10.179 for the 202305 branch
Work item tracking
Microsoft ADO (number only): 24592132
How I did it
How to verify it
Why I did it
First SONIC 202305 based release
Includes all fixes so far up to latest 202205 based 8111 drop (Code drop 111: 202205.main.0.13)
Work item tracking
Microsoft ADO (number only):
How I did it
update to 202305.main.0.1 release
How to verify it
Why I did it
SONiC service determine-reboot-cause might run before driver creating reset cause files. In that case, the reset cause will be "Unknown". This PR introduces a wait mechanism to wait for reset cause sysfs files ready.
How I did it
/run/hw-management/config/reset_attr_ready is the file to indicate all reset cause files are ready. In chassis.get_reboot_cause function, it waits /run/hw-management/config/reset_attr_ready for up to 45 seconds.
How to verify it
Manual test on master/202211/202205
Sfp api can now be called from the host which doesn't have the python_sdk_api installed. Also, sfp api has been migrated to use sysfs instead of sdk handle.
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
- Why I did it
SAI bug Fixes
1. When creating an ACL rule with SAI_ACL_ENTRY_ATTR_FIELD_SRC_IP/SAI_ACL_ENTRY_ATTR_FIELD_DST_IP enabled, and then disabling the field by setting enable=false, a match on L3_type=IPv4 will remain programmed for the rule Issue resolved after the fix
2. Allow the max scale of virtual routers to be configure for SPC-1, SPC-2, SPC-3 which is 255 when fastboot enable and 511 when fastboot disable
3. Remove default hash key of SRC_MAC, DST_MAC and ETH_TYPE
SAI features
1. Port init profile
2. Dual ToR Active-Standby | Additional MAC support
SDK/FW bug fixes
1. When preforming fast boot from an old SDK version (currently installed) to a newer one (target version), and the system was initially loaded with a new SDK version (past version), and the system has not been wiped, under specific conditions, the fast boot would use the past version's data and may fail.
- How I did it
Update SAI version to SAIBuild2211.25.1.4
Update SDK/FW version to 4.6.1062/2012.1062
- Why I did it
Because the Spectrum4 devices don't support mlxtrace utility.
- How I did it
Edit sai.profile and remove mlxtrace_spectrum4_itrace_*.cfg.ext files
Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
Co-authored-by: Vadym Hlushko <62022266+vadymhlushko-mlnx@users.noreply.github.com>
- Why I did it
1. Update Mellanox HW-MGMT package to newer version V.7.0030.1011
2. Replace the SONiC PMON Thermal control algorithm with the one inside the HW-MGMT package on all Nvidia platforms
3. Support Spectrum-4 systems
- How I did it
1. Update the HW-MGMT package version number and submodule pointer
2. Remove the thermal control algorithm implementation from Mellanox platform API
3. Revise the patch to HW-MGMT package which will disable HW-MGMT from running on SIMX
4. Update the downstream kernel patch list
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Why I did it
sonic-mgmt test failure is seen for update_firmware component API
Microsoft ADO: 25208748
How I did it
Edited API 2.0 to fix this issue.
How to verify it
Run sonic-mgmt test after the fix and verify it passes.
* [Mellanox] Update SDK/FW/SAI to 4.6.1020/2012.1020/SAIBuild2305.25.0.3 (#16096)
SONiC changes:
1. Support Spectrum4 ASIC FW binary building.
2. Support new SDK sx-obj-desc lib building since new SAI need it.
3. Remove SX_SCEW debian package from Mellanox SDK build since we are no longer using it (we use libxml2 instead).
4. Update SAI, SDK, FW to version 4.6.1020/2012.1020/SAIBuild2305.25.0.3
SDK/FW bug fixes
1. In SPC-1 platforms: Fastboot mode is not operational for Split port with Force mode in 50G speed
SFP modules are kept in disabled state after set LPM (low power mode) on/off for at least 3 minutes.
2. When preforming fast boot from an old SDK version (currently installed) to a newer one (target version), and the system was initially loaded with a new SDK version (past version), and the system has not been wiped, under specific conditions, the fast boot would use the past version's data and may fail.
SDK/FW Features
1. On SN2700 all ports can support y cable by credo
SAI bug Fixes
1. When creating an ACL rule with SAI_ACL_ENTRY_ATTR_FIELD_SRC_IP/SAI_ACL_ENTRY_ATTR_FIELD_DST_IP enabled, and then disabling the field by setting enable=false, a match on L3_type=IPv4 will remain programmed for the rule Issue resolved after the fix
2. Allow the max scale of virtual routers to be configure for SPC-1, SPC-2, SPC-3 when fastboot enable
3. Remove default hash key of SRC_MAC, DST_MAC and ETH_TYPE
SAI features
1. Port init profile
- How I did it
Update SDK/FW/SAI make files
- How to verify it
Run full sonic-mgmt regression on Mellanox platform
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Conflicts:
platform/mellanox/mlnx-sai.mk
* Fix issue: unprintable character is rendered when handling comments in j2
Use "{#-" and "-#}" to mark comments in jinja template
Signed-off-by: Stephen Sun <stephens@nvidia.com>
---------
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Co-authored-by: Stephen Sun <stephens@nvidia.com>
- Why I did it
watchdogutil uses platform API watchdog instance to control/query watchdog status. In Nvidia watchdog status, it caches "armed" status in a object member "WatchdogImplBase.armed". This is not working for CLI infrastructure because each CLI will create a new watchdog instance, the status cached in previous instance will totally lose. Consider following commands:
admin@sonic:~$ sudo watchdogutil arm -s 100 =====> watchdog instance1, armed=True
Watchdog armed for 100 seconds
admin@sonic:~$ sudo watchdogutil status ======> watchdog instance2, armed=False
Status: Unarmed
admin@sonic:~$ sudo watchdogutil disarm =======> watchdog instance3, armed=False
Failed to disarm Watchdog
- How I did it
Use sysfs to query watchdog status
- How to verify it
Manual test
Unit test
- Why I did it
Update Mellanox MFT tool to version 4.25.0-62
- How I did it
Update the MFT tool make file
- How to verify it
Run full sonic-mgmt regression.
Signed-off-by: Kebo Liu <kebol@nvidia.com>
= Why I did it
To optimize Mellanox platform SAI build
- How I did it
SAI debs are now downloaded as Spectrum-SDK-Drivers-SONiC-Bins release.
- How to verify it
Configure/build for Mellanox platform, check the image and ensure that correct SAI debs are included.
- Why I did it
Change Mellanox platform API implementation to use ASIC driver sysfs for the module operational state and status error fields.
- How I did it
Modify the platform/mellanox/mlnx-platform-api/sonic_platform/sfp.py file by change the call of sx_mgmt_phy_module_info_get() SDK API to sysfs
- How to verify it
Simulate the unplug cable event
Check the CLI output
sfputil show presence
sfputil show error-status -hw
Simulate the plug cable event
Repeat 2 step
Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
Why I did it
Updating the iSMART_64 tool for supporting latest debian releases.
How I did it
On branch new_ismart
Changes to be committed:
(use "git restore --staged ..." to unstage)
modified: platform/broadcom/sonic-platform-modules-dell/s6100/scripts/iSMART_64
How to verify it
In s6100, run the iSMART_64 tool.
md5sum - 24725730d7649769c7ba50971c1f2955
Co-authored-by: Santhosh Kumar T <53558409+santhosh-kt@users.noreply.github.com>
- Why I did it
Increase UT coverage for Nvidia platform API code
Work item tracking
Microsoft ADO (number only):
- How I did it
Focus on low coverage file:
1. component.py
2. watchdog.py
3. pcie.py
- How to verify it
Run the unit test, the coverage has been changed from 70% to 90%
Co-authored-by: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com>
Why I did it
Sharing the storage of syncd with other proprietary application extensions allows them to communicate with syncd in differnt ways.
If one container wants to pass some information to syncd then shared storage can be used. However, today the shared storage isn't cleaned on restarts making it possible for syncd to read out-of-date information generated in the past.
NOTE: No plans to use it for standard SONIC dockers and we are working on removing the SDK dependency from PMON docker
How I did it
Implemented new service to clean the shared storage.
How to verify it
Do reboot/fast-reboot/warm-reboot/config-reload/systemctl restart swss and verify /tmp/ is cleaned after each restart in syncd container.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Co-authored-by: Stepan Blyshchak <38952541+stepanblyschak@users.noreply.github.com>