Why I did it
Upgrade the xgs SAI version to 8.4.39.2 to include the following fix:
8.4.36.0: [submodule upgrade] [SAI_BRANCH rel_ocp_sai_8_4] SID: SDK-381039 Cosq control dynamic type changes
8.4.37.0: SID: MMU cosq control configuration with Dynamic Type Check
8.4.38.0: [submodule upgrade] [CSP 0001232212][SAI_BRANCH rel_ocp_sai_8_4] back-porting SONIC-82415 to SAI 8.4
8.4.39.0: [CSP CS00012320979] Port SONIC-81867 sai spec compliance for get SAI_SWITCH_ATTR_SWITCH_HARDWARE_INFO
8.4.39.1: changes for phy-re-init of 40G ports for TH platforms CS00012327470
8.4.39.2: fix the Hostif queue capability by changing the SET operation of SAI_HOSTIF_ATTR_QUEUE to true
Work item tracking
Microsoft ADO (number only): 26491005
How I did it
Upgrade xgs SAI version in sai.mk file.
How to verify it
Run basic SONiC test using SAI release pipeline, all cases passed.
https://dev.azure.com/mssonic/internal/_build/results?buildId=457869&view=results
These changes, in conjunction with NDK version >= 22.9.17 address the thermal logging issues discussed at Nokia-ION/ndk#27. While the changes contained at this PR do not require coupling to NDK version >= 22.9.17, thermal logging enhancements will not be available without updated NDK >= 22.9.17. Thus, coupling with NDK >=22.9.17 is preferred and recommended.
Why I did it
To address thermal logging deficiencies.
Work item tracking
Microsoft ADO (number only): 26365734
How I did it
The following changes are included:
Threshold configuration values are provided in the associated device data .json files. There is also a change included to better handle the condition where an SFP module read fails.
Modify the module.py reboot to support rebooting a linecard from the Supervisor
- Modify reboot to call _reboot_imm for a single IMM card reboot
- Add logging to ndk_cmd to record the "reboot-linecard" and "shutdown/startup the sfm" operations
Add new nokia_cmd set command and modify show ndk-status output
- Add a new function reboot_imm() to nokia_common.py to support rebooting a single IMM slot from the CPM
- Add new command: nokia_cmd set reboot-linecard <slot> [force] for the CPM
- Append a new column "RebootStatus" at the end of output of "nokia_cmd show ndk-status"
- Provide ability for IMM to disable all transceiver module TX at reboot time
- Remove defunct xcvr-resync service
DEPENDS ON: sonic-net/sonic-swss#2997 sonic-net/sonic-utilities#3093
What I did
Revert the feature.
Why I did it
Revert the BGP suppress FIB functionality because of FRR memory consumption issues and bugs that were found.
How I verified it
Basic sanity check on t1-lag, regression in progress.
* [Celestica-E1031] Enable CPU watchdog (#16083)
Enable CPU watchdog on Celestica-E1031.
* Add info syslog for cpu_wdt.service (#16678)
Why I did it
Add an info syslog for cpu_wdt.service when the watchdog arm action is triggered.
How I did it
Add an info syslog for cpu_wdt.service when the watchdog arm action is triggered.
Why I did it
Release notes for Cisco 8111-32EH-O, 8102-64H-O and 8101-32FH-O:
• Fixed a bug in PFC-WD where watchdog is triggered too often when sparse traffic is present, failing to detect the traffic traversal - (SR 696617830)
• Resolved an issue where SAI_STATUS_ITEM_NOT_FOUND error was seen while adding LAG members - (MIGSMSFT-354)
• Fixed Thermal API related error message (MIGSMSFT-354)
• Fixed an issue related to default config trap - (MIGSMSFT-354)
• Changed the message log level from error to debug in situations when the HW offloaded session is not found or was never created for the packet received. (MIGSMSFT-354)
• Fixed an issue where drop option was not working when encap and decap IPinIP tunnels share the same SDK tunnel port.
• Fixed an error while running VRF testcase (MIGSMSFT-354)
• Fixed an issue where BFD packets were not egressing using Queue 7
• SAI support for additional FEC related attributes:
· SAI_PORT_ATTR_MAX_FEC_SYMBOL_ERRORS_DETECTABLE
· SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_S0
· SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_S16
Work item tracking
Microsoft ADO (number only):
Why I did it
Fix an xcvrd crash caused by the error "cannot import name 'initialize_sfp_thermal'":
Nov 27 09:47:16.388639 sonic ERR pmon#xcvrd: Exception occured at CmisManagerTask thread due to ImportError("cannot import name 'initialize_sfp_thermal' from partially initialized module 'sonic_platform.thermal' (most likely due to a circular import) (/usr/local/lib/python3.9/dist-packages/sonic_platform/thermal.py)")
Nov 27 09:47:16.392544 sonic ERR pmon#xcvrd: Traceback (most recent call last):
Nov 27 09:47:16.392643 sonic ERR pmon#xcvrd: File "/usr/local/lib/python3.9/dist-packages/xcvrd/xcvrd.py", line 1518, in run
Nov 27 09:47:16.392757 sonic ERR pmon#xcvrd: self.task_worker()
Nov 27 09:47:16.392757 sonic ERR pmon#xcvrd: File "/usr/local/lib/python3.9/dist-packages/xcvrd/xcvrd.py", line 1240, in task_worker
Nov 27 09:47:16.392757 sonic ERR pmon#xcvrd: sfp = platform_chassis.get_sfp(pport)
Nov 27 09:47:16.392793 sonic ERR pmon#xcvrd: File "/usr/local/lib/python3.9/dist-packages/sonic_platform/chassis.py", line 346, in get_sfp
Nov 27 09:47:16.392830 sonic ERR pmon#xcvrd: self.initialize_single_sfp(index)
Nov 27 09:47:16.392830 sonic ERR pmon#xcvrd: File "/usr/local/lib/python3.9/dist-packages/sonic_platform/chassis.py", line 288, in initialize_single_sfp
Nov 27 09:47:16.392830 sonic ERR pmon#xcvrd: self._sfp_list[index] = sfp_module.SFP(index)
Nov 27 09:47:16.392830 sonic ERR pmon#xcvrd: File "/usr/local/lib/python3.9/dist-packages/sonic_platform/sfp.py", line 272, in __init__
Nov 27 09:47:16.392866 sonic ERR pmon#xcvrd: from .thermal import initialize_sfp_thermal
Nov 27 09:47:16.392918 sonic ERR pmon#xcvrd: ImportError: cannot import name 'initialize_sfp_thermal' from partially initialized module 'sonic_platform.thermal' (most likely due to a circular import) (/usr/local/lib/python3.9/dist-packages/sonic_platform/thermal.py)
Nov 27 09:47:16.393103 sonic ERR pmon#xcvrd: Xcvrd: exception found at child thread CmisManagerTask due to ImportError("cannot import name 'initialize_sfp_thermal' from partially initialized module 'sonic_platform.thermal' (most likely due to a circular import) (/usr/local/lib/python3.9/dist-packages/sonic_platform/thermal.py)")
Nov 27 09:47:16.393103 sonic ERR pmon#xcvrd: Exiting main loop as child thread raised exception!
Work item tracking
Microsoft ADO (number only):
How I did it
Add a lock around SFP object creation.
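A minimal sketch of the locking idea, not the actual patch. The class name, the `_sfp_lock` attribute, and the `_create_sfp` helper are hypothetical stand-ins; only `get_sfp` and `_sfp_list` appear in the traceback above. The lock serializes SFP object creation, so two xcvrd threads cannot construct the same object (and run its lazy imports) at the same time.

```python
import threading


class Chassis:
    """Hypothetical stand-in for the platform chassis class."""

    def __init__(self, sfp_count):
        self._sfp_list = [None] * sfp_count
        # Assumed attribute name: one lock guarding all SFP object creation.
        self._sfp_lock = threading.Lock()

    def _create_sfp(self, index):
        # Placeholder for the real SFP construction, which performs lazy imports.
        return {"index": index}

    def get_sfp(self, index):
        # Fast path: the object already exists, no lock needed.
        sfp = self._sfp_list[index]
        if sfp is not None:
            return sfp
        # Slow path: only one thread at a time may create SFP objects.
        with self._sfp_lock:
            # Re-check under the lock; another thread may have created it.
            if self._sfp_list[index] is None:
                self._sfp_list[index] = self._create_sfp(index)
            return self._sfp_list[index]
```

With this shape, concurrent callers of `get_sfp(0)` all receive the same object, and the creation code runs once per index.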
How to verify it
UNIT TEST
Manual Test
Why I did it
To fix the ECMP hash polarization issue.
Work item tracking
Microsoft ADO (number only): 26085143
How I did it
Add sai_hash_seed_config_hash_offset_enable=1 in all config.bcm that Broadcom T1 uses.
HardwareSku
Force10-S6100-T1
Force10-S6100-ITPAC-T1
Force10-S6100
Celestica-DX010-C32
Arista-7260CX3-C64
Arista-7060CX-32S-Q32
Arista-7060CX-32S-C32-T1
Arista-7060CX-32S-C32
Arista-7050QX32S-Q32
Arista-7050QX-32S-S4Q31
Arista-7050-QX32
Arista-7050-QX-32S
Include Broadcom's fix by upgrading xgs SAI version to 8.4.35.0.
8.4.35.0: [CSP 00012324019] back-porting SONIC-75006 to SAI8.4
8.4.34.0:
[CSP 00012318293] back-porting SONIC-81534 to SAI8.4;
ECMP LB traffic polarization, configure hash_offset along with hash_seed attr
Run qual with only xgs SAI version upgraded to 8.4.35.0:
on TH2: https://elastictest.org/scheduler/testplan/6579b36ccfacd86e78e3e885?leftSideViewMode=detail&prop=status&order=ascending
on TH: https://elastictest.org/scheduler/testplan/657a75f8c1d3b51fc1d585b4?leftSideViewMode=detail&prop=status&order=ascending
How to verify it
use tests/ecmp/test_ecmp_sai_value.py to verify.
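The config.bcm change described under "How I did it" could be applied to each file with a small helper like the one below. This is only a sketch: the function name `ensure_setting` is hypothetical, and it assumes config.bcm uses plain `key=value` lines. Only the key `sai_hash_seed_config_hash_offset_enable` and the value 1 come from this PR.

```python
def ensure_setting(text, key="sai_hash_seed_config_hash_offset_enable", value="1"):
    """Return the config text with `key=value` present exactly once."""
    wanted = f"{key}={value}"
    out = []
    found = False
    for line in text.splitlines():
        # Compare only the part before the first '=' so any old value is replaced.
        if line.split("=", 1)[0].strip() == key:
            if not found:
                out.append(wanted)
                found = True
            continue  # drop duplicate definitions of the key
        out.append(line)
    if not found:
        out.append(wanted)
    return "\n".join(out) + "\n"
```

The helper is idempotent, so running it twice on the same file content leaves the result unchanged.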
Why I did it
Update SAI version to SAIBuild2305.26.0.16
Update SDK/FW to 4.6.2134/2012.2134
Fixed issues:
Updated SN3700C to enable limit to 100G speed.
Recovering from low power mode might end with the port down.
Work item tracking
Microsoft ADO (number only):
How I did it
Updating the versions in makefile
How to verify it
Confirm issues fixed and run sonic-mgmt tests
202305 image does not come up on chassis with SAI 7.1.111.1.
SAI 9.2.0.0 on the 202305 image is verified to come up on Arista chassis. Initial testing is also done, with no new failures compared to the 202205 image with SAI 7.1.111.1.
Why I did it
Bring up 202305 image on chassis.
Work item tracking
Microsoft ADO (number only): 18189434
How I did it
How to verify it
Brought up SAI 9.2.0.0 on Arista chassis.
Ran pipeline on acl, bgp, arp, acms, cacl, copp, decap, fib, iface_namingmode.
A workaround (W/A) for a delay of about 20 sec on login caused by an MFT bash autocompletion bug.
It should be reverted once a formal solution is available in a future MFT release.
Why I did it
To overcome SN2700 20 sec delay on login
Work item tracking
N/A
How I did it
Removed MFT bash autocompletion part
How to verify it
make configure PLATFORM=mellanox
make target/sonic-mellanox.bin
Why I did it
Fixed the issue - Some special IPv6 packets cannot be dropped by dataplane ACL rule
Work item tracking
Microsoft ADO (number only):
No
How I did it
How to verify it
Loaded SAI debian (in syncd docker) and re-run the failed cases.
Why I did it
FCS/CRC Errors will only be reported as RX_ERR.
Fix to avoid the mac port related errors.
Fix for sharedResSize testcase failure in QoS-SAI
Fix the issue related to voltage in 'show platform psustatus'.
Support WRED drop for lossy queues.
Fixed an issue where lossy traffic was getting dropped.
Enhancement of SAI logging for errors and interrupts
Work item tracking
Microsoft ADO (number only):
How I did it
Update Cisco platform to 202305.1.0.3
How to verify it
Why I did it
Update SAI to SAIBuild2305.26.0.9 for Mellanox platforms.
Fixed issues:
When working with SAI_DEFAULT_SWITCHING_MODE_STORE_FORWARD key/value enabled, trying to add a LAG member to a LAG which is created after warm boot initial configuration phase ended, will fail.
Creating BFD session for non default VRF fails (SAI_BFD_SESSION_ATTR_VIRTUAL_ROUTER != SAI_SWITCH_ATTR_DEFAULT_VIRTUAL_ROUTER_ID).
Work item tracking
Microsoft ADO (number only):
How I did it
Updated SAI version in "mlnx-sai.mk" Makefile.
How to verify it
Run "sonic-mgmt" regression testing.
Why I did it
Release Notes for Cisco 8111-32EH-O and 8102-64H
Fix for "Failed to get port by bridge port ID" error (MIGSMSFT-354)
Added CLI to enable trap events (MIGSMSFT-166)
Support to add critical message upon replace device SAI notification
Added support for input voltage/current/power info for PSUs
Added support for sff_mgr for deterministic bringup of SFF compliant modules
IOFPGA fix to support optics port in low power mode on 8101-32FH-O
Enable CMIS Manager for 8111-32EH-O
Added dump option to “show plat npu mac-state” CLI to dump MAC state info
Added media-based NPU serdes attributes for Credo 800G AEC Y-cables from media_settings.json
Auto FPD support for power CPLD on 8101 and 8111 platforms
Caveats:
Validation on 8101-32FH-O still pending. Will update release notes once completed.
Below 8800 platform specific fixes included but 8800 support not claimed in this code drop
Interop fix for BFD and Fair VOQ
Fix to update voq cgm profile during port speed change event
Create ECN profiles based on port speeds dynamically
Work item tracking
Microsoft ADO (number only):
How I did it
How to verify it
Why I did it
Update SAI version to SAIBuild2305.26.0.0
New features
FDB entries are now restored after warmboot to prevent temporary system flooding.
Update SDK/FW to 4.6.2102/2012.2102
Fixed Issues:
Some of the Warmboot related files which were created by SDK during switch create are now generated during pre shutdown flow
Work item tracking
Microsoft ADO (number only):
How I did it
Updating the versions in make file.
How to verify it
Running sonic-mgmt regression.
* Support lazy install of sdk drivers
This patch adds support for lazy install of Marvell prestera SDK
drivers for platform-nokia. Lazy install for drivers is added as
updated sdk driver needs to classify the drivers required for platform
during compile time. SDK drivers and platform files are now fetched
from a submodule(mrvl-prestera).
Additionally, DTB required for sonic_fit creation during compile time
is sourced from sonic-linux-kernel.
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
* Add hugepage cmdline argument
Updated sdk & driver require hugepages to be reserved during kernel
boot. These kernel command line arguments are passed from installer.conf
in device folder.
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
* Update SAI deb to 1.12.0-3
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
Why I did it
Drop for 8111-32EH-O:
Fix for clear_trap_configuration errors
Fix ORDERED ECMP NHG drop when route is added before members are added
Fix port handling of empty ecmp group to drop packets
Fix for link_notification_handle error
Auto FPD upgrade support
Work item tracking
Microsoft ADO (number only):
How I did it
update platform to 202305.1.0.1
Why I did it
In an effort to allow people to build a slim version of SONiC that fits on devices with small storage, there is a need to disable some unneeded features.
The docker-gbsyncd containers are only applicable to devices with external gearboxes and might not apply to devices that need a small image.
It is therefore desirable to have a knob to not include these gbsyncd containers.
Work item tracking
Microsoft ADO (number only):
How I did it
Add a new config INCLUDE_GBSYNCD which is enabled by default to retain the previous behavior.
Setting it to n will not include the platform/components/docker-gbsyncd-*.mk.
How to verify it
Set INCLUDE_GBSYNCD = n and witness that docker-gbsyncd images are not present in the final image.
Why I did it
Update the kernel to 5.10.179 for the 202305 branch
Work item tracking
Microsoft ADO (number only): 24592132
How I did it
How to verify it
Why I did it
First SONIC 202305 based release
Includes all fixes so far up to latest 202205 based 8111 drop (Code drop 111: 202205.main.0.13)
Work item tracking
Microsoft ADO (number only):
How I did it
update to 202305.main.0.1 release
How to verify it
Why I did it
The SONiC service determine-reboot-cause might run before the driver has created the reset cause files. In that case, the reset cause will be "Unknown". This PR introduces a mechanism that waits for the reset cause sysfs files to be ready.
How I did it
/run/hw-management/config/reset_attr_ready is the file that indicates all reset cause files are ready. The chassis.get_reboot_cause function waits up to 45 seconds for /run/hw-management/config/reset_attr_ready to appear.
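The wait can be pictured as a simple polling loop. The following is a sketch only: the helper name `wait_for_file` and the 1-second polling interval are assumptions, while the file path and the 45-second limit come from the description above.

```python
import os
import time

# Path taken from the PR description.
RESET_ATTR_READY_FILE = "/run/hw-management/config/reset_attr_ready"


def wait_for_file(path, timeout=45.0, interval=1.0):
    """Return True once `path` exists, or False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        # Assumed polling interval; the real implementation may differ.
        time.sleep(interval)
```

A caller would check `wait_for_file(RESET_ATTR_READY_FILE)` before reading the reset cause files, and could fall back to an "Unknown" cause if it returns False.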
How to verify it
Manual test on master/202211/202205
Sfp api can now be called from the host which doesn't have the python_sdk_api installed. Also, sfp api has been migrated to use sysfs instead of sdk handle.
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
- Why I did it
SAI bug Fixes
1. When creating an ACL rule with SAI_ACL_ENTRY_ATTR_FIELD_SRC_IP/SAI_ACL_ENTRY_ATTR_FIELD_DST_IP enabled, and then disabling the field by setting enable=false, a match on L3_type=IPv4 would remain programmed for the rule. The issue is resolved by this fix.
2. Allow the max scale of virtual routers to be configured for SPC-1, SPC-2, SPC-3, which is 255 when fastboot is enabled and 511 when fastboot is disabled
3. Remove default hash key of SRC_MAC, DST_MAC and ETH_TYPE
SAI features
1. Port init profile
2. Dual ToR Active-Standby | Additional MAC support
SDK/FW bug fixes
1. When performing fast boot from an old SDK version (currently installed) to a newer one (target version), and the system was initially loaded with a new SDK version (past version), and the system has not been wiped, under specific conditions, the fast boot would use the past version's data and may fail.
- How I did it
Update SAI version to SAIBuild2211.25.1.4
Update SDK/FW version to 4.6.1062/2012.1062
- Why I did it
Because the Spectrum4 devices don't support mlxtrace utility.
- How I did it
Edit sai.profile and remove mlxtrace_spectrum4_itrace_*.cfg.ext files
Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
Co-authored-by: Vadym Hlushko <62022266+vadymhlushko-mlnx@users.noreply.github.com>
- Why I did it
1. Update Mellanox HW-MGMT package to newer version V.7.0030.1011
2. Replace the SONiC PMON Thermal control algorithm with the one inside the HW-MGMT package on all Nvidia platforms
3. Support Spectrum-4 systems
- How I did it
1. Update the HW-MGMT package version number and submodule pointer
2. Remove the thermal control algorithm implementation from Mellanox platform API
3. Revise the patch to HW-MGMT package which will disable HW-MGMT from running on SIMX
4. Update the downstream kernel patch list
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Why I did it
sonic-mgmt test failure is seen for update_firmware component API
Microsoft ADO: 25208748
How I did it
Edited API 2.0 to fix this issue.
How to verify it
Run sonic-mgmt test after the fix and verify it passes.
* [Mellanox] Update SDK/FW/SAI to 4.6.1020/2012.1020/SAIBuild2305.25.0.3 (#16096)
SONiC changes:
1. Support Spectrum4 ASIC FW binary building.
2. Support building the new SDK sx-obj-desc lib, since the new SAI needs it.
3. Remove SX_SCEW debian package from Mellanox SDK build since we are no longer using it (we use libxml2 instead).
4. Update SAI, SDK, FW to version 4.6.1020/2012.1020/SAIBuild2305.25.0.3
SDK/FW bug fixes
1. In SPC-1 platforms: Fastboot mode is not operational for Split port with Force mode in 50G speed
SFP modules are kept in disabled state after set LPM (low power mode) on/off for at least 3 minutes.
2. When performing fast boot from an old SDK version (currently installed) to a newer one (target version), and the system was initially loaded with a new SDK version (past version), and the system has not been wiped, under specific conditions, the fast boot would use the past version's data and may fail.
SDK/FW Features
1. On SN2700 all ports can support y cable by credo
SAI bug Fixes
1. When creating an ACL rule with SAI_ACL_ENTRY_ATTR_FIELD_SRC_IP/SAI_ACL_ENTRY_ATTR_FIELD_DST_IP enabled, and then disabling the field by setting enable=false, a match on L3_type=IPv4 would remain programmed for the rule. The issue is resolved by this fix.
2. Allow the max scale of virtual routers to be configured for SPC-1, SPC-2, SPC-3 when fastboot is enabled
3. Remove default hash key of SRC_MAC, DST_MAC and ETH_TYPE
SAI features
1. Port init profile
- How I did it
Update SDK/FW/SAI make files
- How to verify it
Run full sonic-mgmt regression on Mellanox platform
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Conflicts:
platform/mellanox/mlnx-sai.mk
* Fix issue: unprintable character is rendered when handling comments in j2
Use "{#-" and "-#}" to mark comments in jinja template
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Co-authored-by: Stephen Sun <stephens@nvidia.com>
- Why I did it
watchdogutil uses the platform API watchdog instance to control/query watchdog status. The Nvidia watchdog implementation caches the "armed" status in an object member, "WatchdogImplBase.armed". This does not work for the CLI infrastructure because each CLI invocation creates a new watchdog instance, so the status cached in the previous instance is lost. Consider the following commands:
admin@sonic:~$ sudo watchdogutil arm -s 100 =====> watchdog instance1, armed=True
Watchdog armed for 100 seconds
admin@sonic:~$ sudo watchdogutil status ======> watchdog instance2, armed=False
Status: Unarmed
admin@sonic:~$ sudo watchdogutil disarm =======> watchdog instance3, armed=False
Failed to disarm Watchdog
- How I did it
Use sysfs to query watchdog status
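A minimal sketch of the idea, not the actual patch: instead of caching the armed flag on the instance, every query reads the state from sysfs, so a fresh instance created by each CLI invocation reports the same answer. The class name, the `state` file name, and the "active" value are assumptions here (they follow the generic Linux watchdog sysfs class layout); the real Nvidia implementation may read a different file.

```python
import os


class Watchdog:
    """Hypothetical watchdog wrapper that keeps no cached 'armed' state."""

    def __init__(self, sysfs_dir):
        # Assumed layout: <sysfs_dir>/state holds "active" or "inactive".
        self._state_path = os.path.join(sysfs_dir, "state")

    def is_armed(self):
        # Read the kernel-exported state on every call instead of caching it.
        try:
            with open(self._state_path) as f:
                return f.read().strip() == "active"
        except OSError:
            # Treat a missing or unreadable file as "not armed".
            return False
```

Because the state lives in the kernel rather than in the Python object, `watchdogutil status` and `watchdogutil disarm` would see the arming done by an earlier `watchdogutil arm` process.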
- How to verify it
Manual test
Unit test