- Primarily incorporates ungraceful reboot logic to bring down front panel ports in a timely manner
- Fix nokia_cmd show syntax and output, fix hw-management-generate-dump
- Fix SFM hotplug serial number empty issue
Upgrade the SAI version to 7.1.73.4 to include the following fixes:
7.1.67.4: [CSP CS00012302165] Backport JIRA SONIC-77116 to rel_ocp_sai_7_1
7.1.68.4: [CSP CS00012316299][SAI_BRANCH rel_ocp_sai_7_1] L3 entry delete failed when SER error is present
7.1.69.4: ECMP LB traffic polarization, configure hash_offset along with hash_seed attr
7.1.70.4: JIRA# SONIC-79944
7.1.71.4: SID: MMU cosq control configuration with Dynamic Type Check
7.1.72.4: Backport SONIC-81858 PFCWD on IPv6 [JIRA# SONIC-84146]
7.1.73.4: [CSP CS00012322843] SAI_API_ROUTE:brcm_sai_xgs_route_create:115 iptnl info get failed with error -7
Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
Modify the EEPROM API serial_number_str to return the service tag instead of the serial number on Dell S6100.
Ref PR: #1239
How I did it
Update the EEPROM API serial_number_str to return the service tag instead of the serial number.
How to verify it
Verify that decode-syseeprom -s returns the service tag on Dell S6100.
Co-authored-by: Arun Saravanan Balachandran <52521751+ArunSaravananBalachandran@users.noreply.github.com>
* Revert "[202205] [Mellanox] Fix issue: user must set admin down before toggling LPM (#14370)"
This reverts commit f74c69e876.
* update copyright header
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Update the Brcm SAI 7.0 with the following fixes:
Official Brcm SDK fix for memory leak
(CS00012315073 [7.0][J2C+] : PFCWD counter polling causing continuous mem leak on production device)
Official Brcm fix for high CPU usage
(CS00012317195 High CPU due to SDK calling soc_dnxc_port_resource_get for few stats counters even with bcmCNTR thread)
Official Brcm SAI fix for getting VOQ counters working
(CSP CS00012319503: DNX SAI 7.1.60.4 has broken Voq counters support)
How to verify it
Validated by running the nightly pipeline on a chassis platform.
Validated the VOQ counters by sending traffic from T1 VM --> T3 VM
Port                                Voq      Counter/pkts    Counter/bytes    Drop/pkts    Drop/bytes
----------------------------------  -----  --------------  ---------------  -----------  ------------
svcstr-xxxx-lc1-1|asic0|Ethernet48  VOQ0                0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet48  VOQ1               27             1968            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet48  VOQ2                0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet48  VOQ3                0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet48  VOQ4                0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet48  VOQ5                0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet48  VOQ6                0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet48  VOQ7                0                0            0             0
Port                                Voq      Counter/pkts    Counter/bytes    Drop/pkts    Drop/bytes
----------------------------------  -----  --------------  ---------------  -----------  ------------
svcstr-xxxx-lc1-1|asic0|Ethernet56  VOQ0                0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet56  VOQ1             7099           625680            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet56  VOQ2                0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet56  VOQ3                0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet56  VOQ4                0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet56  VOQ5                0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet56  VOQ6                0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet56  VOQ7                0                0            0             0
---------------
CPU usage has come down on the SUP
System 'xxxx-sup-1'
status Running
monitoring status Monitored
monitoring mode active
on reboot start
load average [7.94] [8.70] [7.54]
cpu 2.6%us 45.0%sy 0.0%wa <<<<-- it is 45%
memory usage 8.9 GB [28.6%]
swap usage 0 B [0.0%]
uptime 21m
boot time Fri, 17 Nov 2023 21:55:55
data collected Fri, 17 Nov 2023 22:16:59
-------------
syncd memory usage is no longer increasing.
Release Notes for Cisco 8102-32FH-O:
Fixed platform_test failures in test_component.py
IOFPGA_SJTAG label under ‘fwutil show status’ changed to ‘IOFPGA’
Validated auto FPD upgrade
* [202205][Arista] Update arista platform submodules
- fix issue where platform debug info would no longer be in the dump
- fix issue in scd-xcvr where active low bits couldn't be set
- fix issue in scd-smbus where it performed an out-of-bounds access
Upgrade the xgs SAI version to 7.1.62.4 to include the following changes:
7.1.62.4: ECMP CRM fix - CS00012312907
7.1.61.4: Includes nexthop group scaling fix - CS00012304075
7.1.60.4: CS00012302193 - SAI_SWITCH_ATTR_SWITCH_HARDWARE_INFO attribute value changed
7.1.59.4: [CS00012302400 CS00012302347]backport SONIC-76986 to SAI7.1: Fix the issue--"empty LAG can't be added to ACL entry"
7.1.57.4: [CSP CS00012296571] Backport SONIC-75371 jira on SAI 7.1 branch
7.1.56.4: [CSP CS00012302193] backport SONIC-72912 jira on SAI 7.1 branch
Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
Update SDK/FW to 4.5.4318/2010.4316 and SAI to 2205.25.1.2 to include the fixes listed below.
SDK/FW
In some cases, when an ACL has two or more rules with a similar key, modifying/removing one of the rules may cause modification/removal of one of the similar-key rules, instead of the requested rule.
Using module SPQCELRCDFB when connected to a 3rd party switch, there may either be no link or a very long link up time (~2 minutes).
In some cases, warm boot from 201911 to 202205 might result in dataplane traffic loss.
When upgrading SONiC via warm boot from version 201911/202012 to a newer version, then cold booting back to the older version and upgrading again to the newer one, the warm boot might fail.
SAI
Added support for dynamic ordered ECMP group (SAI_NEXT_HOP_GROUP_TYPE_DYNAMIC_ORDERED_ECMP)
Added "store and forward" KV
Added support for IPv6 link-local debug counters
---------
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
Why I did it
Fixes for
MIGSMSFT-333 / SR 696141124 - Fix ORDERED ECMP NHG drop when route is added before members are added
MIGSMSFT-333 / SR 696141124 - Fix port handling of empty ECMP group to drop packets
Why I did it
The SONiC service determine-reboot-cause might run before the driver creates the reset cause files. In that case, the reset cause is reported as "Unknown". This PR introduces a mechanism that waits for the reset cause sysfs files to be ready.
How I did it
The file /run/hw-management/config/reset_attr_ready indicates that all reset cause files are ready. The chassis.get_reboot_cause function waits up to 45 seconds for this file to appear.
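A minimal sketch of such a bounded wait, for illustration only. The helper name wait_for_file and the 1-second polling interval are assumptions and are not taken from the actual chassis.get_reboot_cause implementation; only the file path and the 45-second limit come from the description above.

```python
import os
import time

# Path and 45-second limit are from the description above.
RESET_ATTR_READY = "/run/hw-management/config/reset_attr_ready"


def wait_for_file(path, timeout=45.0, poll_interval=1.0):
    """Hypothetical helper: return True once `path` exists,
    or False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(poll_interval, remaining))
```

A caller would check wait_for_file(RESET_ATTR_READY) before reading the reset cause files, and fall back to an "Unknown" cause if it returns False.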
How to verify it
Manual test on master/202211/202205
This fixes Nokia-ION/ndk#22
Note that this PR must be coupled with NDK version >= 22.9.13
Why I did it
To provide proper support for CMIS compliant transceiver module CDB operations (including FW related operations).
How I did it
Enhanced the transport subsystem to allow up to 2k bytes of data to be passed to/from modules (the prior maximum was 128 bytes).
How to verify it
Ensure that new FW (firmware) can be programmed to CMIS compliant module(s) using the 'sfputil firmware ...' commands.
Why I did it
Release Notes for 8102-64H
• Fix NHG drop when route is added before members are added (MIGSMSFT-333 / SR 696141124)
• Added a new system device property "acl_set_dscp_encap_outer_only"
• IN_DISCARD counters now report per-port counters only, instead of both per-port and shared counters.
How I did it
Update platform version to 202205.2.2.12
Why I did it
A sonic-mgmt test failure is seen for the update_firmware component API.
Microsoft ADO: 25208748
How I did it
Edited API 2.0 to fix this issue.
How to verify it
Run sonic-mgmt test after the fix and verify it passes.
- Why I did it
Commit sonic-net/sonic-platform-daemons@153ea47 changed SfpStateUpdateTask from a Process to a Thread. That commit raises an exception in SfpStateUpdateTask to speed up the shutdown flow. This does not work on the Nvidia platform, because the Nvidia platform passes the timeout parameter of get_change_event to select, and the Linux select function cannot be interrupted by a Python exception. There was no such issue on the Nvidia platform before that commit. However, to comply with the commit and keep the shutdown flow fast, we decided to change the Nvidia platform API implementation.
Fixes issue #13591.
- How I did it
The select call in get_change_event should use no more than 1 second as its timeout parameter.
Outside the select call, add a while loop to make sure the timeout parameter of get_change_event works as expected.
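The pattern can be sketched as follows. This is an illustration only, not the actual Nvidia get_change_event code: the function name wait_readable, the use of seconds for the timeout, and the should_stop callback (standing in for whatever lets the thread notice a shutdown request between select calls) are all assumptions.

```python
import select
import time

MAX_SELECT_TIMEOUT = 1.0  # seconds; upper bound for each individual select call


def wait_readable(fds, timeout, should_stop=lambda: False):
    """Hypothetical helper: wait until one of `fds` is readable, `timeout`
    seconds elapse, or should_stop() returns True.

    timeout: total wait in seconds; None means wait indefinitely.
    Returns the list of readable fds, or [] on timeout/stop.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while not should_stop():
        if deadline is None:
            step = MAX_SELECT_TIMEOUT
        else:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            step = min(MAX_SELECT_TIMEOUT, remaining)
        # Each select call blocks for at most `step` seconds, so control
        # returns to Python code at least once per second.
        readable, _, _ = select.select(fds, [], [], step)
        if readable:
            return readable
    return []
```

Because control returns to the Python loop at least once per second, a pending exception or stop request is handled within about a second, while the overall timeout is still honored by the outer loop.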
- How to verify it
Manual test
Co-authored-by: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com>
Why I did it
Common Release Notes for 8102-64H, T0/DualTor, and 8101-32FH
Fix for an issue where drop counters were incrementing twice for packets with invalid tag
Fix for the ECC errors reported in SR 695600099
Fix for fwutil show updates failure
How I did it
Update platform version to 202205.2.2.11
To include the following fixes:
DNX:
CS00012287482 - Support for 1024 LAGs on DNX (Added back fix reverted in [202205] Update Broadcom DNX SAI version to 7.1.54.4 #15850)
CS00012302400 - New SAI 7.1.50.4 caused regression in sonic-mgmt ACL test & ACL entry creation failing with SAI_STATUS_INVALID_PORT_NUMBER in SAI 7.1.50.4 (CS00012302347)
CS00012302163 - SAI_API_BRIDGE:_brcm_sai_bridge_port_learn_flag:1620 sai bridge lag port list get. failed with error -7.
CS00012296571 - LACP packets are queued to Queue 0 instead of Queue 7
CS00012301919 - The traffic is queued to VOQ 8 sometimes instead of destination port's VOQ
CS00012297160 - [SONIC] [J2C+] Traffic to unknown destination route getting enqueued on VOQ 10
CS00012298730 - [7.x][J2/J2C+] : Treat Q=0 as lowest priority and Q=7 as highest priority in Strict Priority Scheduling
Also includes -
XGS:
Port SONIC-62323 to SAI 7.1, Use single NH instead of ecmp
[SAI_BRANCH rel_ocp_sai_7_1] ECMP group expansion fail due to no resources
Fix capability for Hostif queue on SAI version 7.1
CS00012302193 - SAI_SWITCH_ATTR_SWITCH_HARDWARE_INFO attribute value changed
- Why I did it
watchdogutil uses the platform API watchdog instance to control/query watchdog status. The Nvidia watchdog implementation caches the "armed" status in an object member, "WatchdogImplBase.armed". This does not work for the CLI infrastructure, because each CLI invocation creates a new watchdog instance and the status cached in the previous instance is lost. Consider the following commands:
admin@sonic:~$ sudo watchdogutil arm -s 100 =====> watchdog instance1, armed=True
Watchdog armed for 100 seconds
admin@sonic:~$ sudo watchdogutil status ======> watchdog instance2, armed=False
Status: Unarmed
admin@sonic:~$ sudo watchdogutil disarm =======> watchdog instance3, armed=False
Failed to disarm Watchdog
- How I did it
Use sysfs to query watchdog status
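As an illustration of querying state from sysfs instead of caching it in the object, here is a minimal sketch. It assumes the generic Linux watchdog sysfs interface, where the "state" attribute reads "active" or "inactive" (available when the kernel is built with CONFIG_WATCHDOG_SYSFS). The function name and default path are hypothetical, and the actual Nvidia implementation may read different files.

```python
import os


def is_watchdog_armed(sysfs_dir="/sys/class/watchdog/watchdog0"):
    """Hypothetical helper: return True if the kernel reports the
    watchdog as active. Reads sysfs on every call, so the result does
    not depend on any state cached in a Python object."""
    try:
        with open(os.path.join(sysfs_dir, "state")) as f:
            return f.read().strip() == "active"
    except OSError:
        # Attribute missing or unreadable: treat as not armed.
        return False
```

Since every CLI invocation reads the same kernel-maintained file, "status" and "disarm" see the watchdog armed by an earlier "arm" command.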
- How to verify it
Manual test
Unit test
Conflicts:
platform/mellanox/mlnx-platform-api/sonic_platform/watchdog.py
platform/mellanox/mlnx-platform-api/tests/test_watchdog.py
Release Notes for Cisco T0 and 8102-64H.
• Fix for PSUD crash when PSUs are inserted in an operational system
• Fix for VxLAN counters not incrementing in 'show vxlan counter' and 'show platform npu vxlan counters'
• Fix for continuous error messages reported by thermalctld
• Fix for dshell client enable/disable causing syncd crash
• Support for 9100 TPID for Cisco fanout.
• Caveat: Drop counters for packets with invalid VLAN tag are counted twice.
Release Notes for Cisco 8101-32FH:
• Aikido FPD 1.89 Upgrade
Update SAI xgs version to 7.1.54.4-3 to include the following XGS changes:
7.1.54.3-1: Port SONIC-62323 to SAI 7.1, Use single NH instead of ecmp
7.1.54.3-2: [SAI_BRANCH rel_ocp_sai_7_1] ECMP group expansion fail due to no resources
7.1.54.3-3: Fix capability for Hostif queue on SAI version 7.1
Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
Why I did it
Update the iSMART_64 tool to support the latest Debian releases.
How I did it
modified: platform/broadcom/sonic-platform-modules-dell/s6100/scripts/iSMART_64
How to verify it
In s6100, run the iSMART_64 tool.
md5sum - 24725730d7649769c7ba50971c1f2955
Co-authored-by: Santhosh Kumar T <53558409+santhosh-kt@users.noreply.github.com>
Why I did it
[E1031] Fix occasional pca9548 initialization failures in stress tests.
When the failure happens, the iSMT I2C bus hangs and a power cycle is needed to recover it.
How I did it
Add a 0.5s delay between setting up and configuring the pca9548 I2C mux.
How to verify it
Run the reboot stress test at least 100 times without failure.
Co-authored-by: Ikki Zhu <79439153+qnos@users.noreply.github.com>
To pick up the fixes below:
DNX fixes:
Temporarily revert fix for CS00012287482 - support for 1024 LAGs on DNX
CS00012297599 - [J2C+] sonic-mgmt failure in test_copp.py (test_no_policer[BGP])
CS00012293560 - ECN remark issue in SONiC
CS00012302371 - SONiC: V6 packets were mapped to wrong TC queue
CS00012288540 - Available ACL Entry and Counter is incorrect after removing ACL rules
Other changes (XGS fixes)
SID - L3 multicast packet drop due to wrong VFI derivation - SDK-350470
SID - SIGSEGV in linkscan callback delivery - SDK-287578
SID - Repeated VXLAN calls delete VLAN translation action profile - SDK-313980
SER - error in IS_TDM_CALENDAR0/1 can cause traffic hit in TH
SID - L2_ENTRY Table Lookups May Miss
[CSP CS00012275452] sai_object_type_get_availability failed with SAI_STATUS_INVALID_PARAMETER
[CSP CS00012253527] sai_query_attribute_capability for obj type SAI_OBJECT_TYPE_SWITCH