Update the Brcm SAI 7.0 with the following fixes:
Official Brcm SDK fix for memory leak
(CS00012315073 [7.0][J2C+] : PFCWD counter polling causing continuous mem leak on production device)
Official Brcm fix for high CPU usage
(CS00012317195 High CPU due to SDK calling soc_dnxc_port_resource_get for few stats counters even with bcmCNTR thread)
Official Brcm SAI fix for getting VOQ counters working.
CSP CS00012319503: DNX SAI 7.1.60.4 has broken Voq counters support
How to verify it
Validated by running the nightly pipeline on a chassis platform.
Validated the VOQ counters by sending traffic from T1 VM --> T3 VM
                              Port    Voq    Counter/pkts    Counter/bytes    Drop/pkts    Drop/bytes
----------------------------------  -----  --------------  ---------------  -----------  ------------
svcstr-xxxx-lc1-1|asic0|Ethernet48   VOQ0               0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet48   VOQ1              27             1968            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet48   VOQ2               0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet48   VOQ3               0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet48   VOQ4               0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet48   VOQ5               0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet48   VOQ6               0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet48   VOQ7               0                0            0             0

                              Port    Voq    Counter/pkts    Counter/bytes    Drop/pkts    Drop/bytes
----------------------------------  -----  --------------  ---------------  -----------  ------------
svcstr-xxxx-lc1-1|asic0|Ethernet56   VOQ0               0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet56   VOQ1            7099           625680            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet56   VOQ2               0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet56   VOQ3               0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet56   VOQ4               0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet56   VOQ5               0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet56   VOQ6               0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet56   VOQ7               0                0            0             0
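A check like the one above could be automated by parsing the counter table and listing the VOQs that carried traffic. The following is only a sketch: the helper names `parse_voq_counters` and `active_voqs` are hypothetical, and it assumes the six-column layout shown in the tables above.

```python
def parse_voq_counters(text):
    """Return {(port, voq): (pkts, bytes, drop_pkts, drop_bytes)} from table text."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly six fields and a VOQ name in the second column;
        # header and separator rows are skipped by this test.
        if len(fields) != 6 or not fields[1].startswith("VOQ"):
            continue
        port, voq = fields[0], fields[1]
        counters[(port, voq)] = tuple(int(value) for value in fields[2:])
    return counters


def active_voqs(counters):
    """List the (port, voq) pairs whose packet counter is non-zero."""
    return sorted(key for key, values in counters.items() if values[0] > 0)


# A few rows copied from the output above.
SAMPLE = """\
Port Voq Counter/pkts Counter/bytes Drop/pkts Drop/bytes
svcstr-xxxx-lc1-1|asic0|Ethernet48 VOQ0 0 0 0 0
svcstr-xxxx-lc1-1|asic0|Ethernet48 VOQ1 27 1968 0 0
svcstr-xxxx-lc1-1|asic0|Ethernet56 VOQ1 7099 625680 0 0
"""
```

With the sample rows, only VOQ1 on Ethernet48 and Ethernet56 is reported as active, which matches the traffic sent in the test.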
---------------
The CPU usage has come down in SUP
System 'xxxx-sup-1'
status Running
monitoring status Monitored
monitoring mode active
on reboot start
load average [7.94] [8.70] [7.54]
cpu 2.6%us 45.0%sy 0.0%wa <<<<-- it is 45%
memory usage 8.9 GB [28.6%]
swap usage 0 B [0.0%]
uptime 21m
boot time Fri, 17 Nov 2023 21:55:55
data collected Fri, 17 Nov 2023 22:16:59
-------------
syncd memory usage is no longer increasing.
* [202205][Arista] Update arista platform submodules
- fix issue where platform debug info would no longer be in the dump
- fix issue in scd-xcvr where active low bits couldn't be set
- fix issue in scd-smbus where it performed an out-of-bounds access
Upgrade the xgs SAI version to 7.1.62.4 to include the following changes:
7.1.62.4: ECMP CRM fix - CS00012312907
7.1.61.4: Includes nexthop group scaling fix - CS00012304075
7.1.60.4: CS00012302193 - SAI_SWITCH_ATTR_SWITCH_HARDWARE_INFO attribute value changed
7.1.59.4: [CS00012302400 CS00012302347]backport SONIC-76986 to SAI7.1: Fix the issue--"empty LAG can't be added to ACL entry"
7.1.57.4: [CSP CS00012296571] Backport SONIC-75371 jira on SAI 7.1 branch
7.1.56.4: [CSP CS00012302193] backport SONIC-72912 jira on SAI 7.1 branch
Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
This fixes Nokia-ION/ndk#22
Note that this PR must be coupled with NDK version >= 22.9.13
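A minimum-version requirement like this is easy to get wrong with plain string comparison, because "22.9.8" sorts after "22.9.13" as text. A minimal sketch of a numeric check follows; the function names are hypothetical and it assumes the NDK version is a dotted string of integers.

```python
def parse_version(version):
    """Convert a dotted version string such as '22.9.13' to a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def ndk_is_compatible(installed, minimum="22.9.13"):
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)
```

Comparing tuples of integers orders 22.9.8 before 22.9.13, which is the intended meaning of ">=" here.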
Why I did it
To provide proper support for CMIS compliant transceiver module CDB operations (including FW related operations).
How I did it
Enhanced the transport subsystem to allow up to 2k bytes of data to be passed to/from modules (the previous maximum was 128 bytes).
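To illustrate why the larger payload matters for firmware operations, the sketch below splits a buffer into transfers of a given maximum size. It is not the NDK transport code: `split_payload` is a hypothetical helper, and "2k" is assumed to mean 2048 bytes.

```python
def split_payload(data, max_chunk):
    """Split a bytes object into consecutive chunks of at most max_chunk bytes."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    return [data[i:i + max_chunk] for i in range(0, len(data), max_chunk)]


# Hypothetical 5000-byte block of firmware data.
image = bytes(5000)
old_transfers = len(split_payload(image, 128))    # prior 128-byte limit
new_transfers = len(split_payload(image, 2048))   # assumed 2048-byte limit
```

For this 5000-byte example the number of transfers drops from 40 to 3.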
How to verify it
Ensure that new FW (firmware) can be programmed to CMIS compliant module(s) using the 'sfputil firmware ...' commands.
Why I did it
A sonic-mgmt test failure is seen for the update_firmware component API.
Microsoft ADO: 25208748
How I did it
Edited API 2.0 to fix this issue.
How to verify it
Run sonic-mgmt test after the fix and verify it passes.
To include the following fixes:
DNX:
CS00012287482 - Support for 1024 LAGs on DNX (Added back fix reverted in [202205] Update Broadcom DNX SAI version to 7.1.54.4 #15850)
CS00012302400 - New SAI 7.1.50.4 caused regression in sonic-mgmt ACL test &
ACL entry creation failing with SAI_STATUS_INVALID_PORT_NUMBER in SAI 7.1.50.4
(CS00012302347)
CS00012302163 - SAI_API_BRIDGE:_brcm_sai_bridge_port_learn_flag:1620 sai bridge lag port list get. failed with error -7.
CS00012296571 - LACP packets are queued to Queue 0 instead of Queue 7
CS00012301919 - The traffic is queued to VOQ 8 sometimes instead of destination port's VOQ
CS00012297160 - [SONIC] [J2C+] Traffic to unknown destination route getting enqueued on VOQ 10
CS00012298730 - [7.x][J2/J2C+] : Treat Q=0 as lowest priority and Q=7 as highest priority in Strict Priority Scheduling
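The priority ordering named in CS00012298730 (Q=0 lowest, Q=7 highest) can be illustrated with a small strict-priority selection sketch. This is an illustration only, not the SAI implementation; `next_queue_strict_priority` is a hypothetical name.

```python
def next_queue_strict_priority(queue_depths):
    """Return the index of the queue to serve next, or None if all are empty.

    queue_depths[i] is the number of packets waiting in queue i.
    The highest-numbered non-empty queue is always served first.
    """
    for queue in range(len(queue_depths) - 1, -1, -1):
        if queue_depths[queue] > 0:
            return queue
    return None
```

Under this ordering, traffic in queue 0 is served only when queues 1 through 7 are empty.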
Also includes -
XGS:
Port SONIC-62323 to SAI 7.1, Use single NH instead of ecmp
[SAI_BRANCH rel_ocp_sai_7_1] ECMP group expansion fail due to no resources
Fix capability for Hostif queue on SAI version 7.1
CS00012302193 - SAI_SWITCH_ATTR_SWITCH_HARDWARE_INFO attribute value changed
Update SAI xgs version to 7.1.54.4-3 to include the following XGS changes:
7.1.54.3-1: Port SONIC-62323 to SAI 7.1, Use single NH instead of ecmp
7.1.54.3-2: [SAI_BRANCH rel_ocp_sai_7_1] ECMP group expansion fail due to no resources
7.1.54.3-3: Fix capability for Hostif queue on SAI version 7.1
Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
Why I did it
Update the iSMART_64 tool to support the latest Debian releases.
How I did it
Updated platform/broadcom/sonic-platform-modules-dell/s6100/scripts/iSMART_64 (branch new_ismart).
How to verify it
In s6100, run the iSMART_64 tool.
md5sum - 24725730d7649769c7ba50971c1f2955
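The checksum above can also be verified programmatically. A minimal sketch using Python's standard `hashlib` module follows; the function names are hypothetical, and the expected digest is the one quoted in this description.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected="24725730d7649769c7ba50971c1f2955"):
    """Compare a file's MD5 digest against the expected value."""
    return md5_of_file(path) == expected.lower()
```

On the device, `matches_expected` would be called with the installed path of the iSMART_64 tool.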
Co-authored-by: Santhosh Kumar T <53558409+santhosh-kt@users.noreply.github.com>
Why I did it
[E1031] Fix occasional pca9548 initialization failures in stress tests.
When the failure happens, the iSMT I2C bus hangs and a power cycle is
needed to recover it.
How I did it
Add a 0.5s delay between setting up and configuring the pca9548 I2C mux.
How to verify it
Reboot stress test at least 100 times without failure.
Co-authored-by: Ikki Zhu <79439153+qnos@users.noreply.github.com>
To pick the below fixes:
DNX fixes:
Temporarily revert fix for CS00012287482 - support for 1024 LAGs on DNX
CS00012297599 - [J2C+] sonic-mgmt failure in test_copp.py (test_no_policer[BGP])
CS00012293560 - ECN remark issue in SONiC
CS00012302371 - SONiC: V6 packets were mapped to wrong TC queue
CS00012288540 - Available ACL Entry and Counter is incorrect after removing ACL rules
Other changes (XGS fixes)
SID - L3 multicast packet drop due to wrong VFI derivation - SDK-350470
SID - SIGSEGV in linkscan callback delivery - SDK-287578
SID - Repeated VXLAN calls deletes vlan translation action profile SDK-313980
SER - error in IS_TDM_CALENDAR0/1 can cause traffic hit in TH
SID - L2_ENTRY Table Lookups May Miss
[CSP CS00012275452] sai_object_type_get_availability failed with SAI_STATUS_INVALID_PARAMETER
[CSP CS00012253527] sai_query_attribute_capability for obj type SAI_OBJECT_TYPE_SWITCH
Update SAI xgs version to 7.1.50.4 to include the following changes:
patch fix from CSP CS00012282080 needed to support speed change from 400g to 100g on chassis linecards.
Backport SONIC-71507 VSQF/VSQE are not created after port creation. JIRA# SONIC-71507
Backport JIRA SONIC-70704 to rel_ocp_sai_7_1. JIRA# SONIC-70704
SID - L3 multicast packet drop due to wrong VFI derivation - SDK-350470
SID - SIGSEGV in linkscan callback delivery - SDK-287578
SID - Repeated VXLAN calls deletes vlan translation action profile SDK-313980
SER - error in IS_TDM_CALENDAR0/1 can cause traffic hit in TH
SID - L2_ENTRY Table Lookups May Miss
[CSP CS00012275452] sai_object_type_get_availability failed with SAI_STATUS_INVALID_PARAMETER
Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
DNX fixes:
CS00012287482 - support for 1024 LAGs on DNX
Other changes (XGS fixes)
SID - L3 multicast packet drop due to wrong VFI derivation - SDK-350470
SID - SIGSEGV in linkscan callback delivery - SDK-287578
SID - Repeated VXLAN calls deletes vlan translation action profile SDK-313980
SER - error in IS_TDM_CALENDAR0/1 can cause traffic hit in TH
SID - L2_ENTRY Table Lookups May Miss
[CSP CS00012275452] sai_object_type_get_availability failed with SAI_STATUS_INVALID_PARAMETER
Why I did it
In a rare condition, the EMC2305 holds the SMBus and causes the SMBus completion wait to time out.
How I did it
Enable the EMC2305 SMBus timeout feature: a 30 ms period of inactivity resets the interface.
How to verify it
Use 'i2cget -y -f 23 0x4d 0x20 b' to read the EMC2305 configuration register and check that the DIS_TO bit is not set.
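The bit check can be scripted once the register value has been read. The sketch below decodes the hex string printed by i2cget. It assumes DIS_TO is bit 6 (mask 0x40) of configuration register 0x20; that bit position is an assumption here and should be confirmed against the EMC2305 datasheet. The function names are hypothetical.

```python
# Assumption: DIS_TO is bit 6 of the configuration register (0x20).
DIS_TO_MASK = 0x40


def parse_i2cget(output):
    """Convert i2cget output such as '0x40' to an integer."""
    return int(output.strip(), 16)


def smbus_timeout_enabled(config_value):
    """The timeout feature is active when the DIS_TO bit is clear."""
    return (config_value & DIS_TO_MASK) == 0
```

For example, a register value of 0x00 would indicate the timeout is enabled, while 0x40 would indicate it is still disabled.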
Signed-off-by: Eric Zhu <erzhu@celestica.com>
Why I did it
ptf_nn_agent failed to start in dnx rpc syncd because module afpacket was not installed.
Please see issue sonic-net/sonic-mgmt#7822
How I did it
Download the ptf afpacket module in the Dockerfile.
How to verify it
Verified that ptf_nn_agent was started successfully in dnx rpc syncd with the change.
[S6100] Improve the S6100 serial-getty monitor: wait and re-check when getty is not running to avoid false alerts.
#### Why I did it
On S6100, systemd sometimes fails to auto-restart the serial-getty service, so there is a monit unit that checks the serial-getty service status and restarts it.
However, this monit unit reports false alerts, because in most cases when serial-getty is not running, systemd restarts it successfully.
To avoid the false alerts, improve the monitor to wait and re-check.
Steps to reproduce this issue:
1. Log in to the device via the console and keep the connection open.
2. Log in to the device via SSH and check that serial-getty@ttyS1.service is running.
3. Run 'monit reload' from the SSH connection.
4. Check syslog 1 minute later; there will be a false alert: 'serial-getty' process is not running
#### How I did it
Add a check-getty.sh script that re-checks later when the getty service is not running.
Update the monit unit to check the serial-getty service status with this script to avoid false alerts.
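The wait-and-re-check idea can be sketched as follows. This is not the check-getty.sh script itself: the function name, the 30-second default wait, and the injected `is_running` callable are all assumptions made for illustration.

```python
import time


def check_with_recheck(is_running, wait_seconds=30, sleep=time.sleep):
    """Return 0 if the service is running now or after one delayed re-check.

    Returns 1 only when the service is still down after waiting, which gives
    systemd time to restart it before an alert is raised.
    """
    if is_running():
        return 0
    sleep(wait_seconds)
    return 0 if is_running() else 1
```

A monitor that acts only on a non-zero result from such a check stays quiet when systemd recovers the service on its own within the wait period.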
#### How to verify it
Pass all UT.
Manually checked that the fixed code works correctly:
```
admin@***:~$ sudo systemctl stop serial-getty@ttyS1.service
admin@***:~$ sudo /usr/local/bin/check-getty.sh
admin@***:~$ echo $?
1
admin@***:~$ sudo systemctl status serial-getty@ttyS1.service
● serial-getty@ttyS1.service - Serial Getty on ttyS1
Loaded: loaded (/lib/systemd/system/serial-getty@.service; enabled-runtime; vendor preset: enabled)
Active: inactive (dead) since Tue 2023-03-28 07:15:21 UTC; 1min 13s ago
admin@***:~$ sudo /usr/local/bin/check-getty.sh
admin@***:~$ echo $?
0
admin@***:~$ sudo systemctl status serial-getty@ttyS1.service
● serial-getty@ttyS1.service - Serial Getty on ttyS1
Loaded: loaded (/lib/systemd/system/serial-getty@.service; enabled-runtime; vendor preset: enabled)
```
syslog:
```
Mar 28 07:10:37.597458 *** INFO systemd[1]: serial-getty@ttyS1.service: Succeeded.
Mar 28 07:12:43.010550 *** ERR monit[593]: 'serial-getty' status failed (1) -- no output
Mar 28 07:12:43.010744 *** INFO monit[593]: 'serial-getty' trying to restart
Mar 28 07:12:43.010846 *** INFO monit[593]: 'serial-getty' stop: '/bin/systemctl stop serial-getty@ttyS1.service'
Mar 28 07:12:43.132172 *** INFO monit[593]: 'serial-getty' start: '/bin/systemctl start serial-getty@ttyS1.service'
Mar 28 07:13:43.286276 *** INFO monit[593]: 'serial-getty' status succeeded (0) -- no output
```
#### Description for the changelog
[S6100] Improve S6100 serial-getty monitor.
Why I did it
Update the sonic-platform submodule for the Nokia-7250IXRE platform. This requires NDK 22.9.8 or later.
How I did it
Update submodule sonic-platform for Nokia-7250IXRE platform.
c9f316e Disparate process and thread-safe protection for MDIPC transport, and refactored presence logic to better align with SfpStateUpdateTask operation
a3486cc Added _get_module_bulk_info() and cache the info for 5 seconds to optimize the chassisd update.
4b2e729 Fixed the nokia_cmd show qfpga help display
7b87049 Fixed the nokia_cmd show midplane helper display.
83eabea Add "nokia_cmd set ndk-monitor-action" and "nokia_cmd set ndk-log-level" commands
8aad7de Add nokia_cmd show ndk-version
d2c55e3 Modify the psu.py and module.py to optimize the psud running time
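Commit a3486cc above describes caching bulk module info for 5 seconds. The pattern can be sketched as a small time-based cache. This is not the actual `_get_module_bulk_info()` code; `TimedCache` and its parameters are hypothetical names used for illustration.

```python
import time


class TimedCache:
    """Cache the result of an expensive loader for a fixed number of seconds."""

    def __init__(self, loader, ttl_seconds=5.0, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires_at = None

    def get(self):
        """Return the cached value, reloading it once the TTL has elapsed."""
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value
```

Callers that poll more often than every 5 seconds then share one underlying query instead of issuing one each.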
Signed-off-by: mlok <marty.lok@nokia.com>
Fix watchdog reboot cause for wolverine linecard
Fix PSU fan speed of 0% by adding max RPM to most psu descriptions
Add product DCS-7060DX5-64
Add product DCS-7060DX5-32
To include the following DNX changes:
Revert patch and add official SDK/SAI fix for the below CSPs
a. CS00012282080 : syncd crashes after a speed change due to "cosq src vsqs gport get" failure
b. CS00012281200 : J2C+ : Scope of config.bcm SOC property bcm_stat_interval
Fixes for:
a. CS00012278343: SONiC J2c+ Macsec: Shutting down LAG members which have macsec cause
remaining active LAG members to go down
b. CS00012279717: Instance_id printed in SAI syslog messages are truncated to 9 bytes
Why I did it
Update SAI xgs version to 7.1.36.4 to include the following changes.
JIRA# SONIC-69731 (7.1.33.4)
Issue Summary: SAI_SWITCH_ATTR_SWITCH_HARDWARE_INFO brcm_sai_get_switch_attribute returns null.
Root Cause: Not implemented.
Fix Description: Get support for SAI switch attr SAI_SWITCH_ATTR_SWITCH_HARDWARE_INFO added
JIRA# SONIC-70737 (7.1.34.4)
Issue Summary: ECN being marked as CE even without congestion
Root Cause: ecn_thresh was set to very low value and packets were 100% marked.
Fix Description: ecn_thresh set to correct value
backport SONIC-70081 to SAI7.1 (7.1.35.4)
egress lossy queue PFC Rx fix:ignore PFC signals from egress
Update git submodules (7.1.36.4)
Update sdk-src/hsdk_6.5.24_SAI_7.1.0_GA from branch 'hsdk_6.5.24_SAI_7.1.0_GA'
to 57d0e360269c4ab659c4790ae471aa4dba2532b4
[SAI_BRANCH rel_ocp_sai_7_1] Broadcom image build failed with SAI 7.1 in DMZ repo (on bullseye)
How I did it
Update SAI xgs code.
How to verify it
Run the SONiC and SAI test with the 7.1 SAI pipeline.
Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
* Remove apt package lists and make macro to clean up apt and python cache
Remove the apt package lists (`/var/lib/apt/lists`) from the docker
containers. This saves about 100MB.
Also, make a macro to clean up the apt and python cache that can then be
used in all of the containers. This helps make the cleanup be consistent
across all containers.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Why I did it
Update SAI xgs version to 7.1.36.4 to include the following changes and migrate xgs to DMZ repo.
JIRA# SONIC-69731 (7.1.33.4)
Issue Summary: SAI_SWITCH_ATTR_SWITCH_HARDWARE_INFO brcm_sai_get_switch_attribute returns null.
Root Cause: Not implemented.
Fix Description: Get support for SAI switch attr SAI_SWITCH_ATTR_SWITCH_HARDWARE_INFO added
JIRA# SONIC-70737 (7.1.34.4)
Issue Summary: ECN being marked as CE even without congestion
Root Cause: ecn_thresh was set to very low value and packets were 100% marked.
Fix Description: ecn_thresh set to correct value
backport SONIC-70081 to SAI7.1 (7.1.35.4)
egress lossy queue PFC Rx fix:ignore PFC signals from egress
Update git submodules (7.1.36.4)
Update sdk-src/hsdk_6.5.24_SAI_7.1.0_GA from branch 'hsdk_6.5.24_SAI_7.1.0_GA'
to 57d0e360269c4ab659c4790ae471aa4dba2532b4
[SAI_BRANCH rel_ocp_sai_7_1] Broadcom image build failed with SAI 7.1 in DMZ repo (on bullseye)
How I did it
Update SAI xgs code.
How to verify it
Run the SONiC and SAI test with the 7.1 SAI pipeline.
Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
Why I did it
To include DNX fix
Temp workaround for CS00012281200: J2C+ : Scope of config.bcm SOC property bcm_stat_interval
How I did it
Updated SAI version
How to verify it
Basic validation on DNX platform
Why I did it
The platform test cases test_tx_disable, test_tx_disable_channel, and test_power_override failed on dx010.
How I did it
Add i2c access algorithm for CPLD i2c adapters.
How to verify it
Verify it with platform_tests/api/test_sfp.py::TestSfpApi test cases.
add SEU reporting on chassis
fix fallback logic for Clearlake eeprom identification
fix fan speed reporting for a specific model
move pcie timeout configuration for Upperlake in platform code (deprecates hwsku-init)
Why I did it
To pick up the below DNX fixes:
CS00012275689: DSCP->TC and TC->QUEUE mappings are not happening for packets received on LAG ports (SONIC-69367)
CS00012277618: Crash in _brcm_sai_dnx_irpp_port_core_get (SONIC-70001)
How I did it
Updated SAI branch with the above fixes
How to verify it
Ran basic sonic-mgmt tests with the SAI debian on XGS and DNX platforms
Why I did it
[Seastone] Enhancement fix for PR12200 syseeprom issue.
How I did it
Enhance the fix by replacing the hardcoded devnum with a bash variable.
How to verify it
Run 'show platform syseeprom' or 'decode-syseeprom'.
Co-authored-by: Ikki Zhu <79439153+qnos@users.noreply.github.com>
add module reboot APIs for chassis
add supervisor module on linecard (fixes show chassis module midplane-status)
improve RTC update mechanism and sync every 10 mins
fix sbtsi temp sensor presence/thresholds
fix Mineral status leds
remove thermal object on xcvrs
misc fixes
Why I did it
To bring in the following fixes:
Revert temporary fix added to disable SA equal DA drops
CS00012273013 - [7.1][J2, J2c+] Disable SA Equals DA trap on DNX
CS00012274222 - How to block the voq for given destination port for a flow from a remote mod-id
CS00012275381 - SAI_INGRESS_PRIORITY_GROUP_STAT_PACKETS is incremented for port's PG's even if there are no traffic sent to that PG
CS00012274433 - Local Fault and Remote Fault are not polled by linkscan thread
How I did it
Merged above fixes to SAI code
How to verify it
Validated by running the basic sanity tests on XGS and DNX chassis platforms including
fib/test_fib.py
decap/test_decap.py
drop_counters/test_drop_counters.py
arp/test_arpall.py
Why I did it
To apply different configs across different platforms and keep the code in a unified format, reuse the syncd init script to initialize saiserver.
How I did it
Reuse the syncd init script.
How to verify it
Tested on s6000 and dx010 DUTs with SONiC 202205.
Why I did it
Advance SAI Redis head pointer
How I did it
changes:
sonic-net/sonic-sairedis@cf679e7
sonic-net/sonic-sairedis@8d6688e
[202205][Submodule][SAI]Advance SAI head pointer sonic-sairedis#1185 sonic-net/sonic-sairedis@66f2961
Remove the --skip_error parameter, which was removed in [202205][Submodule][SAI]Advance SAI head pointer sonic-sairedis#1185
How to verify it
local image build