Why I did it
There is rare condition, emc2305 hold SMBus and cause SMBus completion wait timed out.
How I did it
Enable EMC2305 SMBus timeout feature, 30ms period of inactivity will reset the interface.
How to verify it
Use 'i2cget -y -f 23 0x4d 0x20 b' to read EMC2305 configuration register and check DIS_TO bit not set.
Signed-off-by: Eric Zhu <erzhu@celestica.com>
Why I did it
ptf_nn_agent failed to start in dnx rpc syncd because module afpacket was not installed.
Please see issue sonic-net/sonic-mgmt#7822
How I did it
Add downloading ptf afpacket module in docker file.
How to verify it
Verified that ptf_nn_agent was started successfully in dnx rpc syncd with the change.
[S6100] Improve S6100 serial-getty monitor, wait and re-check when getty not running to avoid false alert.
#### Why I did it
On S6100, the serial-getty service some time can't auto-restart by systemd. So there is a monit unit to check serial-getty service status and restart it.
However, this monit will report false alert, because in most case when serial-getty not running, systemd can restart it successfully.
To avoid the false alert, improve the monitor to wait and re-check.
Steps to reproduce this issue:
1. User login to device via console, and keep the connection.
2. User login to device via SSH, check the serial-getty@ttyS1.service service, it's running.
3. Run 'monit reload' from SSH connection.
4. Check syslog 1 minutes later, there will be false alert: ' 'serial-getty' process is not running'
#### How I did it
Add check-getty.sh script to recheck again later when getty service not running.
And update monit unit to check serial-getty service status with this script to avoid false alert.
#### How to verify it
Pass all UT.
Manually check fixed code work correctly:
```
admin@***:~$ sudo systemctl stop serial-getty@ttyS1.service
admin@***:~$ sudo /usr/local/bin/check-getty.sh
admin@***:~$ echo $?
1
admin@***:~$ sudo systemctl status serial-getty@ttyS1.service
● serial-getty@ttyS1.service - Serial Getty on ttyS1
Loaded: loaded (/lib/systemd/system/serial-getty@.service; enabled-runtime; vendor preset: enabled)
Active: inactive (dead) since Tue 2023-03-28 07:15:21 UTC; 1min 13s ago
admin@***:~$ sudo /usr/local/bin/check-getty.sh
admin@***:~$ echo $?
0
admin@***:~$ sudo systemctl status serial-getty@ttyS1.service
● serial-getty@ttyS1.service - Serial Getty on ttyS1
Loaded: loaded (/lib/systemd/system/serial-getty@.service; enabled-runtime; vendor preset: enabled)
```
syslog:
```
Mar 28 07:10:37.597458 *** INFO systemd[1]: serial-getty@ttyS1.service: Succeeded.
Mar 28 07:12:43.010550 *** ERR monit[593]: 'serial-getty' status failed (1) -- no output
Mar 28 07:12:43.010744 *** INFO monit[593]: 'serial-getty' trying to restart
Mar 28 07:12:43.010846 *** INFO monit[593]: 'serial-getty' stop: '/bin/systemctl stop serial-getty@ttyS1.service'
Mar 28 07:12:43.132172 *** INFO monit[593]: 'serial-getty' start: '/bin/systemctl start serial-getty@ttyS1.service'
Mar 28 07:13:43.286276 *** INFO monit[593]: 'serial-getty' status succeeded (0) -- no output
```
#### Description for the changelog
[S6100] Improve S6100 serial-getty monitor.
#### Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.
Why I did it
Update sonic-platform submodule for Nokia-7250IXRE Platform. This requires the new NDK 22.9.8 and above
How I did it
Update submodule sonic-platform for Nokia-7250IXRE platform.
c9f316e Disparate process and thread-safe protection for MDIPC transport, and refactored presence logic to better align with SfpStateUpdateTask operation
a3486cc Added _get_module_bulk_info() and cache the info for 5 seconds to optimize the chassisd update.
4b2e729 Fixed the nokia_cmd show qfpga help display
7b87049 Fixed the nokia_cmd show midplane helper dispaly.
83eabea Add "nokia_cmd set ndk-monitor-action" and "nokia_cmd set ndk-log-level" commands
8aad7de Add nokia_cmd show ndk-version
d2c55e3 Modify the psu.py and module.py to optimize the psud running time
Signed-off-by: mlok <marty.lok@nokia.com>
Fix watchdog reboot cause for wolverine linecard
Fix PSU fan speed of 0% by adding max RPM to most psu descriptions
Add product DCS-7060DX5-64
Add product DCS-7060DX5-32
To include the following DNX changes:
Revert patch and add official SDK/SAI fix for the below CSPs
a. CS00012282080 : syncd crashes after a speed change due to "cosq src vsqs gport get" failure
b. CS00012281200 : J2C+ : Scope of config.bcm SOC property bcm_stat_interval
Fixes for:
a. CS00012278343: SONiC J2c+ Macsec: Shutting down LAG members which have macsec cause
remaining active LAG members to go down
b. CS00012279717: Instance_id printed in SAI syslog messages are truncated to 9 bytes
Why I did it
Update SAI xgs version to 7.1.36.4 to include the following changes.
JIRA# SONIC-69731 (7.1.33.4)
Issue Summary: SAI_SWITCH_ATTR_SWITCH_HARDWARE_INFO brcm_sai_get_switch_attribute returns null.
Root Cause: Not implemented.
Fix Description: Get support for SAI switch attr SAI_SWITCH_ATTR_SWITCH_HARDWARE_INFO added
JIRA# SONIC-70737 (7.1.34.4)
Issue Summary: ECN being marked as CE even without congestion
Root Cause: ecn_thresh was set to very low value and packets were 100% marked.
Fix Description: ecn_thresh set to correct value
backport SONIC-70081 to SAI7.1 (7.1.35.4)
egress lossy queue PFC Rx fix:ignore PFC signals from egress
Update git submodules (7.1.36.4)
Update sdk-src/hsdk_6.5.24_SAI_7.1.0_GA from branch 'hsdk_6.5.24_SAI_7.1.0_GA'
to 57d0e360269c4ab659c4790ae471aa4dba2532b4
[SAI_BRANCH rel_ocp_sai_7_1] Broadcom image build failed with SAI 7.1 in DMZ repo (on bullseye)
How I did it
Update SAI xgs code.
How to verify it
Run the SONiC and SAI test with the 7.1 SAI pipeline.
Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
* Remove apt package lists and make macro to clean up apt and python cache
Remove the apt package lists (`/var/lib/apt/lists`) from the docker
containers. This saves about 100MB.
Also, make a macro to clean up the apt and python cache that can then be
used in all of the containers. This helps make the cleanup be consistent
across all containers.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Why I did it
Update SAI xgs version to 7.1.36.4 to include the following changes and migrate xgs to DMZ repo.
JIRA# SONIC-69731 (7.1.33.4)
Issue Summary: SAI_SWITCH_ATTR_SWITCH_HARDWARE_INFO brcm_sai_get_switch_attribute returns null.
Root Cause: Not implemented.
Fix Description: Get support for SAI switch attr SAI_SWITCH_ATTR_SWITCH_HARDWARE_INFO added
JIRA# SONIC-70737 (7.1.34.4)
Issue Summary: ECN being marked as CE even without congestion
Root Cause: ecn_thresh was set to very low value and packets were 100% marked.
Fix Description: ecn_thresh set to correct value
backport SONIC-70081 to SAI7.1 (7.1.35.4)
egress lossy queue PFC Rx fix:ignore PFC signals from egress
Update git submodules (7.1.36.4)
Update sdk-src/hsdk_6.5.24_SAI_7.1.0_GA from branch 'hsdk_6.5.24_SAI_7.1.0_GA'
to 57d0e360269c4ab659c4790ae471aa4dba2532b4
[SAI_BRANCH rel_ocp_sai_7_1] Broadcom image build failed with SAI 7.1 in DMZ repo (on bullseye)
How I did it
Update SAI xgs code.
How to verify it
Run the SONiC and SAI test with the 7.1 SAI pipeline.
Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
Why I did it
To include DNX fix
Temp workaround for CS00012281200: J2C+ : Scope of config.bcm SOC property bcm_stat_interval
How I did it
Updated SAI version
How to verify it
Basic validation on DNX platform
Why I did it
Platform cases test_tx_disable, test_tx_disable_channel, test_power_override failed in dx010.
How I did it
Add i2c access algorithm for CPLD i2c adapters.
How to verify it
Verify it with platform_tests/api/test_sfp.py::TestSfpApi test cases.
add SEU reporting on chassis
fix fallback logic for Clearlake eeprom identification
fix fan speed reporting for a specific model
move pcie timeout configuration for Upperlake in platform code (deprecates hwsku-init)
Why I did it
To pick up the below DNX fixes:
CS00012275689: DSCP->TC and TC->QUEUE mappings are not happening for packets received on LAG ports (SONIC-69367)
CS00012277618: Crash in _brcm_sai_dnx_irpp_port_core_get (SONIC-70001)
How I did it
Updated SAI branch with the above fixes
How to verify it
Ran basic sonic-mgmt tests with the SAI debian on XGS and DNX platforms
Why I did it
[Seastone] Enhancement fix for PR12200 syseeprom issue.
How I did it
Enhance the fix through replace the hardcoded devnum to bash variable
How to verify it
show platform syseeprom or decode-syseeprom
Co-authored-by: Ikki Zhu <79439153+qnos@users.noreply.github.com>
add module reboot APIs for chassis
add supervisor module on linecard (fixes show chassis module midplane-status)
improve RTC update mechanism and sync every 10 mins
fix sbtsi temp sensor presence/thresholds
fix Mineral status leds
remove thermal object on xcvrs
misc fixes
Why I did it
To bring in the following fixes:
Revert temporary fix added to disable SA equal DA drops
CS00012273013 - [7.1][J2, J2c+] Disable SA Equals DA trap on DNX
CS00012274222 - How to block the voq for given destination port for a flow from a remote mod-id
CS00012275381 - SAI_INGRESS_PRIORITY_GROUP_STAT_PACKETS is incremented for port's PG's even if there are no traffic sent to that PG
CS00012274433 - Local Fault and Remote Fault are not polled by linkscan thread
How I did it
Merged above fixes to SAI code
How to verify it
Validated by running the basic sanity tests on XGS and DNX chassis platforms including
fib/test_fib.py
decap/test_decap.py
drop_counters/test_drop_counters.py
arp/test_arpall.py
Why I did it
why
In order to apply different config across different platform, and use the code with a unified format, reuse syncd init script to init saiserver.
How I did it
how
Reuse syncd init script
How to verify it
Test
Test in DUT s6000 and dx010 with sonic 202205
Why I did it
Advance SAI Redis head pointer
How I did it
changes:
sonic-net/sonic-sairedis@cf679e7sonic-net/sonic-sairedis@8d6688e
[202205][Submodule][SAI]Advance SAI head pointer sonic-sairedis#1185 sonic-net/sonic-sairedis@66f2961
remove parameter --skip_error, which removed from [202205][Submodule][SAI]Advance SAI head pointer sonic-sairedis#1185
How to verify it
local image build
Why I did it
To bring the following fixes:
Applied customer patch for limited port breakout support in rel_ocp_sai_7_1.
Revert "Merged PR 7224850: SAI7.1_DNX: Temp workaround for Nexthop Group Scale Issue(CS00012251649)".
backport SONIC-67662 to SAI7.1:JR2C ECMP partition for NHgroup members.
How I did it
Updated SAI code with the fixes above.
How to verify it
Run the SONiC and SAI test with the SAI pipeline.
Why I did it
smartctl tool is available only in PMON docker. Hence, the tool may be not accessible incase PMON docker goes down.
Using iSMART_64 tool to fetch the SSD firmware version and device model information.
How I did it
Replacing smartctl with iSMART_64.
1d53bf4 Skip platform NDK health check two times in watchdog.sh
d68297c Added code to shutdown the channel after the grpc call also fixed the show fp-status command
0769efe Impelemented the module API to return the correct eeprom info for fabric card.
171569c Remove explicit logger identifier for transceiver module operations; use inherited id
6c4d651 Corrected the log messages for firmware install
Signed-off-by: mlok <marty.lok@nokia.com>
Why I did it
enable sai-ptf logger in sai_adapter to log all the sai api invcations
How I did it
add build parameter to enable the sai-ptf logger when build sai PRC
How to verify it
local build test
test the generated sai_adapter
test with pipeline
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
cherry-pick #12761
Make syncd rpc docker which supports sai-ptf v2
Part of previous PR #11610
local bulild the target
NOSTRETCH=y NOJESSIE=y make configure PLATFORM=broadcom NOSTRETCH=y NOJESSIE=y ENABLE_SYNCD_RPC=y SAITHRIFT_V2=y make target/docker-syncd-brcm-rpcv2.gz NOSTRETCH=y NOJESSIE=y ENABLE_SYNCD_RPC=y SAITHRIFT_V2=y make target/docker-saiserverv2-brcm.gz
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
add partial reboot cause support for linecards
add watchdog support for linecards
add power draw information for chassis
properly implement Chassis.get_port_or_cage_type
fix pcieutil on chassis with powered off cards
fix watchdog-control.service crash
misc fixes and cleanups
fix linecard provisioning issue (500 error)
fix some value types for get_system_eeprom_info API
refactor code to leverage pci topology (enabling dynamic Pcie plugin)
refactor asic declaration logic to new style
misc fixes