202211 and above use a different squashfs compression type that the 201911 kernel cannot handle. Therefore, this change avoids mounting squashfs altogether.
- Why I did it
Added BIOS upgrade infra
- How I did it
Added new make target
- How to verify it
Copy msn3800_bios.tar.gz to platform/mellanox/bios
make configure PLATFORM=mellanox
make target/files/stretch/msn3800_bios.tar.gz
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
Why I did it
Change the path of sonic submodules that point to "Azure" to point to "sonic-net"
How I did it
Replace "Azure" with "sonic-net" on all relevant paths of sonic submodules
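As a rough illustration of the replacement described above, the following Python sketch rewrites submodule URLs in `.gitmodules`-style text. The function name `repoint_submodules`, the sample submodule entry, and the assumption that the affected URLs contain `github.com/Azure/` are illustrative and not taken from the actual change.

```python
import re

# Assumption: affected submodule URLs contain "github.com/Azure/"
# (HTTPS form) or "github.com:Azure/" (SSH form).
_AZURE_URL = re.compile(r"(github\.com[/:])Azure/")


def repoint_submodules(gitmodules_text: str) -> str:
    """Return the text with Azure-owned submodule URLs pointed at sonic-net.

    Only lines whose key is "url" are rewritten, so section names and
    local paths are left untouched.
    """
    out = []
    for line in gitmodules_text.splitlines(keepends=True):
        if line.strip().startswith("url"):
            line = _AZURE_URL.sub(r"\1sonic-net/", line)
        out.append(line)
    return "".join(out)


# Illustrative entry only; the real file lists many submodules.
sample = (
    '[submodule "sonic-swss"]\n'
    "\tpath = src/sonic-swss\n"
    "\turl = https://github.com/Azure/sonic-swss\n"
)
print(repoint_submodules(sample))
```

Restricting the substitution to `url` lines is a deliberate choice in this sketch: it avoids touching a local path that happens to contain the word "Azure".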
Porting https://github.com/sonic-net/sonic-buildimage/pull/3723 to 201911
#### Why I did it
Extend Mellanox FW utils with CPLD update feature
Added support for CPLD upgrade to Mellanox FW utility
#### How I did it
Updated Mellanox FW utility
#### How to verify it
mlnx-fw-upgrade.sh --upgrade --cpld # Regular CPLD update flow
UPDATE_MLNX_CPLD_FW=1 mlnx-fw-upgrade.sh --upgrade # Force CPLD refresh only
Why I did it
Add Celestica Silverstone-x platform
How I did it
Add Celestica Silverstone-x platform
How to verify it
Verified by SONiC tested platform APIs
Verified by SONiC APIs, including:
psuutil
psushow (show platform psustatus)
sfputil
sfpshow
tempershow (show platform temperature)
fanshow (show platform fan)
watchdogutil
fwutil (show platform firmware status)
decode-syseeprom -d (show platform syseeprom)
show platform ssdhealth
show platform summary
show interfaces status
What/Why I did:
Update Broadcom SAI debian package. New Package has following changes:
Case CS00012248135: Fix for the error message "linux-bcm-knet: Fatal error: Incomplete chain" followed by malformed LACP/LLDP packets
Why I did it
Added Support for Celestica Midstone-100x platform
How I did it
Implemented the support for Celestica Midstone-100x platform
Platform: x86_64-cel_midstone-100x-r0
HwSKU: Midstone-100x
ASIC: innovium
ASIC Count: 1
How to verify it
Run platform test on testbed
- Why I did it
To include the fix for an issue where modifying the shared headroom on the fly can lead to negative occupancy, which causes the switch to send PFC frames continuously.
- How I did it
Updated submodule pointer and version in relevant Makefile.
- How to verify it
Build an image and run tests from sonic-mgmt.
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
Why I did it
Porting changes from DellEMC: S6100 CPLD upgrade #4299 and DellEMC S6100 CPLD upgrade support #3834 to 201911 branch
Added CPLD upgrade support for DellEMC S6100 platform.
- Why I did it
Optimize the thermal control policies: simplify the logic and add protection code to the policies so that thermal control still works even if the kernel algorithm does not.
- How I did it
Reduce unused thermal policies
Add a timely ASIC temperature check in the thermal policy to make sure ASIC temperature and fan speed are coordinated
The minimum allowed fan speed is now the maximum of the expected fan speeds across all policies
Move some logic from fan.py to thermal.py to make it more readable
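The minimum-speed rule in this list can be sketched as follows. This is a minimal illustration, not the code in thermal.py: the `Policy` class, its fields, the `floor` fallback, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Policy:
    """Hypothetical thermal policy: a name plus the fan speed (%) it expects."""
    name: str
    expected_speed: int


def minimum_allowed_speed(policies: Iterable[Policy], floor: int = 0) -> int:
    """Return the highest expected speed among all policies.

    Taking the maximum means no single policy's requirement is undercut.
    `floor` is an assumed fallback for the case where no policy is given.
    """
    return max((p.expected_speed for p in policies), default=floor)


policies = [
    Policy("asic_temperature", 60),
    Policy("psu_absent", 100),
    Policy("fan_fault", 80),
]
print(minimum_allowed_speed(policies))  # 100
```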
- How to verify it
1. Manual test
2. Regression
Why I did it
To incorporate the below changes in DellEMC S6100, S6000 platforms.
Enable thermalctld
Backport Platform API changes from master branch.
How I did it
Remove 'skip_thermalctld:true' in pmon_daemon_control.json
Implement the platform API methods in the respective device files
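A minimal sketch of the first step, assuming `pmon_daemon_control.json` is a flat JSON object of `skip_*` flags. The helper name `enable_thermalctld` and the sample content are illustrative only.

```python
import json


def enable_thermalctld(config_text: str) -> str:
    """Drop the 'skip_thermalctld' key from the daemon control config."""
    config = json.loads(config_text)
    # pop() with a default is a no-op if the key is already absent.
    config.pop("skip_thermalctld", None)
    return json.dumps(config, indent=4)


# Hypothetical file content; "skip_ledd" only stands in for a flag that stays.
sample = '{"skip_ledd": true, "skip_thermalctld": true}'
print(enable_thermalctld(sample))
```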
How to verify it
Verified that platform data is displayed by show platform fan and show platform temperature commands.
Why I did it
The reboot cause could not be retrieved or displayed.
How I did it
Correct the platform initialization definition.
How to verify it
Manual reboot and then 'show reboot-cause'
Backport #9258 to 201911
Why I did it
When PSU is powered off, the PSU is still on the switch and the air flow is still the same. In this case, it is not necessary to set FAN speed to 100%.
How I did it
When a PSU is powered off, don't treat it as absent.
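The distinction between a powered-off PSU and an absent one can be sketched like this. The `PsuStatus` class and both helper functions are hypothetical names for illustration; they are not the classes in thermal_infos.py.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PsuStatus:
    """Hypothetical PSU snapshot used only for this illustration."""
    present: bool      # the PSU is physically inserted
    power_good: bool   # the PSU is currently supplying power


def is_absent(psu: PsuStatus) -> bool:
    """A PSU counts as absent only when it is physically missing.

    A powered-off PSU is still in its slot, so the airflow is unchanged.
    """
    return not psu.present


def needs_full_fan_speed(psus: Iterable[PsuStatus]) -> bool:
    """Request 100% fan speed only if at least one PSU is absent."""
    return any(is_absent(p) for p in psus)


powered_off = PsuStatus(present=True, power_good=False)
healthy = PsuStatus(present=True, power_good=True)
print(needs_full_fan_speed([powered_off, healthy]))  # False
```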
How to verify it
Adjust existing unit test case
Add new case in sonic-mgmt
Conflicts:
platform/mellanox/mlnx-platform-api/sonic_platform/thermal_infos.py
- Why I did it
To include latest fixes.
1. On CMIS modules, after low power configuration, the firmware waited for the module state to be ModuleReady instead of ModuleLowPower causing delays.
2. When connecting Spectrum devices with optical transceivers that support RXLOS, remote side port down might cause the switch firmware to get stuck and cause unexpected switch behavior.
3. On rare occasions, when working with port rates of 1GbE or 10GbE and congestion occurs, packets may get stuck in the chip and may cause switch to hang.
4. When ECMP has high amount of next-hops based on VLAN interfaces, in some rare cases, packets will get a wrong VLAN tag and will be dropped.
5. When using SN4600C with copper or optical loopback cables at NRZ speeds, the link may take a long time to come up (up to 70 seconds).
6. When connecting SN4600C to SN4600C after Fastboot in 50GbE No_FEC mode with a copper cable, link-up may take ~20 seconds.
- How I did it
Updated SDK submodule and relevant makefiles with the required versions.
- How to verify it
Build an image and run tests from "sonic-mgmt".
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
#### Why I did it
Upgrade Mellanox-SAI to 1.19.3 to support reclaiming reserved buffer on admin down ports
#### How I did it
To support reclaiming reserved buffer on admin down ports.
#### How to verify it
Regression test and manual test.
- Why I did it
Update SDK\FW version to 4.4.3326\2008.3326. This version contains:
New Features:
1. Add support for Fast Boot for SN3800
Bug Fixing:
1. In some cases, when the total number of allocations exceeds the resource limit, an error can occur due to an incorrect resource release procedure. This issue is most likely to affect the following resources: flow counters, ACL actions, PBS, WJH filter, Tunnels, ECMP containers, MC (L2 & L3)
2. On Spectrum systems, when using the Async Router API with IPv6, an error message about failing to remove an ECMP container may show up in the log. This error has no functional impact and can be safely ignored.
3. On Spectrum-2 systems and above, when using warm boot, setting max_bridge_num to a value greater than 1968 will cause an error and potential crash.
4. Some Molex cables do not support speed after reboot
- How I did it
- How to verify it
Verified by running regression tests, including the complete set of supported sonic-mgmt tests
Why I did it
The serial-getty service exited randomly on Dell S6100 devices.
How I did it
Added serial-getty to monit services.
How to verify it
Stop serial-getty in ssh session and check whether the service restarts or not
Updated Broadcom SAI Debian package to 3.7.6.1. The major changes are:
- CS00011651922/CS00012192502 SID:Parity error in TDM Calendar memories causes traffic drop after SER correction
- CS00011222060 soc_mem_alpm_delete: unit 0: ALPM delete operation[L3_DEFIP_ALPM_IPV6_128] encountered parity error
- Cesto Phy Recovery enhancement.
- SDK compile with flag -DBCM_MONOTONIC_TIME and -DBCM_MONOTONIC_MUTEXES
Why I did it
To handle newer SSD firmware version in DellEMC S6100 platform (S210506G - 3IE devices).
How I did it
Update s6100_ssd_upgrade_status.sh to handle newer SSD firmware version.
How to verify it
Logs: UT_logs.txt
Signed-off-by: Dror Prital <drorp@nvidia.com>
* [Mellanox] Update FW version to 2008.3218 (#8079)
Update FW version to 2008.3218, fixing the following issues:
- 50G/100G links that are operationally down before warm-reboot are not coming up after warm-reboot
- 50G/100G links with admin shut / no shut commands are not coming up after warm-reboot
- Why I did it
Update FW version to 2008_3110 fixing SN3800 specific warm boot scenario:
1. Disable interface
2. Warm Boot
3. Enable Interface --> link will remain down.
- How I did it
Use new FW that contains the fix for the problem mentioned above
- How to verify it
Run the scenario mentioned above and make sure that the link is up after warm boot
Signed-off-by: Dror Prital <drorp@nvidia.com>
The LED_PROC_INIT_SOC variable was incorrectly referenced as LED_SOC_INIT_SOC. This was introduced in #5483.
Rather than fixing the typo, I decided to simplify the script, removing the need for the conditional altogether by moving the bcmcmd call inside the conditional which checks for the presence of LED_SOC_INIT_SOC.
#### Why I did it
Microsoft reported occasional daemon crashes on devices running 201911. On close inspection, the cause was PMBus reads failing with IOError on very rare occasions.
#### How I did it
Add a try/except block around reads on PMBus GPIOs.
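The guard described above might look roughly like the sketch below. The `safe_read` helper, its `default` parameter, and the stand-in reader functions are hypothetical; the actual driver code and its fallback behavior are not shown in this description.

```python
from typing import Callable, Optional


def safe_read(read: Callable[[], int], default: Optional[int] = None) -> Optional[int]:
    """Call `read` and return `default` instead of raising on IOError.

    This keeps a rare failed read from crashing the calling daemon.
    """
    try:
        return read()
    except IOError:
        return default


def failing_read() -> int:
    # Stand-in for a PMBus read that fails on a rare occasion.
    raise IOError("PMBus read failed")


print(safe_read(failing_read))  # None
print(safe_read(lambda: 1))     # 1
```

Only IOError is caught here, so unrelated bugs in the reader still surface instead of being silently swallowed.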
Co-authored-by: Zhi Yuan (Carl) Zhao <zyzhao@arista.com>
Why I did it
To recover the SSD upgrade state in case of an ONIE uninstall or if the ssd_fw_upgrade folder was deleted.
To handle the newer SSD version (S21506G - 3IE GPIO7 low devices).
Also to correct the error messages for non-upgraded S6100s.