Why I did it
Change the path of sonic submodules that point to "Azure" to point to "sonic-net"
How I did it
Replace "Azure" with "sonic-net" on all relevant paths of sonic submodules
fix linecard provisioning issue (500 error)
fix some value types for get_system_eeprom_info API
refactor code to leverage pci topology (enabling dynamic Pcie plugin)
refactor asic declaration logic to new style
misc fixes
- Why I did it
get_rx_los and get_tx_fault is not supported via the exisitng interface used, need provide dummy implementation for them.
NOTE: in later releases we will get them back via different interface.
- How I did it
Return False * lane_num for get_rx_los and get_tx_fault
- How to verify it
Added unit test
- Why I did it
To include latest fixes and new functionality
SAI fixes and new features
fix#3205239, incorrect object type returned for SG child list
Fix VRF-VNI map entries remove issue
ECC health event and logging
[Port Buffers] restore default queue and pg configuration when all user pools are deleted
Fix EVPN type3 error on removal of uc/bc flood group
Fix EVPN type2 MAC move from local to remote results in SAI failure
Fix Disable learning on VXLAN tunnel
Fix error on VXLAN v6 tunnel removal
Fix port cannot apply schedule group when it is a lag member
Fix BFD add more detailed message on BFD packet not related to any existing session
gcc10 compilation fixes
Disable learning on VXLAN tunnel
Support BFD remote-disc exchange in negotiation stage
Tunnel Loopback packet action attribute implementation (for Dual TOR)
Add KVD resources MIN/MAX functionality (pending CRM issue with MIN only)
Support for CRC2 hash algorithm
Bulk counter support for PGs, queues
Support mirror sample rate attribute (SPC2+)
[Functional] [QoS] | Unable to remove SCHEDULE profile table even if there is no object referencing it
Next hop group optimized bulk API
Reduce verbosity of shared database already exists print
Span mirror policer (SPC2+), optimize pipeline for acl mirror action with policer on SPC2+
use same size descriptor pool for rx/tx
fix bfd - notify Sonic for admin-down event
2201 - empty list for supported fec for RJ45 ports
Fix don't disable used tunnel underlay interfaces
SDK fixes
100GbE FCI DAC (10137628-4050LF/HPE PN: 845408-B21) was recognized by mistake as supporting "cable burning' which caused the switch firmware to read page 0x9f (which unsupported in the cable) and to report this cable as having "bad eeprom".
Added remote peer UDP port information in BFD packet event.
After editing an ECMP, the resilient ECMP next-hop counter may not count correctly.
Fixed potential memory leaks in some APIs related to LPM
If TTL_CMD_COPY is used in Encap direction for a packet with no TTL, then the value passed in the ttl data structure will be used if non-zero (default 255 if zero).
In SN2201: When configuring Force mode, user should configure Speed and FEC on both sides
In Flex Tunnel encapsulation flow, if the encapsulation is with an IPv6 header, the flow label field may not be updated as expected.
In some cases, when changing speed to 400GbE over 8 lanes, the first few packets would be dropped.
In some traffic patterns involving small packets, the PortRcvErrors counter may mistakenly count events of local physical errors due to an internal flow in the hardware that involves link packets.
On Spectrum systems, sometimes during link failure, not all previous firmware indications cleared properly, potentially affecting the next link up attempt.
On the NVIDIA Spectrum-2 switch, when receiving a packet with Symbol Errors on ports that are configured to cut-thought mode, a pipeline might get stuck.
PCI calibration changes from a static to a dynamic mechanism.
SDK debug dump shows "Unknown" Counter in RFC3635 Counter Group.
SDK debug dump shows "Unknown" Counter in the PPCNT Traffic Class Counter Group.
SDK Dump missing column headers in some GC tables may result in difficulty understanding the dump.
SLL configuration is missing in SDK dump.
Spectrum-2 systems, do no support 1GbE on supported 40GbE modules.
When binding a UDP port which is already in use for BFD TX session, the error message appears incorrectly.
When Flex Tunnel was used, Flex Modifier sometimes experienced a brief mis-configuration during ISSU.
When many ports are active (e.g. 70 ports up), and the configuration of shared buffer is applied on the fly, occasionally, the firmware might get stuck.
When running 1GbE speeds on SN4600 system, the port remained active while peer side was closed.
When toggling many ports of the Spectrum devices while raising 10GbE link up and link maintenance is enabled, the switch may get stuck and may need to be rebooted.
When trying to reconfigure the Flex Parser header and Flex transition parameters after ISSU, the switch will returned an error even if the configuration was identical to that done before performing the ISSU.
While toggling the cable, and the low power mode is set to ON, an unexpected PMPE event error is received.
How I did it
Updated SDK/SAI submodule and relevant makefiles with the required versions.
- How to verify it
Build an image and run tests from "soni-mgmt".
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
- Why I did it
Fix a typo in chassis platform API which causes the following error
>>> import sonic_platform as P
>>> c = P.platform.Platform().get_chassis()
>>> sl = c.get_all_sfps()
>>> sl[0].get_lpmode()
Sep 28 07:48:33 INFO LOG: Initializing SX log with STDOUT as output file.
False
>>> del c
Exception ignored in: <function Chassis.__del__ at 0x7f1d166ef8b0>
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/sonic_platform/chassis.py", line 126, in __del__
self.sfp_module.deinitialize_sdk_handle(sfp_module.SFP.shared_sdk_handle)
NameError: name 'sfp_module' is not defined
- How I did it
Use self while using the SDK handle
- How to verify it
Manual test
Signed-off-by: Stephen Sun <stephens@nvidia.com>
This reverts commit 9750cb4.
There is a PR to handle master branch revert: #12183
- Why I did it
The PR to be reverted introduced many notice logs every 1 minute if SFP is not plugged:
Cannot get module EEPROM information: Input/output error
Before the "bad" PR, the message format is like this:
INFO pmon#supervisord: xcvrd Cannot get module EEPROM information: Input/output error
It was truncated by rsyslog because every message is the same. However, the "bad" PR introduces SFP index to the message:
NOTICE pmon#xcvrd: Failed to get EEPROM data for sfp 39: Cannot get module EEPROM information: Input/output error
Rsyslog no longer truncate such log and many such messages are flooded to syslog.
- How I did it
Revert the PR
- How to verify it
Manual test
- Why I did it
ethtool print error logs when EEPROM of a SFP is not available. It prints error like this:
INFO pmon#/supervisord: xcvrd Cannot get module EEPROM information: Input/output error
INFO pmon#/supervisord: xcvrd Cannot get Module EEPROM data: Invalid argument
However, this log does not contain the relevant SFP index which is hard for developer/qa to find the exactly SFP.
- How I did it
Redirect ethtool stderr to subprocess and log it better
- How to verify it
Manual test
- Why I did it
Update SDK/FW version - 4.5.2320/2010_2320 in order to have the following fixes:
• Spectrum-3 | PCI calibration changes from a static to a dynamic mechanism.
• [VxLAN] TTL was set to 0 for non IP traffic (such as ARP)
- How I did it
Update pointer for the SDK/FW
- How to verify it
Run regression tests
Why I did it
Enable syncd container autorestart for Innovium platforms
How I did it
Add critical_process file and sypervisord.conf entry
How to verify it
Tested with autorestart/test_container_autorestart.py::test_containers_autorestart
PASSED autorestart/test_container_autorestart.py::test_containers_autorestart[sonic-xxx-dut-sonic-xxx-dut|syncd]
Signed-off-by: rck-innovium rck@innovium.com
* [SAIServer] support saiserver v2 in bullseye
Support build saiserverv2 in bullseye
Add dependencies for building saiserverv2
Upgrade libboost-atomic1.71 to libboost-atomic1.74
Test done:
Local builded with NOSTRETCH=y NOJESSIE=y NOBUSTER=y SAITHRIFT_V2=y make target/docker-saiserverv2-brcm.gz
* update libboost-atomic from 1.71 to 1.74 for bullseye
Why I did it
The directory /var/warmboot as top directory for warmboot feature is also needed in docker gbsyncd. Some vendor SAI might save data under it. Without it, the SAI init/creation API failure has happened on PikeZ platform.
How I did it
Mount host directory /host/warmboot as /var/warmboot in docker gbsyncd, which is same as what it has done on docker syncd.
Why I did it
It solves a swss orchagent crash issue on PikeZ device, due to link-training setting of external PHY port.
How I did it
Catch up the fix for CS00012257483 in version 7.1.7.2.
- Why I did it
To update MFT package to the latest version.
- How I did it
Updated MFT_VERSION & MFT_REVISION in platform/mellanox/mft.mk.
- How to verify it
Run regression testing using tests from sonic-mgmt
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
Why I did it
Update SDK/FW version - 4.5.2318/2010_2318 to pick up new fixes:
Cr space timeout on Hold and Release GW - at warm boot
SPC-1 Port in stuck PHY_UP after peer side rebooted
memory leak in sx_api_router_ecmp_update_set
How I did it
update the make file with the new version number
update submodule Switch-SDK-drivers pointer
How to verify it
run sonic regression
Signed-off-by: Kebo Liu <kebol@nvidia.com>
- Why I did it
Update HW-MGMT to V.7.0020.3006
1. Support new system SN2201
2. Add COMEX BRDWL respin support
- How I did it
Update the version number of the makefile
Advance the hw-mgmt submodule pointer
- How to verify it
Run full regression on Nvidia platforms
Signed-off-by: Kebo Liu <kebol@nvidia.com>
- Why I did it
Add more log while doing sysfs reading to increase the debug capability
- How I did it
Log the relevant file path and error number while sysfs reading return None
- How to verify it
Manual test
To reduce rc.local script execution time. Porting changes from [DellEMC] S6100 Platform Service optimization #10989
Changes:
Moving platform-modules-s6100.service and s6100-lpc-monitor.service asynchronous to rc.local script.
- Why I did it
Fix bug: pmon report error on start up because some SKUs do not have hwsku.json
- How I did it
If hwsku.json, do not extract RJ45 port information
- How to verify it
Manual test.
Unit test.
- Why I did it
Support get_port_or_cage_type for RJ45 ports
- How I did it
Implement the new platform API get_port_or_cage_type
Fix the issue: unable to import SFP when chassis object is destructed
- How to verify it
Manually test and regression test
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Support new platform SN2201 and RJ45 port
Signed-off-by: Kebo Liu <kebol@nvidia.com>
* remove unused import and redundant function
Signed-off-by: Kebo Liu <kebol@nvidia.com>
* fix error introduced by rebase
Signed-off-by: Kebo Liu <kebol@nvidia.com>
* Revert the special handling of RJ45 ports (#56)
* Revert the special handling of RJ45 ports
sfp.py
sfp_event.py
chassis.py
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Remove deadcode
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Support CPLD update for SN2201
A new class is introduced, deriving from ComponentCPLD and overloading _install_firmware
Change _install_firmware from private (starting with __) to protected, making it overloadable
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Initialize component BIOS/CPLD
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Remove swb_amb which doesn't on DVT board any more
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Remove the unexisted sensor - switch board ambient - from platform.json
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Do not report error on receiving unknown status on RJ45 ports
Translate it to disconnect for RJ45 ports
Report error for xSFP ports
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Add reinit for RJ45 to avoid exception
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Co-authored-by: Stephen Sun <5379172+stephenxs@users.noreply.github.com>
Co-authored-by: Stephen Sun <stephens@nvidia.com>
Give more room for the kernel image in memory
Change-Id: I015856d173d50d94e30d8c555590efb70eb712ae
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
- Why I did it
Advance to new SAI version for bugs fixes as well as new features/enhacements:
New:
1. ARM64 support
2. FG ECMP performance optimization
3. Support setting empty list for port ingress/egress buffer profile list
4. Add service port for SN5600
5. Add CR8/SR8/LR8/KR8 interface type
6. Disable mlxtrace during debug dump
Fixes:
1. Fix SAI_ACL_ENTRY_ATTR_FIELD_TC
2. Fix Packets loop back if no member in portchannel
3. Fix optimize descriptors apply time (and fast boot time)
4. Add flush fdb entries for vxlan tunnel bridge port
5. Don't disable used tunnel underlay interfaces
- How I did it
Advanced SAI submodule
- How to verify it
make configure PLATFORM=mellanox
make target/sonic-mellanox.bin
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
* [Mellanox]Check dmi file permission before access (#11309)
Signed-off-by: Sudharsan Dhamal Gopalarathnam sudharsand@nvidia.com
Why I did it
During the system boot up when 'show platform status' or 'show version' command is executed before STATE_DB CHASSIS_INFO table is populated, the show will try to fallback to use the platform API. The DMI file in mellanox platforms require root permission for access. So if the show commands are executed as admin or any other user, the following error log will appear in the syslog
Jun 28 17:21:25.612123 sonic ERR show: Fail to decode DMI /sys/firmware/dmi/entries/2-0/raw due to PermissionError(13, 'Permission denied')
How I did it
Check the file permission before accessing it.
How to verify it
Added UT to verify. Manually verified if the error log is not thrown.
- Why I did it
This is for the eventual support of multiple architectures for the mellanox platform.
- How I did it
Change the location of the binaries in Switch-SDK-drivers so that the path specifies the target architecture in addition to the target distribution that the debians are built for.
This is the most straightforward way to separate binaries built against different architectures and selectively target them for installation in the mellanox SONiC image.
- How to verify it
Build SONiC for mellanox and verify it compiles successfully.
Signed-off-by: Sudharsan Dhamal Gopalarathnam sudharsand@nvidia.com
Why I did it
During the system boot up when 'show platform status' or 'show version' command is executed before STATE_DB CHASSIS_INFO table is populated, the show will try to fallback to use the platform API. The DMI file in mellanox platforms require root permission for access. So if the show commands are executed as admin or any other user, the following error log will appear in the syslog
Jun 28 17:21:25.612123 sonic ERR show: Fail to decode DMI /sys/firmware/dmi/entries/2-0/raw due to PermissionError(13, 'Permission denied')
How I did it
Check the file permission before accessing it.
How to verify it
Added UT to verify. Manually verified if the error log is not thrown.
- Why I did it
To include latest fixes:
1. Warmboot | When trying to reconfigure the Flex Parser header and Flex transition parameters after ISSU, the switch will returned an error even if the configuration was identical to that done before performing the ISSU.
2. Link Up | When toggling many ports of the Spectrum devices while raising 10GbE link up and link maintenance is enabled, the switch may get stuck and may need to be rebooted.
3. Shared buffer | While moving from lossless to lossy while shared headroom was used, reduction of the shared headroom can only be done prior to pool type change and when shared headroom is not utilized.
- How I did it
Updated SDK submodule along with the relevant Makefiles
- How to verify it
Build an image and run tests from "sonic-mgmt".
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
- Why I did it
To provide an ability to suppress ASAN false positives and have a clean ASAN report for docker-sonic-vs/mlnx-syncd/orchagent docker
- How I did it
Added the "print_suppressions=0" to ASAN configs.
- How to verify it
add a suppression to some ASAN-enabled component (the suppression should catch some leak)
build with ENABLE_ASAN=y
run a test and see that the ASAN report is empty instead of having the suppression summary
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
To ensure that ASAN logs are always generated. Currently, the way to get the logs is to map the "/var/log/asan" outside of a container, which doesn't work for DVS test run with "--imgname" option.
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
This fixes the build for armhf to be able to use '/device///installer.conf' files. Specifically, armhf needs support to be able to change the size of /var/log/ directory. It is hardcoded to 512 bytes on all armhf platforms currently. This change will allow any armhf platform to be able to use an installer.conf file to customize the installed image.
- Why I did it
"import sonic_platform" takes about 600ms ~ 1000ms, it is kind of slow. After this optimization, the time is about 100ms. The benefit is that those CLIs which does not need the slow import sentence would be faster than before.
- How I did it
Find slow import and call them when need.
- How to verify it
Measure the import time.
Currently, the build with ASAN_ENABLE=y reuses the packages built with
ASAN_ENABLE=n (and vice versa). To address this issue, ASAN_ENABLE is added to DEP_FLAGS for asan-enabled packages (docker-syncd-mlnx, syncd, docker-orchagent, swss).
- Why I did it
To make dpkg cache use/rebuild the packages for ASAN_ENABLE=y/n.
- How I did it
Added ASAN_ENABLE to the DEP_FLAGS for asan-enabled packages.
- How to verify it
Built with ASAN_ENABLE=y/n and checked the .flags .log files.
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
This is to improve the readability of ASAN reports. The debug package adds function names and source code references to the backtrace (currently, there are only binary addresses of functions)
Another way to address this issue is to build the image with "INSTALL_DEBUG_TOOLS=y". The downside of this approach is that the image size and compilation time are unnecessarily big. Also, the idea is to make the "ENABLE_ASAN" self-sufficient, which would not be the case for this approach.
- Why I did it
To improve the readability of asan logs.
- How I did it
Added SYNCD_DBG and SWSS_DBG to corresponding docker images for ASAN_ENABLE=y build
- How to verify it
Add artificial memory leak
Build with ASAN_ENABLE=y
Test the image and check the ASAN report
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
- Why I did it
Update MFT to newer version
- How I did it
Update MFT_VERSION in platform/mellanox/mft.mk
- How to verify it
Check version via dpkg -l | grep mft
Signed-off-by: Andriy Yurkiv <ayurkiv@nvidia.com>
Why I did it
add celestica belgite platform
How I did it
add belgite platform in celestica
Co-authored-by: nicwu-cel <nicwu@celestica.com>
Co-authored-by: anjian <anjian@celestica.com>
Co-authored-by: sandycelestica <sandyli@celestica.com>
This update has following changes
Refactor pci topology logic for chassis (fixes some chassis commands and chassisd on linecard)
Introduce new cooling algorithm
Fix linecard poweroff logic when supervisor is going down
Fix linecard status led leading to system-health crashing
Misc fixes
- Why I did it
Script fails when there is an exception while reading
- How I did it
Add more logs and checks. Fix wrong variable naming and messages.
- How to verify it
Provoke exception while read_eeprom() and check that it is handled properly
Why I did it
To include ONIE version in show platform firmware status command output in DellEMC S6100 and Z9332f platforms.
How I did it
Include ‘ONIE’ in the list of components provided by platform APIs in DellEMC S6100 and Z9332f.
Unmount ONIE-BOOT if mounted using fast/soft/warm-reboot plugins in DellEMC S6100.
Fixes#9279
- Why I did it
Part of larger effort to move all SONiC systems to bullseye
- How I did it
1. Update container makefiles with correct dependencies
2. Update container Dockerfile with correct base image
3. Update container Dockerfile with correct apt dependencies
4. Update any other makefiles with dependencies to remove python2 support
5. Minor changes to support bullseye / python3
- How to verify it
Run regression on the switch:
1. Verify PTF community tests work
2. Verify syncd runs and all ports come up / pass traffic
3. Verify all platform tests succeed
Update SDK/FW to 4.5.1500/2010.1500 and SAI version to 1.21.1.1
SDK/FW features:
1. Added support for Finisar DR4 (FTCD4523E2PCM) on Spectrum-2 and Spectrum-3 systems.
SAI Features:
1. ECMP overlay support for IPv6
2. BFD offloading / 4K scale
3. Host interface user traps + improved trap registration (table entry)
4. gcc11 compilation fixes
5. Read support for ACL redirect action
6. Optimize ECMP DB size
7. Buffer descriptors new defaults
8. Updated port mapping for SN2201
SAI Fixes:
1. Debug counter removal when configured with all drop reasons
- Why I did it
Upgrade Mellanox SDK and SAI versions to latest
- How I did it
Updated submodule pointers
- How to verify it
Regression tested
Currently, the build dockers are created as a user dockers(docker-base-stretch-<user>, etc) that are
specific to each user. But the sonic dockers (docker-database, docker-swss, etc) are
created with a fixed docker name and common to all the users.
docker-database:latest
docker-swss:latest
When multiple builds are triggered on the same build server that creates parallel building issue because
all the build jobs are trying to create the same docker with latest tag.
This happens only when sonic dockers are built using native host dockerd for sonic docker image creation.
This patch creates all sonic dockers as user sonic dockers and then, while
saving and loading the user sonic dockers, it rename the user sonic
dockers into correct sonic dockers with tag as latest.
docker-database:latest <== SAVE/LOAD ==> docker-database-<user>:tag
The user sonic docker names are derived from 'DOCKER_USERNAME and DOCKER_USERTAG' make env
variable and using Jinja template, it replaces the FROM docker name with correct user sonic docker name for
loading and saving the docker image.
Why I did it
Migrate ptftests script to python3, in order to do an incremental migration, add python virtual environment firstly, install all required python packages in virtual env as well.
Then migrate ptftests scripts from python2 to python3 one by one avoid impacting non-changed scripts.
Signed-off-by: Zhaohui Sun zhaohuisun@microsoft.com
How I did it
Add python3 virtual environment for docker-ptf.
Add submodule ptf-py3 and install patched ptf 0.9.3 into virtual environment as well, two ptf issues were reported here:
p4lang/ptf#173p4lang/ptf#174
Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>
On vs platform, egress_lossless_pool's mode is static.
So the corresponding profile should be of static_th as well.
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Optimize dx010 sonic platform init script to speed up init process
* Merge issue #10152: [warm-upgrade][202012] Slow Celestica platform init
in rc.local causes lacp-teardown fix into master branch
Signed-off-by: Eric Zhu <erzhu@celestica.com>
- Why I did it
To support docker-sonic-vs image with ASAN.
- How I did it
1. Made the supervisord.conf a template
2. Added the 'log_path' environment variable for ASAN-enabled daemons
3. Added supervisord.conf.j2 generation and ASAN lib to the docker-sonic-vs/Dockerfile.j2
- How to verify it
1. Made a build with ENABLE_ASAN=y
2. Run the tests, checked ASAN reports
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
- Why I did it
InvalidPsuVolWA.run might raise exception if user power off PSU when it is running. This exception is not caught and will be raised to psud which causes psud failed to update PSU data to DB.
- How I did it
1. Change the log level when WA does not work. This could happen when user power off PSU, hence changing the log level from error to warning is better
2. Change the wait time from 5 to 1 to avoid introduce too much delay in psud. 1 second is usually enough per my test
3. Give a default return value for function get_voltage_low_threshold and get_voltage_high_threshold to avoid exception reach to psud
- How to verify it
Manual test.
Run sonic-mgmt regression
Fix the issues #10501 and #9733
If having gearbox, we need:
* add gbsyncd as a peer since swss also has dependency on gbsyncd
* add service gbsyncd to FEATURE table if it is missing
- Why I did it
Implement newly added reboot causes in PR Azure/sonic-platform-common#277
- How I did it
Map the reboot cause sysfs to the newly added reboot causes.
- How to verify it
manual test, check whether the reboot cause is correct after rebooting the switch in various ways.
run the community reboot test to see whether the reboot cause checker is passing.
Signed-off-by: Kebo Liu <kebol@nvidia.com>
The v0.7.5 has bug fix for the support of gearbox port and macsec counters. It also includes a owl firmware update with owl.lz4.fw.1.94.0.bin.
How I did it
Update credo sai url for v0.7.5
Update gearbox_config.json with using firmware owl.lz4.fw.1.94.0.bin instead of owl.lz4.fw.1.92.1.bin
How to verify it
Test gearbox port and macsec counter successfully on A7280.
Why I did it
To support address sanitizer for Mellanox syncd
How I did it
/var/log/asan is mapped for syncd container (the same as for swss)
container stop() has a timeout (60s) for syncd (the same as for swss)
This is so libasan has enough time to generate a report.
added ASAN's log path to Mellanox syncd supervisord.conf
added "asan: yes" to sonic_version.yml
How to verify it
Added artificial memory leaks
Compiled with ENABLE_ASAN=y
Installed the image on DUT
Rebooted the DUT
Verified that /var/log/asan/syncd-asan.log contains the leaks
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
- Why I did it
There is a hardware bug that PSU voltage threshold sysfs returns incorrect value. The workaround is to call "sensor -s" to refresh it.
- How I did it
Call "sensor -s" when the threshold value is not incorrect and PSU is "DELTA 1100"
- How to verify it
Unit test and Manual test
Why I did it
Prevent from i2c bus to get locked.
How I did it
Add sysfs driver to access ioport.
Command to reset i2c mux:
echo 1 > /sys/devices/platform/as9716_32d_ioport/i2c_mux_rst
Command to bring i2c mux out of reset:
echo 0 > /sys/devices/platform/as9716_32d_ioport/i2c_mux_rst
Signed-off-by: Brandon Chuang <brandon_chuang@edge-core.com>
Why I did it
For trident4/tomahawk4, linux_ngknet.ko and linux_ngknetcb.ko have to be installed. Also, the kernel modules to load on such chips are different from existing ones, so we add an option is_ltsw_chip to determine the kernel modules to load. The option is_ltsw_chip is controlled by adding 'is_ltsw_chip=1' to platform_env.conf or not.
How to verify it
We verified that existing platforms still work after this change; and for platforms with trident4/tomahawk4, we can load the different kernel modules as expected after adding 'is_ltsw_chip=1' to platform_env.conf
b67d479 Fixed the sfp refactor issue
827c5a6 Added nokia_cmd command nokia_common grpc support for power down/up SFM module
aeb7f56 Added the nokia cli commands for midplane
c57d083 Fix the get_my_module issue and the thermal_infos exception issue.
0536293 Change the output of "show chassis module status"
63212d7 Enhance the help display for nokia_cmd command
e8d2599 Fix the sonic_install_ndk_service script issue
d52bdcf Add command nokia_cmd show sfm-eeprom support
Signed-off-by: mlok <marty.lok@nokia.com>
* [BFN] Fix for run fwutil without sudo
SONiC has a concept of "platform components"
this may include - CPLD, FPGA, BIOS, BMC, etc.
These changes are needed to read the version of the BIOS and BMC component.
What I did
The previous implementaion of component.py expect fwutil run with sudo.
When fwutil run without sudo, there are an exception:
```
Traceback (most recent call last):
File "/usr/local/bin/fwutil", line 5, in <module>
from fwutil.main import cli
File "/usr/local/lib/python3.9/dist-packages/fwutil/__init__.py", line 3, in <module>
from . import main
File "/usr/local/lib/python3.9/dist-packages/fwutil/main.py", line 40, in <module>
pdp = PlatformDataProvider()
File "/usr/local/lib/python3.9/dist-packages/fwutil/lib.py", line 159, in __init__
self.__platform = Platform()
File "/usr/local/lib/python3.9/dist-packages/sonic_platform/platform.py", line 21, in __init__
self._chassis = Chassis()
File "/usr/local/lib/python3.9/dist-packages/sonic_platform/chassis.py", line 48, in __init__
self.__initialize_components()
File "/usr/local/lib/python3.9/dist-packages/sonic_platform/chassis.py", line 136, in __initialize_components
component = Components(index)
File "/usr/local/lib/python3.9/dist-packages/sonic_platform/component.py", line 184, in __init__
self.version = get_bios_version()
File "/usr/local/lib/python3.9/dist-packages/sonic_platform/component.py", line 19, in get_bios_version
return subprocess.check_output(['dmidecode', '-s', 'bios-version']).strip().decode()
File "/usr/lib/python3.9/subprocess.py", line 424, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/usr/lib/python3.9/subprocess.py", line 505, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/lib/python3.9/subprocess.py", line 951, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/usr/lib/python3.9/subprocess.py", line 1823, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'dmidecode'
```
How I did it
Modification of dmidecode command
How to verify it
Run manually 'fwutil' (without sudo)
Previous command output had exception
New command output:
Root privileges are required
Signed-off-by: Taras Keryk <tarasx.keryk@intel.com>
* Why I did it
The previous implementaion of component.py expect fwutil run with sudo.
When fwutil run without sudo, there are an exception:
Traceback (most recent call last):
File "/usr/local/bin/fwutil", line 5, in <module>
from fwutil.main import cli
File "/usr/local/lib/python3.9/dist-packages/fwutil/__init__.py", line 3, in <module>
from . import main
File "/usr/local/lib/python3.9/dist-packages/fwutil/main.py", line 40, in <module>
pdp = PlatformDataProvider()
File "/usr/local/lib/python3.9/dist-packages/fwutil/lib.py", line 159, in __init__
self.__platform = Platform()
File "/usr/local/lib/python3.9/dist-packages/sonic_platform/platform.py", line 21, in __init__
self._chassis = Chassis()
File "/usr/local/lib/python3.9/dist-packages/sonic_platform/chassis.py", line 48, in __init__
self.__initialize_components()
File "/usr/local/lib/python3.9/dist-packages/sonic_platform/chassis.py", line 136, in __initialize_components
component = Components(index)
File "/usr/local/lib/python3.9/dist-packages/sonic_platform/component.py", line 184, in __init__
self.version = get_bios_version()
File "/usr/local/lib/python3.9/dist-packages/sonic_platform/component.py", line 19, in get_bios_version
return subprocess.check_output(['dmidecode', '-s', 'bios-version']).strip().decode()
File "/usr/lib/python3.9/subprocess.py", line 424, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/usr/lib/python3.9/subprocess.py", line 505, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/lib/python3.9/subprocess.py", line 951, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/usr/lib/python3.9/subprocess.py", line 1823, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'dmidecode'
How I did it
Modification of dmidecode command
How to verify it
Run manually 'fwutil' (without sudo)
Previous command output had exception
New command output:
Root privileges are required
Signed-off-by: Taras Keryk tarasx.keryk@intel.com
Signed-off-by: Taras Keryk <tarasx.keryk@intel.com>
* rewrite a call of dmidecode, when run without sudo
Signed-off-by: Taras Keryk <tarasx.keryk@intel.com>
* Why I did it
The previous implementaion of component.py expect fwutil run with sudo.
When fwutil run without sudo, there are an exception:
Traceback (most recent call last):
File "/usr/local/bin/fwutil", line 5, in <module>
from fwutil.main import cli
File "/usr/local/lib/python3.9/dist-packages/fwutil/__init__.py", line 3, in <module>
from . import main
File "/usr/local/lib/python3.9/dist-packages/fwutil/main.py", line 40, in <module>
pdp = PlatformDataProvider()
File "/usr/local/lib/python3.9/dist-packages/fwutil/lib.py", line 159, in __init__
self.__platform = Platform()
File "/usr/local/lib/python3.9/dist-packages/sonic_platform/platform.py", line 21, in __init__
self._chassis = Chassis()
File "/usr/local/lib/python3.9/dist-packages/sonic_platform/chassis.py", line 48, in __init__
self.__initialize_components()
File "/usr/local/lib/python3.9/dist-packages/sonic_platform/chassis.py", line 136, in __initialize_components
component = Components(index)
File "/usr/local/lib/python3.9/dist-packages/sonic_platform/component.py", line 184, in __init__
self.version = get_bios_version()
File "/usr/local/lib/python3.9/dist-packages/sonic_platform/component.py", line 19, in get_bios_version
return subprocess.check_output(['dmidecode', '-s', 'bios-version']).strip().decode()
File "/usr/lib/python3.9/subprocess.py", line 424, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/usr/lib/python3.9/subprocess.py", line 505, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/lib/python3.9/subprocess.py", line 951, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/usr/lib/python3.9/subprocess.py", line 1823, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'dmidecode'
The previous implementaion of eeprom.py expect fwutil run with sudo.
When fwutil run without sudo, there are an exception:
Traceback (most recent call last):
File "/usr/lib/python3.9/logging/config.py", line 564, in configure
handler = self.configure_handler(handlers[name])
File "/usr/lib/python3.9/logging/config.py", line 745, in configure_handler
result = factory(**kwargs)
File "/usr/lib/python3.9/logging/handlers.py", line 153, in init
BaseRotatingHandler.init(self, filename, mode, encoding=encoding,
File "/usr/lib/python3.9/logging/handlers.py", line 58, in init
logging.FileHandler.init(self, filename, mode=mode,
File "/usr/lib/python3.9/logging/init.py", line 1142, in init
StreamHandler.init(self, self._open())
File "/usr/lib/python3.9/logging/init.py", line 1171, in _open
return open(self.baseFilename, self.mode, encoding=self.encoding,
PermissionError: [Errno 13] Permission denied: '/var/log/platform.log'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/fwutil", line 5, in
from fwutil.main import cli
File "/usr/local/lib/python3.9/dist-packages/fwutil/init.py", line 3, in
from . import main
File "/usr/local/lib/python3.9/dist-packages/fwutil/main.py", line 41, in
pdp = PlatformDataProvider()
File "/usr/local/lib/python3.9/dist-packages/fwutil/lib.py", line 162, in init
self.chassis_component_map = self.__get_chassis_component_map()
File "/usr/local/lib/python3.9/dist-packages/fwutil/lib.py", line 168, in __get_chassis_component_map
chassis_name = self.__chassis.get_name()
File "/usr/local/lib/python3.9/dist-packages/sonic_platform/chassis.py", line 146, in get_name
return self._eeprom.modelstr()
File "/usr/local/lib/python3.9/dist-packages/sonic_platform/chassis.py", line 54, in _eeprom
self.__eeprom = Eeprom()
File "/usr/local/lib/python3.9/dist-packages/sonic_platform/eeprom.py", line 50, in init
logging.config.dictConfig(config_dict)
File "/usr/lib/python3.9/logging/config.py", line 809, in dictConfig
dictConfigClass(config).configure()
File "/usr/lib/python3.9/logging/config.py", line 571, in configure
raise ValueError('Unable to configure handler '
ValueError: Unable to configure handler 'file'
How I did it
Modification call of dmidecode command.
Added modification of log files access attributes before file open operations.
How to verify it
Run manually 'fwutil' (without sudo)
New command output have no exception.
Signed-off-by: Taras Keryk <tarasx.keryk@intel.com>
* Added file_check for checking access to log files for eeprom.py
Signed-off-by: Taras Keryk <tarasx.keryk@intel.com>
* Removed unused import
* Added logfile_create to eeprom.py and chassis.py
Signed-off-by: Taras Keryk <tarasx.keryk@intel.com>
* Created platform_utils.py
Signed-off-by: Taras Keryk <tarasx.keryk@intel.com>
* Added interpreter string to platform_utils.py
Signed-off-by: Taras Keryk <tarasx.keryk@intel.com>
Why I did it
Support pddf to as4630/as7816/as7326
How I did it
Send needed file to the PR for these platform
How to verify it
Test sensors and show platform cmd.
root@as7326-56x-3:/home/admin# show platform psustatus
PSU Model Serial HW Rev Voltage (V) Current (A) Power (W) Status LED
PSU 1 FSF045-611 FSF0451912000505 N/A 12.06 5.50 66.00 OK green
PSU 2 FSF045-611 FSF0451912000568 N/A 12.00 5.50 66.00 OK green
root@as7326-56x-3:/home/admin# sensors
lm75-i2c-15-4a
Adapter: i2c-1-mux (chan_id 6)
Main Board Temperature: +35.5 C (high = +80.0 C, hyst = +75.0 C)
lm75-i2c-15-4b
Adapter: i2c-1-mux (chan_id 6)
CPU Board Temperature: +29.0 C (high = +80.0 C, hyst = +75.0 C)
fan_ctrl-i2c-11-66
Adapter: i2c-1-mux (chan_id 2)
fan1: 9100 RPM
fan2: 9400 RPM
fan3: 9300 RPM
fan4: 9600 RPM
fan5: 9000 RPM
fan6: 9100 RPM
fan7: 9100 RPM
fan8: 9300 RPM
fan9: 9200 RPM
fan10: 9400 RPM
fan11: 9200 RPM
fan12: 9400 RPM
pch_haswell-virtual-0
Adapter: Virtual device
temp1: +43.0 C
psu_pmbus-i2c-17-59
Adapter: i2c-1-mux (chan_id 0)
in3: +12.06 V
fan1: 6272 RPM
temp1: +37.0 C
power2: 60.00 W
curr2: +6.00 A
lm75-i2c-15-49
Adapter: i2c-1-mux (chan_id 6)
Main Board Temperature: +40.0 C (high = +80.0 C, hyst = +75.0 C)
lm75-i2c-15-48
Adapter: i2c-1-mux (chan_id 6)
Main Board Temperature: +39.0 C (high = +80.0 C, hyst = +75.0 C)
psu_pmbus-i2c-13-5b
Adapter: i2c-1-mux (chan_id 4)
in3: +12.00 V
fan1: 6144 RPM
temp1: +36.0 C
power2: 66.00 W
curr2: +5.50 A
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +50.0 C (high = +82.0 C, crit = +104.0 C)
Core 0: +50.0 C (high = +82.0 C, crit = +104.0 C)
Core 1: +50.0 C (high = +82.0 C, crit = +104.0 C)
Core 2: +50.0 C (high = +82.0 C, crit = +104.0 C)
Core 3: +50.0 C (high = +82.0 C, crit = +104.0 C)
Signed-off-by: Jostar Yang <jostar_yang@accton.com.tw>
DellEMC: N3248TE platform API2.0 changes
Why I did it
N3248TE Platform API 2.0 changes
How I did it
Implemented the functional API's needed for Platform API 2.0
Added system_health_monitoring_config.json file
How to verify it
Used the API 2.0 test suite to validate the test cases.
Co-authored-by: Arun LK <Arun_L_K@dell.com>
Why I did it
DellEMC : Added support for N3248TE/N3248PXE platforms
How I did it
Implemented the changes to enable/disable the watchdog support
How to verify it
watchdog_unit_test.txt
Co-authored-by: Arun LK <Arun_L_K@dell.com>
- Why I did it
Take new hw-mgmt release to SONiC, including:
New features:
1. hw-mgmt: add to PSU FW upgrade tool command to show current FW version
2. hw-mgmt: add to PSU FW upgrade tool support for single-PSU-in-the-system FW upgrade
3. hw-mgmt: add attribute “/firmware” to show FW version of restricted upgradable PSUs only
4. hw-mgmt: Add NVME temperature reports attributes (_alarm/_crit/_min/_max)
Bug fix:
1. psu: redundant i2c_addr attributes being created for psu 3 & 4 in system having only 2 psus.
2. hw-mgmt: in SPC1/2 i2c driver removal is too slow vs. ASIC reset causing non-functional log errors
3. PSU thresholds sysfs changed in 5.10 to “read only” preventing modification (modification required due PSU HW bug)
4. CPLD3 sysfs attribute missing after chip down/up flow
5. sysfs attributes missing when hw-mgmt is restarted (stop/start) within systemd
Release notes can be found from link https://github.com/Mellanox/hw-mgmt/blob/V.7.0020.2004/debian/Release.txt
- How I did it
Update hw-mgmt make file with new version number
Update hw-mgmt submodule pointer
- How to verify it
Run platform regression on all Mellanox platform
Signed-off-by: Kebo Liu <kebol@nvidia.com>
- Why I did it
Implement read_eeprom/write_eeprom API for Credo Y-cable for Dual ToR Active-Standby
- How I did it
Use mlxreg utility for API implementation
Signed-off-by: Andriy Yurkiv <ayurkiv@nvidia.com>
Why I did it
Support for show platform temp/fan for psu-temp and fan.
Original code doesn't has fan_drawer to support these information.
How I did it
Support for show platform temp/fan for psu-temp and fan.
Add fan_drawer.py and update thermal.py to add needed code.
It need PDDF common code to support . (Refer to #10213)
How to verify it
Test show platform temp and show platform fan.
root@as7726-32x-2:~# show platform fan
Drawer LED FAN Speed Direction Presence Status Timestamp
Fantray1 green Fantray1_1 38% EXHAUST Present OK 20220311 08:13:04
Fantray1 green Fantray1_2 38% EXHAUST Present OK 20220311 08:13:04
Fantray2 green Fantray2_1 38% EXHAUST Present OK 20220311 08:13:04
Fantray2 green Fantray2_2 38% EXHAUST Present OK 20220311 08:13:04
Fantray3 green Fantray3_1 38% EXHAUST Present OK 20220311 08:13:04
Fantray3 green Fantray3_2 38% EXHAUST Present OK 20220311 08:13:04
Fantray4 green Fantray4_1 38% EXHAUST Present OK 20220311 08:13:04
Fantray4 green Fantray4_2 38% EXHAUST Present OK 20220311 08:13:04
Fantray5 green Fantray5_1 38% EXHAUST Present OK 20220311 08:13:04
Fantray5 green Fantray5_2 38% EXHAUST Present OK 20220311 08:13:04
Fantray6 green Fantray6_1 38% EXHAUST Present OK 20220311 08:13:04
Fantray6 green Fantray6_2 38% EXHAUST Present OK 20220311 08:13:04
N/A green PSU1_FAN1 23% EXHAUST Present OK 20220311 08:13:04
N/A green PSU2_FAN1 22% EXHAUST Present OK 20220311 08:13:04
root@as7726-32x-2:~# show platform temp
Sensor Temperature High TH Low TH Crit High TH Crit Low TH Warning Timestamp
PSU1_TEMP1 28 N/A N/A N/A N/A False 20220311 08:13:04
PSU2_TEMP1 25 N/A N/A N/A N/A False 20220311 08:13:04
TEMP1 23.5 80.0 N/A N/A N/A False 20220311 08:13:04
TEMP2 27 80.0 N/A N/A N/A False 20220311 08:13:04
TEMP3 24 80.0 N/A N/A N/A False 20220311 08:13:04
TEMP4 27 80.0 N/A N/A N/A False 20220311 08:13:04
TEMP5 24 80.0 N/A N/A N/A False 20220311 08:13:04
Co-authored-by: Jostar Yang <jostar_yang@accton.com.tw>
Why I did it
To reduce the processing time of rc.local, refactoring s6100 platform initialization.
Porting changes from 202012 branch [202012] Refactoring DELL platform init to reduce rc.local processing time #10171
Why I did it
Kernel hang in during early boot is caused due overwriting of device tree with uncompressing kernel. Added the fdt_high which gives a safe offset from kernel location.
How I did it
Setting uboot environment variable fdt_high.
How to verify it
Successful boot of bullseye kernel on Marvell Armada 380/385.
Change-Id: I3e2521780f5ecdb3bdf6cbb6542250814ca11959
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
Why I did it
Removing incorrect check in plt setup for fw_env config: This check was added before to compare 2 different types of disk. Now the check is redundant and check is not required as transition is complete.
2)Removing legacy_volume_label in create_partition: legacy_volume_label is not used in armhf install files. With legacy_volume_label initialized to NULL, current code will always return true for check, if demo_part exits.
How I did it
Change is about removing the redundant/incorrect code explained above.
How to verify it
uboot fw_printenv and fw_setenv is tested
onie-nos-install has be verified.
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
- Why I did it
Update NVIDIA Copyright header to "mellanox" files which were changed since 1.1.2022
- How I did it
Update the copyright header
- How to verify it
Sanity tests and PR checkers.
- Why I did it
Refactor SFP code to remove code duplication and to be able to use the latest features available in new APIs.
- How I did it
Refactor SFP code to remove code duplication and to be able to use the latest features available in new APIs.
- How to verify it
Run sonic-mgmt/platform_tests/sfp tests
The following changes are provided to support bullseye and the latest master
branch content.
- Accommodate relocated fan and thermal sysfs entries in bullseye
- Add support for chassis and PSU HW revision
Why I did it
Fix platform issues introduced by the bullseye kernel upgrade.
How I did it
Minor fixes to Nokia ixs7215 platform code
How to verify it
Execute the following CLI commands
show platform summary
show platform fan
show platform temperature
- Why I did it
With the previous MFT 4.18.1-16 there is a bug in mstdump tool accessing wrong address. it is confirmed this issue does not exist in official 4.18.0-106.
- How I did it
Update the MFT version to 4.18.0-106
- How to verify it
Run regression on Mellanox platforms
Why I did it
Add CPU thermal control for Nvidia platforms which will be enabled for platforms that have heavy CPU load. Now it is only enabled on 4800, and it will be enabled on future platforms.
How I did it
Check CPU pack temperature and update cooling level accordingly
How to verify it
Manual test
Added sonic-mgmt test case, PR link will update later
- Why I did it
Fix issue: psu might use wrong voltage sysfs which causes invalid voltage value. The flow is like:
1. User power off a PSU
2. All sysfs files related to this PSU are removed
3. User did a reboot/config reload
4. PSU will use wrong sysfs as voltage node
- How I did it
Always try find an existing sysfs.
- How to verify it
Manual test
- Fix i2c bus on crow cpu
- Fix exception handling in logs
- Improve linecard mgmt interface configuration
- Add new PSU models for chassis
- Misc fixes
Why I did it
fan_drawer support was missing in PDDF common platform APIs. This resulted in 'thermalctld' not working and 'show platform fan' and 'show platfomr temperature' commands not working.
_thermal_list array inside PSU class was not initialized. Made changes to attach the PSU related thermal sensors in the PSU instance.
How I did it
Added a common class pddf_fan_drawer.py. This class uses the PDDF JSON to fetch the platform specific data. A platform which uses PDDF would follow the below hierarchy.
fan_drawer_base.py ---> pddf_fan_drawer.py ---> fan_drawer.py
How to verify it
Run the 'show platform fan' and 'show platform temperature' commands and check the o/p.
o/p on AS7326:
root@sonic:/home/admin# show platform fan
s Drawer LED FAN Speed Direction Presence Status Timestamp
-------- ----- ---------- ------- ----------- ---------- -------- -----------------
Fantray1 green Fantray1_1 38% EXHAUST Present OK 20220311 04:15:03
Fantray1 green Fantray1_2 38% EXHAUST Present OK 20220311 04:15:03
Fantray2 green Fantray2_1 38% EXHAUST Present OK 20220311 04:15:03
Fantray2 green Fantray2_2 38% EXHAUST Present OK 20220311 04:15:03
Fantray3 green Fantray3_1 38% EXHAUST Present OK 20220311 04:15:03
Fantray3 green Fantray3_2 38% EXHAUST Present OK 20220311 04:15:03
Fantray4 green Fantray4_1 38% EXHAUST Present OK 20220311 04:15:03
Fantray4 green Fantray4_2 38% EXHAUST Present OK 20220311 04:15:03
Fantray5 green Fantray5_1 38% EXHAUST Present OK 20220311 04:15:03
Fantray5 green Fantray5_2 38% EXHAUST Present OK 20220311 04:15:03
Fantray6 green Fantray6_1 38% EXHAUST Present OK 20220311 04:15:03
Fantray6 green Fantray6_2 38% EXHAUST Present OK 20220311 04:15:03
N/A off PSU1_FAN1 0% Present Not OK 20220311 04:15:05
N/A green PSU2_FAN1 34% EXHAUST Present OK 20220311 04:15:05
hroot@sonic:/home/admin# show platform temperature
Sensor Temperature High TH Low TH Crit High TH Crit Low TH Warning Timestamp
---------- ------------- --------- -------- -------------- ------------- --------- -----------------
PSU1_TEMP1 0 N/A N/A N/A N/A False 20220311 04:15:05
PSU2_TEMP1 37 N/A N/A N/A N/A False 20220311 04:15:05
TEMP1 37 80.0 N/A N/A N/A False 20220311 04:15:05
TEMP2 27 80.0 N/A N/A N/A False 20220311 04:15:05
TEMP3 28.5 80.0 N/A N/A N/A False 20220311 04:15:05
TEMP4 30.5 80.0 N/A N/A N/A False 20220311 04:15:05
root@sonic:/home/admin#
root@sonic:/home/admin#
root@sonic:/home/admin#
o/p on AS7726:
root@as7726-32x-2:~# show platform fan
Drawer LED FAN Speed Direction Presence Status Timestamp
-------- ----- ---------- ------- ----------- ---------- -------- -----------------
Fantray1 green Fantray1_1 38% EXHAUST Present OK 20220311 08:13:04
Fantray1 green Fantray1_2 38% EXHAUST Present OK 20220311 08:13:04
Fantray2 green Fantray2_1 38% EXHAUST Present OK 20220311 08:13:04
Fantray2 green Fantray2_2 38% EXHAUST Present OK 20220311 08:13:04
Fantray3 green Fantray3_1 38% EXHAUST Present OK 20220311 08:13:04
Fantray3 green Fantray3_2 38% EXHAUST Present OK 20220311 08:13:04
Fantray4 green Fantray4_1 38% EXHAUST Present OK 20220311 08:13:04
Fantray4 green Fantray4_2 38% EXHAUST Present OK 20220311 08:13:04
Fantray5 green Fantray5_1 38% EXHAUST Present OK 20220311 08:13:04
Fantray5 green Fantray5_2 38% EXHAUST Present OK 20220311 08:13:04
Fantray6 green Fantray6_1 38% EXHAUST Present OK 20220311 08:13:04
Fantray6 green Fantray6_2 38% EXHAUST Present OK 20220311 08:13:04
N/A green PSU1_FAN1 23% EXHAUST Present OK 20220311 08:13:04
N/A green PSU2_FAN1 22% EXHAUST Present OK 20220311 08:13:04
root@as7726-32x-2:~# show platform temp
Sensor Temperature High TH Low TH Crit High TH Crit Low TH Warning Timestamp
---------- ------------- --------- -------- -------------- ------------- --------- -----------------
PSU1_TEMP1 28 N/A N/A N/A N/A False 20220311 08:13:04
PSU2_TEMP1 25 N/A N/A N/A N/A False 20220311 08:13:04
TEMP1 23.5 80.0 N/A N/A N/A False 20220311 08:13:04
TEMP2 27 80.0 N/A N/A N/A False 20220311 08:13:04
TEMP3 24 80.0 N/A N/A N/A False 20220311 08:13:04
TEMP4 27 80.0 N/A N/A N/A False 20220311 08:13:04
TEMP5 24 80.0 N/A N/A N/A False 20220311 08:13:04
* Initial pass of EdgeCore platform changes.
* Remove libevent dependency from lldpd.
* Remove python2 dependencies python3.7 force from platform install script.
* Include usbmount support changes.
* Add missing 4630 install file.
* Update a few file permissions. Add umask line to Makefile. Specify python3.9 in install script.
* Misc platform updates:
- Add missing fan drawer component to sonic_platform
- Remove kernel version specification from Makefile
- Update to 4630 utility
* - Fix file permissions on source files
- Fix compile issue with 4630 driver modules (set_fs, get_fs, no longer supported in kernel 5.10)
* Fix missing/extra parens in 4630 util script.
* Fix indentation in fanutil.py.
* Integrate deltas from Edgecore to ec_platform branch.
* Installer update from Edgecore to resolve smbus serial console errors.
* Update stable_size for warm boot.
* Fix SFP dictionary key to match xcvrd.
* - Add missing define in event.py files needed for xcvrd
- Fix SFP info dict key for 7xxx switches
* 5835 platform file updates including installer and 5835 utility.
* 5835 fix for DMAR errors on serial console.
* Don't skip starting thermalctld in the pmon container.
* Revert several changes that were not related to platform.
* Run thermalctld in pmon container.
* Don't disable thermalctld in the pmon container.
* Fix prints/parens in 7816 install utility.
* - Incorporate 7816 changes from Edgecore
- Fix 7326 driver file using old kernel function
* Update kernel modules to use kernel_read().
* Fix compile errors with 7816 and 7326 driver modules.
* Fix some indents preventing platform files from loading.
* Update 7816 platform sfp dictionary to match field names in xcvrd.
* Add missing service and util files for 7816.
* Update file names, etc. based on full SKU for 7816.
* Delete pddf files not needed. These were causing conflicts with API2.0
implementation.
* Remove pddf files suggested by Edgecore that were preventing API2.0 support from starting.
* Install API2.0 file instead of pddf.
* Update 7326 mac service file to not use pddf. Fix syntax errors in 7326 utility script.
* Fix sonic_platform setup file for 7326.
* Fix syntax errors in python scripts.
* Updates to 7326 platform files.
* Fix some tab errors pulled down from master merge.
* Remove pddf files that were added from previous merge.
* Updates for 5835.
* Fix missing command byte for 5835 psu status.
* Fix permission bits on 4630 service files.
* Update platforms to use new SFP refactoring.
* Fix unused var warnings.
Correct libsaithrift dependency package name from
LIBTHRIFT_DEV_0_14_1 THRIFT_COMPILER_0_14_1 to
LIBTHRIFT_0_14_1_DEV THRIFT_0_14_1_COMPILER
How I did it
How to verify it
Test Done:
make BLDENV=buster SAITHRIFT_V2=y -f Makefile.work target/debs/buster/saiserverv2_0.9.4_amd64.deb
Signed-off-by: Taras Keryk <tarasx.keryk@intel.com>
Why I did it
The previous implementaion of API for platform component didn't have the new thrift files
How I did it
Add the new thrift-generated: pltfm_mgr_rpc.py, ttypes.py
How to verify it
Run manually 'fwutil show status' or run unit tests
Previous command output had no information about components
New command output
Chassis Module Component Version Description
------------------------ -------- ----------- --------- -------------
Chassis1 N/A BIOS 1.2.3 Chassis BIOS
BMC 5.1 Chassis BMC
* [BFN] Implementation API for platform component
SONiC has a concept of "platform components"
this may include - CPLD, FPGA, BIOS, BMC, etc.
These changes are needed to read the version of the BIOS and BMC component.
What I did
Create components.py module
Add funcion for reading componet version to thrift interface
How I did it
The previous implementaion didn't have platform components API, so fwutil return an empty list.
After implementation of the platform component API, we have actual list of platform components and firmware versions
How to verify it
Run manually 'fwutil show status' or run unit tests
Previous command output
Chassis Module Component Version Description
------------------------ -------- ----------- --------- -------------
New command output
Chassis Module Component Version Description
------------------------ -------- ----------- --------- -------------
Chassis1 N/A BIOS 1.2.3 Chassis BIOS
BMC 5.1 Chassis BMC
Signed-off-by: Taras Keryk <tarasx.keryk@intel.com>
* [BFN] Implementation API for platform component
SONiC has a concept of "platform components"
this may include - CPLD, FPGA, BIOS, BMC, etc.
These changes are needed to read the version of the BIOS and BMC component.
What I did
Create components.py module
Add funcion for reading componet version to thrift interface
How I did it
The previous implementaion didn't have platform components API, so fwutil return an empty list.
After implementation of the platform component API, we have actual list of platform components and firmware versions
How to verify it
Run manually 'fwutil show status' or run unit tests
Previous command output
Chassis Module Component Version Description
------------------------ -------- ----------- --------- -------------
New command output
Chassis Module Component Version Description
------------------------ -------- ----------- --------- -------------
Chassis1 N/A BIOS 1.2.3 Chassis BIOS
BMC 5.1 Chassis BMC
Signed-off-by: Taras Keryk <tarasx.keryk@intel.com>
* [BFN] Implementation API for platform component
get chassis name from json
* [BFN] Implementation API for platform component
Updated platform and platrom_components json
* [BFN] Implementation API for platform component
Fixed spaces in component.py
* [BFN] Implementation API for platform component
Fixed exception in component.py
* Update chassis.py
* [BFN] Implementation API for platform component
Fixed spaces in component.py, chassis.py
* [BFN] Implementation API for platform component: Fixed spaces in component.py, chassis.py
* Fixed exception in get_bios_version
Why I did it
N3248TE - Platform API 2.0 changes
How I did it
Implemented the functional API's needed for Platform API 2.0
How to verify it
Used the API 2.0 test suite to validate the test cases.
Signed-off-by: Jostar Yang jostar_yang@accton.com.tw
Why I did it
Fix led drv because CPLD SPEC is updated.
Fix i2c bus order
How I did it
Fix led drv. Set blacklist to i801 and ismt. Let accton util to modprobe i801 and ismt.
How to verify it
Test led and sensors cmd. Results are fine.
Why I did it
To fix I2C bus order to meet with HW SPEC. Let i801 use bus-0 and ismt use bus-1
How I did it
Modprobe i801 and then do ismt. So i801 will use bus-0 and ismt will use bus-1.
How to verify it
Test show cmd and sensors work well
Co-authored-by: Jostar Yang <jostar_yang@accton.com.tw>
As per linux kernel 5.10, 'force_deselct_on_exit' parameter used for driver i2c_mux_pca954x is no longer valid. Instead an attribute 'idle_state' is added per MUX device. This needs to be set to
-1 : For leaving the mux state as is
-2 : For deselecting the channel upon exit
: To always set a channel upon exit
This needs to be accommodated inside the PDDF JSON parser as well.
AS7816 support AT or non-AT DUT. They use different pmbus i2c bus. So use "pre_pddf_init.sh" to check this case.
Signed-off-by: Jostar Yang <jostar_yang@accton.com.tw>
- Why I did it
The latest upgrade of Mellanox hw-mgmt V7.0020.1300 introduced a couple new kernel modules for new Mellanox platforms that have yet to be upstreamed to the linux kernel.
As these new platforms do not have SONiC support we elected not to upstream these new drivers to sonic-linux-kernel but hw-mgmt expects them to exist which is causing a non-functional error on switch boot.
Feb 15 00:09:55.374130 r-leopard-simx-74 ERR systemd-modules-load[269]: Failed to find module 'emc2305'
Feb 15 00:09:55.374141 r-leopard-simx-74 ERR systemd-modules-load[269]: Failed to find module 'ads1015'
To resolve this we can patch hw-mgmt to no longer attempt to load these modules by default.
- How I did it
Added a SONiC patch to Mellanox hw-mgmt in order to remove the unused kernel modules which were not upstreamed to sonic-linux-kernel
- How to verify it
Boot switch and verify there are no error logs regarding kernel modules failing to load.
- Why I did it
In SONiC thermal control algorithm, it compares thermal zone temperature with thermal zone threshold. Previously, a thermal zone with no thermal sensor can still get its threshold. However, a recently driver patch changes this behavior: a thermal zone with no thermal sensor will return 0 for threshold. We need to ignore such thermal zone.
- How I did it
Ignore thermal zones whose temperature is 0.
- How to verify it
Added unit test case and Manual test
Fixes#10020
Why I did it
The platform api for parsing syseeprom information read from STATE DB has issue
with parsing the value part that has whitespace in the middle. The current
code assumes that the value part does not have whitespace. So everything after
the whitespace will be ignored. The syseeprom values returned from platform
API do not match the output of "show platform syseeprom".
How I did it
This change improved the regular expression for parsing syseeprom values to
accommodate whitespaces in the value.
How to verify it
Locally updated the code on a dx010 device. Call the platform API:
```
>>> import sonic_platform
>>> platform = sonic_platform.platform.Platform()
>>> chassis = platform.get_chassis()
>>> chassis.get_system_eeprom_info()
{'0x21': 'DX010', '0x22': 'R0872-F0020-02', '0x23': 'DX010B2F030A27BY200002', '0x24': '00:E0:EC:E7:71:0F', '0x25': '11/03/2020 21:22:56', '0x26': '3', '0x27': 'Seastone', '0x28': 'RANGELEY', '0x29': '2014.08', '0x2A': '131', '0x2B': 'CELESTICA', '0x2C': 'THA', '0x2D': 'Celestica', '0x2E': '1.0.5', '0x2F': 'LB', '0xFD': '', '0xFE': '0xAAB39BDB'}
```
Signed-off-by: Xin Wang <xiwang5@microsoft.com>
Generate the sai.profile base on the brcm j2 file if the sai.profile
is not existing in the dut mounted folder.
Change the supervisor service configuration accordingly.
Testing done:
Add the script and config in dut
saiservice server can start automatically with [systemctl start saiserver]
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
* Add support for Accton wedge100bf_32qs platform
This pull request is based on wedge100bf_32x.
The components on the mainboard are the same as wedge100bf_32x, except for tofino 32Q and COMe models, so it refers to wedge100bf_32x to create new model: wedge100bf_32qs.
Signed-off-by: alvin_feng <alvin_feng@accton.com>
* Fix lgtm alerts issues
Signed-off-by: alvin_feng <alvin_feng@accton.com>
* Modify some file permissions and use symlink to link wedge100bf-32qs/sonic_platform
Signed-off-by: alvin_feng <alvin_feng@accton.com>
* Remove switch-sai.conf file
Signed-off-by: alvin_feng <alvin_feng@accton.com>
* Modify platform.json to avoid platform TCs issues and changes for correct generating BUFFER_QUEUE values in DB.
Signed-off-by: alvin_feng <alvin_feng@accton.com>
* Fix error name in platform.json
* [PTF-SAIv2]Add ptf dockre for sai-ptf (saiv2)
Base on current ptf docker create a new docker for sai-ptf(saiv2)
upgrade related package
use the latest ptf and install it
test done:
NOJESSIE=1 NOSTRETCH=1 NOBULLSEYE=1 ENABLE_SYNCD_RPC=y make target/docker-ptf-sai.gz
BLDENV=buster make -f Makefile.work target/docker-ptf-sai.gz
* upgrade the thrift to 014
#### Why I did it
To bump the Thrift version to 0.14.1
- To avoid [CVE-2020-13949](https://nvd.nist.gov/vuln/detail/CVE-2020-13949)
- to fix some dependencies issues
#### How I did it
- rename `src/thrfit_0_13_0` to `src/thrift_2` to remove version number in the path. (`src/thrift` contains rules to build thrift 0.11.0 )
- Add thrift sources as submodule as there are no prepared debian packages for version >0.13.0 on [debian.org](https://packages.debian.org/search?searchon=sourcenames&keywords=thrift)
- Added patches with fixes for original thrift debian rules:(remove unneeded packages, fix multi job build)
#### How to verify it
```
BLDENV=buster make -f Makefile.work target/debs/buster/libthrift-dev_0.14.1_amd64.deb
```
- Why I did it
Update MFT to version 4.18.1-16 for bugs fixes and new SN2201 support
- How I did it
Advance to MFT tool version to 4.18.1-16
- How to verify it
Manually tested on all Mellanox platforms (ASIC FW Upgrade, link debug tools, CPLD upgrade, etc.)
ded0344 Return both 'vendor_rev' and 'hardware_rev' keys from get_transceiver_info to support both earlier and later versions of xcvrd
b3442cc Change log_error to log_info when transceiver module is transitioned
3d3a73c Fix problem introduced with new SFP caching paradigm
2915746 Add script to reboot all IMMs
17ad221 Fix the position_in_parent for the psu entity
207e731 No longer force read pages at SFP init time
b387921 Fixed the voltage and power with rounding to 2 digit decimals
Signed-off-by: mlok <marty.lok@nokia.com>
* [AS4630-54PE] Add to support PDDF
Signed-off-by: Jostar Yang <jostar_yang@accton.com.tw>
* Fix LGTM alerts
* Fix LGTM alerts
* Fix LGTM alerts
* Add post_device.sh to turn off stk led
* Add to support system_health
* Correct the wait and timeout mechanism for better CPU usage
* Add event.c to support port evt change
Co-authored-by: Jostar Yang <jostar_yang@accton.com.tw>
Support saiserver v2 with python3 and thrift 0.13.0
add variables to support the saiserverv2
build different thrift in saithrift depends on saiserver version
build differernt versions of saiserver
make the saiserver and saiserver docker with version number
test done:
build two different versions of sasiserver in local build environment
add saiserver to buster
Co-authored-by: richard.yu <richard.yu@microsoft.comwq>
* [AS7726-32X] Add to support mulit PSU in PDDF
Signed-off-by: Jostar Yang <jostar_yang@accton.com.tw>
* Modify SN offset and include path
* Fix device_name to PSU2-EEPROM in PSU2
Co-authored-by: Jostar Yang <jostar_yang@accton.com.tw>
- Why I did it
New version of mellanox platform management code available adding support for new platforms and fixing bugs.
- How I did it
1. Updated the submodule
2. Updated makefile version references
3. Regenerated SONiC patches
* Remove tm and all dependancies
Signed-off-by: Vadym Yashchenko <vadymx.yashchenko@intel.com>
* Removed line connected with thermal_manager
Signed-off-by: Vadym Yashchenko <vadymx.yashchenko@intel.com>
- Why I did it
Fix issue: 'sx_port_mapping_t' object has no attribute 'slot_id'. sx_port_mapping_t only has attribute slot.
- How I did it
Change slot_id to slot.
- How to verify it
Manual test
- Why I did it
Python select.select accept a optional timeout value in seconds, however, the value passes to it is a value in millisecond.
- How I did it
Transfer the value to millisecond.
- How to verify it
Manual test
- Why I did it
To include latest SDK fixes:
1. On CMIS modules, after low power configuration, the firmware waited for the module state to be ModuleReady instead of ModuleLowPower causing delays.
2. When connecting SN4600C, 100GbE port with CWDM4 module (Gen 3.0), link up time is 30 seconds.
and to include SAI fixes \ changes:
1. Reduce verbosity for resource check vendor data not found
2. Fix metadata validation, check default value on conditions check
3. Add 100MB, 10MB to 2201 system
4. L3 VXLAN overlay ECMP
5. VXLAN srcport API implementation
6. Fix scheduler profile null (default values) when set on sub group scheduler group
7. Fix ACL binding restoration when port leaves a LAG
8. Fix route logic for set next hop/action and reference counter for ECMP overlay
- How I did it
1. Updated SDK/FW submodule and relevant makefiles with the required versions.
2. Update SAI submodule and relevant makefile with the required version.
- How to verify it
Build an image and run tests from "sonic-mgmt".
Why I did it
Requirements from Microsoft for fwutil update all state that all firmwares which support this upgrade flow must support upgrade within a single boot cycle. This conflicted with a number of Mellanox upgrade flows which have been revised to safely meet this requirement.
How I did it
Added --no-power-cycle flags to SSD and ONIE firmware scripts
Modified Platform API to call firmware upgrade flows with this new flag during fwutil update all
Added a script to our reboot plugin to handle installing firmwares in the correct order with prior to reboot
How to verify it
Populate platform_components.json with firmware for CPLD / BIOS / ONIE / SSD
Execute fwutil update all fw --boot cold
CPLD will burn / ONIE and BIOS images will stage / SSD will schedule for reboot
Reboot the switch
SSD will install / CPLD will refresh / switch will power cycle into ONIE
ONIE installer will upgrade ONIE and BIOS / switch will reboot back into SONiC
In SONiC run fwutil show status to check that all firmware upgrades were successful
Why I did it
Old fan drv will be build fail under kernel 5.10. It get below error message.
/sonic/platform/broadcom/sonic-platform-modules-accton/as7312-54xs/modules/accton_as7312_54x_fan.c:483:5: error: implicit declarat ion of function 'set_fs'; did you mean 'sget_fc'? [-Werror=implicit-function-declaration]
set_fs(KERNEL_DS);
^~~~~~
sget_fc
How I did it
These code is old design and they are not needed currently. So remove them.
Signed-off-by: Jostar Yang <jostar_yang@accton.com>
Why I did it
Update Broadcom SAI to version 6.0.0.13, SDK 6.5.24, saibcm-modules to 6.5.24.gpl
How I did it
Brcm SAI 6.0 EA with fixes for CS00012203367, CS00012219613, CS00012213974, CS00012218290, CS00012217169, CS00012211718, CS00012213944, CS00012215529, CS00012218100, CS00012214196, CS00012212681, CS00012205138, CS00012208537, CS00012185316, CS00012208524, CS00012203367, CS00012197364.
- Why I did it
Optimize thermal control policies to simplify the logic and add more protection code in policies to make sure it works even if kernel algorithm does not work.
- How I did it
Reduce unused thermal policies
Add timely ASIC temperature check in thermal policy to make sure ASIC temperature and fan speed is coordinated
Minimum allowed fan speed now is calculated by max of the expected fan speed among all policies
Move some logic from fan.py to thermal.py to make it more readable
- How to verify it
1. Manual test
2. Regression
#### Why I did it
Build failed.
Due to the error message looks like build failed because `pyversions` utility was not found.
`pyversions` utility is a part of `python2-minimal` package and it wasn't installed.
#### How I did it
To avoid installing python2 just specify explicit python version `--with python3` and use build system for py3 `--buildsystem=pybuild`
#### How to verify it
Run build
* Add intel_iommu=off to installer.conf
* This solve flooding DMAR err msg: "handling fault status reg 2"
Signed-off-by: Sean Wu <sean_wu@edge-core.com>
* remove customized at24 driver
* Use kernel 5.10.46 upstream at24 driver directly. The ADDR16 issue on
old driver has gone.
Signed-off-by: Sean Wu <sean_wu@edge-core.com>
* pin I2C-0/I2C-1 bus order
* otherwise, sometimes I2C-0/I2C-1 will be assigned to the undesired one.
Signed-off-by: Sean Wu <sean_wu@edge-core.com>
* fix i2c bus num for fan driver
Signed-off-by: Sean Wu <sean_wu@edge-core.com>
* backward compatible with R0A/R0B HW
Signed-off-by: Sean Wu <sean_wu@edge-core.com>
* fix workdir for seastone2
Signed-off-by: Viktor Ekmark <viktor@ekmark.se>
* seastone2: Add I2C SFP definition for SFP1
Signed-off-by: Christian Svensson <blue@cmd.nu>
* [device/cel_seastone_2] sfputil logic for SFP1
Earlier logic resulted in the name of SFP1 being SFP33 which is not
correct. The cannonical source is seastone2_fpga module and it calls it
SFP1, so ensure the logic does as well.
Signed-off-by: Christian Svensson <blue@cmd.nu>
* [device/cel_seastone_2] sysfs paths for SFP1
Various changes that plumbs the correct port presence and DOM decoding
for the SFP1 port.
Signed-off-by: Christian Svensson <blue@cmd.nu>
Co-authored-by: Christian Svensson <blue@cmd.nu>
Enable pca954x idle_disconnect to avoid possible I2C device address conflict.
How I did it
Change pca954x device_attr idle_state to -2 (MUX_IDLE_DISCONNECT).
How to verify it
Cat pca954x device_attr idle_state and confirm the value is -2.
Signed-off-by: Sean Wu <sean_wu@edge-core.com>
Signed-off-by: Roman Savchuk romanx.savchuk@intel.com
Why I did it
Update SDK with fixed for warmboot tofino2
Added platform interface extension
How I did it
Create new package for Y2, Y1, X2, X1 profiles
How to verify it
Run test suites from sonic-mgmt repo
The SAI credo libraries don't depend on that library. This saves about
36MB of disk storage space.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Add docker-syncd-centec-rpc and docker-saiserver-centec to buster docker image list. These 2 docker images are based on docker-config-engine-buster.
Co-authored-by: Xianghong Gu <xgu@centecnetworks.com>
Why I did it
Recently additional sensors that were needed only for specific system added to all systems and caused errors.
How I did it
* Include CPU board and switch board sensors only on SN2201 system
* Fix issue in test_chassis_thermal, now it skips non existing thermals.
How to verify it
Run show platform temperature
Signed-off-by: liora <liora@nvidia.com>
Why I did it
Reboot cause is not working in S52xx platforms.
How I did it
Modified platform API's.
How to verify it
Check "show reboot-cause" to verify the reboot reason
Why I did it
To incorporate the below changes in DellEMC S6100, S6000 platforms.
S6100, S6000:
Implement 'get_revision' method for Chassis
Implement 'get_maximum_consumed_power' method for FanDrawer
Implement 'get_revision', 'get_maximum_supplied_power' methods for PSU
Implement 'get_error_description' method for SFP.
S6100:
Implement 'get_module_index' method for Chassis
Implement 'get_description', 'get_maximum_consumed_power', 'get_oper_status', 'get_slot' methods for Module
Update component names in platform.json
How I did it
Implement the platform API methods in the respective device files
How to verify it
Verified that the respective sonic-mgmt platform API test cases report success.
Why I did it
Some platforms need to run few steps before the PDDF service is actually started.
* Adding pre_pddf_init script in the service file
* Raising exception for get_target_speed() for PSU-fan in PDDF (#8129)
- Why I did it
PDDF utils were python2 compliant and they needed to be migrated to Python3 (as per Bullseye)
PDDF common platform APIs file name changed as the name was already in use
Indentation issues
Dead/redundant code needed to be removed
- How I did it
Made files Python3 compliant
Indentation corrected
Redundant code removed
- How to verify it
AS7326 Accton platform uses PDDF. PDDF utils were run on this platform to verify.
- Why I did it
There were compilation errors and warnings like,
/usr/src/linux-headers-5.10.0-8-2-common/scripts/Makefile.build:69: You cannot use subdir-y/m to visit a module Makefile. Use obj-y/m instead.
fatal error: linux/platform_data/pca954x.h: No such file or directory
hwmon_device_register() is deprecated. Please convert the driver to use hwmon_device_register_with_info().
If PDDF kernel module compilation fails, the PDDF debian package was not detecting the break.
- How I did it
Modified the code with new kernel 5.10 APIs.
Modified the Makefiles to use 'obj-m' instead of 'subdir-y'
- How to verify it
PDDF is supported on Accton platform. Load the build on AS7326 setup and check the 'dmesg'
- Why I did it
Added missing functionality for dynamic buffer calculation in Spectrum-4.
- How I did it
Added a section of code in asic_table.j2 for Spectrum-4, and added the simx version of SN5600 to the supported list.
- How to verify it
Manually: buffershow -l should show all ingress/egress lossy/lossless pools, and all fields of profiles should show values.
Automatically: https://github.com/Azure/sonic-mgmt/blob/master/tests/qos/test_buffer.py
Signed-off-by: Raphael Tryster <raphaelt@nvidia.com>
Why I did it
Adding platform support for centec v682-48y8c and v682-48x8c.
V682-48y8c switch has 48 SFP+ (1G/10G/25G) ports, 8 QSFP28 (40G/100G) ports on CENTEC TsingMa.MX.
V682-48y8c is different from V682-48y8c_d in that:
transceiver is managed by cpu smbus rather than TsingMa.MX i2c bus.
port led is managed by mcu inside TsingMa.MX.
fan, psu, sensors, leds are managed by cpu smbus other than the cpu board vendor's close sourse driver.
V682-48x8c switch has 48 SFP+ (1G/10G) ports, 8 QSFP28 (40G/100G) ports on CENTEC TsingMa.MX.
CPU used in v682-48y8c and v682-48x8c is Intel(R) Xeon(R) CPU D-1527.
How I did it
Modify related code in platform and device directory.
Upgrade centec sai to v1.9.
upgrade python to python3 and kernel version to 5.0 for V682-48y8c_d.
How to verify it
Build centec amd64 sonic image, verify platform functions (port, sfp, led etc) on centec v682-48y8c and v682-48x8c board.
Co-authored-by: shil <shil@centecnetworks.com>
- Why I did it
To fix an issue that hw-mgmt patches were not applied. One patch was already in upstream hw-mgmt package thus applying it again caused an error and no other patches were applied. Also, I did it to improve the Makefile, so that the make will fail in case patches fail to apply.
- How I did it
Removed obsolete patch, made applying patches a hard failure in the build.
- How to verify it
Run the make and verify patches are applied.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
- Why I did it
Rename platform x86_64-mlnx_msn4800 to x86_64-nvidia_msn4800
- How I did it
Rename platform folder as well as all code that reference the platform name
- How to verify it
Manual test
- Why I did it
To have an ability to use PRM sniffer.
- How I did it
Enabled the option in configure flags.
- How to verify it
Built and ran on switch. Enabled the feature in runtime and checked the sniffer recording.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
The same docker image is built multiple times after upgrading to bullseye, the build time is increased to about 15 hours from 6 hours.
See log: https://dev.azure.com/mssonic/be1b070f-be15-4154-aade-b1d3bfb17054/_apis/build/builds/50390/logs/9
Line 1437: 2021-11-11T11:15:02.7094923Z [ building ] [ target/docker-sonic-telemetry.gz ]
Line 1446: 2021-11-11T11:37:41.1073304Z [ finished ] [ target/docker-sonic-telemetry.gz ]
Line 1459: 2021-11-11T11:38:20.6293007Z [ building ] [ target/docker-sonic-telemetry.gz-load ]
Line 1462: 2021-11-11T11:38:28.1250201Z [ finished ] [ target/docker-sonic-telemetry.gz-load ]
Line 2906: 2021-11-11T18:57:42.8207365Z [ building ] [ target/docker-sonic-telemetry.gz ]
Line 2917: 2021-11-11T19:43:47.1860961Z [ finished ] [ target/docker-sonic-telemetry.gz ]
Line 3997: 2021-11-11T22:49:35.0196252Z [ building ] [ target/docker-sonic-telemetry.gz ]
Line 4002: 2021-11-11T23:14:00.4127728Z [ finished ] [ target/docker-sonic-telemetry.gz ]
How I did it
Place the python wheels in another folder relative to the build distribution.
Co-authored-by: Ubuntu <xumia@xumia-vm1.jqzc3g5pdlluxln0vevsg3s20h.xx.internal.cloudapp.net>
- Why I did it
Add new Spectrum-4 system support SN5600 on top of Nvidia ASIC simulator.
- How I did it
Add all relevant system and simulator SKU.
Updated syseeprom.hex and related directories to reflect Nvidia SN5600 brand name.
- How to verify it
Tested init flow, basic show commands, up interfaces, traffic test.
Signed-off-by: Raphael Tryster <raphaelt@nvidia.com>
- Why I did it
To include latest fixes.
SAI
1. Reclaim buffers for port which is admin down
2. Support for Spectrum-4 os Nvidia ASIC simulation
3. Support for SN2201
4. Fix host interface table entry, one channel per trap (fix sflow double registration)
5. 2 new queue counters - ecn marked packets + shared current occupancy
6. Fix storm policer unknown unicast
7. Add key/value for accuflow counters
8. Add MAC move
9. Add mirror congestion mode attribute
SDK
1. Under various circumstances, Ethernet ports falsely showed that InfiniBand cables were connected.
2. In SN4600C, at times, the link up time in both DAC and optics cables may, in the worst case, take up to 15 seconds.
3. Using SN4600C with copper or optics loopback cables in NRZ speeds, link may raise in long link up times
4. When ECMP has high amount of next-hops based on VLAN interfaces, in some rare cases, packets will get a wrong VLAN tag and will be dropped.
5. When connecting Spectrum devices with optical transceivers that support RXLOS, remote side port down might cause the switch firmware to get stuck and cause unexpected switch behavior.
6. Aggregation event is missing for WJH L2 drop reason 'Unicast egress port list is empty'.
7. Tying the SCL and SDA of the optical modules to 3.3V causes errors.
8. On SN4600, there was a delay of more than 10 seconds from the time a data packet is sent from CPU until it is transmitted through one of the switch ports.
9. While using SN4600C system with Finisar FTLC1157RGPL 100GbE CWDM4 modules, intermittent link flaps across multiple ports may be observed.
10. In Spectrum-2 and Spectrum-3 systems, link did not work in auto-negotiation when connected to Marvell PHY. KR mechanism has been enhanced to integrate with Marvell PHY.
11. The tunnel counter counts the drop packets now for Spectrum-2 and Spectrum-3 and consistent with Spectrum behavior and count the ECN dropped packets as well.
12. When connecting SN3800 to Cisco-9000, fast-linkup flow will fail and will rise in the normal flow.
13. Race condition in WJH library: when multiple threads load the LAG shared memory concurrently, the program may crash.
14. Add WJH L2 drop reason 'Unicast egress port list is empty' as a new drop reason.
15. Fixed a memory leak in sx_api_port_sflow_statistics_get API.
16. During initialization flow, the command interface that is used by the minimal driver and SDK caused the collision in the firmware since the same buffer is used in the firmware for the two interfaces.
17. Fix route issue on Kernel 5.10
- How I did it
Updated SDK/SAI submodule and relevant makefiles with the required versions.
- How to verify it
Build an image and run tests from "sonic-mgmt".
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
Update Nokia platform sonic-pmon submoduel to the latest with the following commits
c41c823 Fix transceiver module dynamic insertion/removal operations
21a1df6 Fixed pcied process FATAl issue
7fc1fd4 Fix midplane status nokia_cmd
a14ee1c Override get_module() api in chassis
8a457fc SON-326: Watchdog logger changes and file scrubbing
7250eb1 SON-410: Fix missing eeprom access routine
7a70c42 Allow only reboot of self card for OC API test
6ab5d96 Fixed the flake8 compliant issues
807de95 APIs to set thermal threshold to return false
9b38265 SON-382: platform-dump with common techsupport
3f83a67 Add model, base_mac, system_eeprom and serial number support in moduel.py
848d311 SFP: Add get_error_description and fix return status for set_lpmode
1fcb5de PSU check presence of psu instance for APIs
7c68da3 Fixed the eagle and hornet card description
0c01d07 Module support for reboot API
Why I did it
Nvidia platform API does not support set LED to orange
How I did it
Allow user to set LED to orange
How to verify it
Added unit test
Manual test
- Use SfpOptoeBase by default to leverage new `sonic_xcvr` refactor
- Add support for `Woodleaf` product
- Move `libsfp-eeprom.so` to a different `.deb` package
- Add new logrotate configuration for arista logs
- Improve logging mechanism for the drivers (IO loglevel, fix syslog duplicates)
- Initialize chassis cards in parallel
- Refactor of `get_change_event` to fix interrupts treated as presence change
Why I did it
To implement fan control using thermalctld in DellEMC S6000 platform
Requires: Azure/sonic-linux-kernel#241
How I did it
Add thermal policies in 'thermal_policy.json'
Implemented thermal_manager.py and the necessary modules to perform fan control via thermalctld
Removed fancontrol.sh
How to verify it
Verified that the fan speeds are set based on the fan and temperature status.
Logs: S6000_fan_control_test_logs.txt