- Why I did it
Optimize thermal control policies to simplify the logic and add more protection code in policies to make sure it works even if kernel algorithm does not work.
- How I did it
Reduce unused thermal policies
Add timely ASIC temperature check in thermal policy to make sure ASIC temperature and fan speed is coordinated
Minimum allowed fan speed now is calculated by max of the expected fan speed among all policies
Move some logic from fan.py to thermal.py to make it more readable
- How to verify it
1. Manual test
2. Regression
#### Why I did it
Build failed.
Due to the error message looks like build failed because `pyversions` utility was not found.
`pyversions` utility is a part of `python2-minimal` package and it wasn't installed.
#### How I did it
To avoid installing python2 just specify explicit python version `--with python3` and use build system for py3 `--buildsystem=pybuild`
#### How to verify it
Run build
* Add intel_iommu=off to installer.conf
* This solve flooding DMAR err msg: "handling fault status reg 2"
Signed-off-by: Sean Wu <sean_wu@edge-core.com>
* remove customized at24 driver
* Use kernel 5.10.46 upstream at24 driver directly. The ADDR16 issue on
old driver has gone.
Signed-off-by: Sean Wu <sean_wu@edge-core.com>
* pin I2C-0/I2C-1 bus order
* otherwise, sometimes I2C-0/I2C-1 will be assigned to the undesired one.
Signed-off-by: Sean Wu <sean_wu@edge-core.com>
* fix i2c bus num for fan driver
Signed-off-by: Sean Wu <sean_wu@edge-core.com>
* backward compatible with R0A/R0B HW
Signed-off-by: Sean Wu <sean_wu@edge-core.com>
* fix workdir for seastone2
Signed-off-by: Viktor Ekmark <viktor@ekmark.se>
* seastone2: Add I2C SFP definition for SFP1
Signed-off-by: Christian Svensson <blue@cmd.nu>
* [device/cel_seastone_2] sfputil logic for SFP1
Earlier logic resulted in the name of SFP1 being SFP33 which is not
correct. The cannonical source is seastone2_fpga module and it calls it
SFP1, so ensure the logic does as well.
Signed-off-by: Christian Svensson <blue@cmd.nu>
* [device/cel_seastone_2] sysfs paths for SFP1
Various changes that plumbs the correct port presence and DOM decoding
for the SFP1 port.
Signed-off-by: Christian Svensson <blue@cmd.nu>
Co-authored-by: Christian Svensson <blue@cmd.nu>
Enable pca954x idle_disconnect to avoid possible I2C device address conflict.
How I did it
Change pca954x device_attr idle_state to -2 (MUX_IDLE_DISCONNECT).
How to verify it
Cat pca954x device_attr idle_state and confirm the value is -2.
Signed-off-by: Sean Wu <sean_wu@edge-core.com>
Signed-off-by: Roman Savchuk romanx.savchuk@intel.com
Why I did it
Update SDK with fixed for warmboot tofino2
Added platform interface extension
How I did it
Create new package for Y2, Y1, X2, X1 profiles
How to verify it
Run test suites from sonic-mgmt repo
The SAI credo libraries don't depend on that library. This saves about
36MB of disk storage space.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Add docker-syncd-centec-rpc and docker-saiserver-centec to buster docker image list. These 2 docker images are based on docker-config-engine-buster.
Co-authored-by: Xianghong Gu <xgu@centecnetworks.com>
Why I did it
Recently additional sensors that were needed only for specific system added to all systems and caused errors.
How I did it
* Include CPU board and switch board sensors only on SN2201 system
* Fix issue in test_chassis_thermal, now it skips non existing thermals.
How to verify it
Run show platform temperature
Signed-off-by: liora <liora@nvidia.com>
Why I did it
Reboot cause is not working in S52xx platforms.
How I did it
Modified platform API's.
How to verify it
Check "show reboot-cause" to verify the reboot reason
Why I did it
To incorporate the below changes in DellEMC S6100, S6000 platforms.
S6100, S6000:
Implement 'get_revision' method for Chassis
Implement 'get_maximum_consumed_power' method for FanDrawer
Implement 'get_revision', 'get_maximum_supplied_power' methods for PSU
Implement 'get_error_description' method for SFP.
S6100:
Implement 'get_module_index' method for Chassis
Implement 'get_description', 'get_maximum_consumed_power', 'get_oper_status', 'get_slot' methods for Module
Update component names in platform.json
How I did it
Implement the platform API methods in the respective device files
How to verify it
Verified that the respective sonic-mgmt platform API test cases report success.
Why I did it
Some platforms need to run few steps before the PDDF service is actually started.
* Adding pre_pddf_init script in the service file
* Raising exception for get_target_speed() for PSU-fan in PDDF (#8129)
- Why I did it
PDDF utils were python2 compliant and they needed to be migrated to Python3 (as per Bullseye)
PDDF common platform APIs file name changed as the name was already in use
Indentation issues
Dead/redundant code needed to be removed
- How I did it
Made files Python3 compliant
Indentation corrected
Redundant code removed
- How to verify it
AS7326 Accton platform uses PDDF. PDDF utils were run on this platform to verify.
- Why I did it
There were compilation errors and warnings like,
/usr/src/linux-headers-5.10.0-8-2-common/scripts/Makefile.build:69: You cannot use subdir-y/m to visit a module Makefile. Use obj-y/m instead.
fatal error: linux/platform_data/pca954x.h: No such file or directory
hwmon_device_register() is deprecated. Please convert the driver to use hwmon_device_register_with_info().
If PDDF kernel module compilation fails, the PDDF debian package was not detecting the break.
- How I did it
Modified the code with new kernel 5.10 APIs.
Modified the Makefiles to use 'obj-m' instead of 'subdir-y'
- How to verify it
PDDF is supported on Accton platform. Load the build on AS7326 setup and check the 'dmesg'
- Why I did it
Added missing functionality for dynamic buffer calculation in Spectrum-4.
- How I did it
Added a section of code in asic_table.j2 for Spectrum-4, and added the simx version of SN5600 to the supported list.
- How to verify it
Manually: buffershow -l should show all ingress/egress lossy/lossless pools, and all fields of profiles should show values.
Automatically: https://github.com/Azure/sonic-mgmt/blob/master/tests/qos/test_buffer.py
Signed-off-by: Raphael Tryster <raphaelt@nvidia.com>
Why I did it
Adding platform support for centec v682-48y8c and v682-48x8c.
V682-48y8c switch has 48 SFP+ (1G/10G/25G) ports, 8 QSFP28 (40G/100G) ports on CENTEC TsingMa.MX.
V682-48y8c is different from V682-48y8c_d in that:
transceiver is managed by cpu smbus rather than TsingMa.MX i2c bus.
port led is managed by mcu inside TsingMa.MX.
fan, psu, sensors, leds are managed by cpu smbus other than the cpu board vendor's close sourse driver.
V682-48x8c switch has 48 SFP+ (1G/10G) ports, 8 QSFP28 (40G/100G) ports on CENTEC TsingMa.MX.
CPU used in v682-48y8c and v682-48x8c is Intel(R) Xeon(R) CPU D-1527.
How I did it
Modify related code in platform and device directory.
Upgrade centec sai to v1.9.
upgrade python to python3 and kernel version to 5.0 for V682-48y8c_d.
How to verify it
Build centec amd64 sonic image, verify platform functions (port, sfp, led etc) on centec v682-48y8c and v682-48x8c board.
Co-authored-by: shil <shil@centecnetworks.com>
- Why I did it
To fix an issue that hw-mgmt patches were not applied. One patch was already in upstream hw-mgmt package thus applying it again caused an error and no other patches were applied. Also, I did it to improve the Makefile, so that the make will fail in case patches fail to apply.
- How I did it
Removed obsolete patch, made applying patches a hard failure in the build.
- How to verify it
Run the make and verify patches are applied.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
- Why I did it
Rename platform x86_64-mlnx_msn4800 to x86_64-nvidia_msn4800
- How I did it
Rename platform folder as well as all code that reference the platform name
- How to verify it
Manual test
- Why I did it
To have an ability to use PRM sniffer.
- How I did it
Enabled the option in configure flags.
- How to verify it
Built and ran on switch. Enabled the feature in runtime and checked the sniffer recording.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
The same docker image is built multiple times after upgrading to bullseye, the build time is increased to about 15 hours from 6 hours.
See log: https://dev.azure.com/mssonic/be1b070f-be15-4154-aade-b1d3bfb17054/_apis/build/builds/50390/logs/9
Line 1437: 2021-11-11T11:15:02.7094923Z [ building ] [ target/docker-sonic-telemetry.gz ]
Line 1446: 2021-11-11T11:37:41.1073304Z [ finished ] [ target/docker-sonic-telemetry.gz ]
Line 1459: 2021-11-11T11:38:20.6293007Z [ building ] [ target/docker-sonic-telemetry.gz-load ]
Line 1462: 2021-11-11T11:38:28.1250201Z [ finished ] [ target/docker-sonic-telemetry.gz-load ]
Line 2906: 2021-11-11T18:57:42.8207365Z [ building ] [ target/docker-sonic-telemetry.gz ]
Line 2917: 2021-11-11T19:43:47.1860961Z [ finished ] [ target/docker-sonic-telemetry.gz ]
Line 3997: 2021-11-11T22:49:35.0196252Z [ building ] [ target/docker-sonic-telemetry.gz ]
Line 4002: 2021-11-11T23:14:00.4127728Z [ finished ] [ target/docker-sonic-telemetry.gz ]
How I did it
Place the python wheels in another folder relative to the build distribution.
Co-authored-by: Ubuntu <xumia@xumia-vm1.jqzc3g5pdlluxln0vevsg3s20h.xx.internal.cloudapp.net>
- Why I did it
Add new Spectrum-4 system support SN5600 on top of Nvidia ASIC simulator.
- How I did it
Add all relevant system and simulator SKU.
Updated syseeprom.hex and related directories to reflect Nvidia SN5600 brand name.
- How to verify it
Tested init flow, basic show commands, up interfaces, traffic test.
Signed-off-by: Raphael Tryster <raphaelt@nvidia.com>
- Why I did it
To include latest fixes.
SAI
1. Reclaim buffers for port which is admin down
2. Support for Spectrum-4 os Nvidia ASIC simulation
3. Support for SN2201
4. Fix host interface table entry, one channel per trap (fix sflow double registration)
5. 2 new queue counters - ecn marked packets + shared current occupancy
6. Fix storm policer unknown unicast
7. Add key/value for accuflow counters
8. Add MAC move
9. Add mirror congestion mode attribute
SDK
1. Under various circumstances, Ethernet ports falsely showed that InfiniBand cables were connected.
2. In SN4600C, at times, the link up time in both DAC and optics cables may, in the worst case, take up to 15 seconds.
3. Using SN4600C with copper or optics loopback cables in NRZ speeds, link may raise in long link up times
4. When ECMP has high amount of next-hops based on VLAN interfaces, in some rare cases, packets will get a wrong VLAN tag and will be dropped.
5. When connecting Spectrum devices with optical transceivers that support RXLOS, remote side port down might cause the switch firmware to get stuck and cause unexpected switch behavior.
6. Aggregation event is missing for WJH L2 drop reason 'Unicast egress port list is empty'.
7. Tying the SCL and SDA of the optical modules to 3.3V causes errors.
8. On SN4600, there was a delay of more than 10 seconds from the time a data packet is sent from CPU until it is transmitted through one of the switch ports.
9. While using SN4600C system with Finisar FTLC1157RGPL 100GbE CWDM4 modules, intermittent link flaps across multiple ports may be observed.
10. In Spectrum-2 and Spectrum-3 systems, link did not work in auto-negotiation when connected to Marvell PHY. KR mechanism has been enhanced to integrate with Marvell PHY.
11. The tunnel counter counts the drop packets now for Spectrum-2 and Spectrum-3 and consistent with Spectrum behavior and count the ECN dropped packets as well.
12. When connecting SN3800 to Cisco-9000, fast-linkup flow will fail and will rise in the normal flow.
13. Race condition in WJH library: when multiple threads load the LAG shared memory concurrently, the program may crash.
14. Add WJH L2 drop reason 'Unicast egress port list is empty' as a new drop reason.
15. Fixed a memory leak in sx_api_port_sflow_statistics_get API.
16. During initialization flow, the command interface that is used by the minimal driver and SDK caused the collision in the firmware since the same buffer is used in the firmware for the two interfaces.
17. Fix route issue on Kernel 5.10
- How I did it
Updated SDK/SAI submodule and relevant makefiles with the required versions.
- How to verify it
Build an image and run tests from "sonic-mgmt".
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
Update Nokia platform sonic-pmon submoduel to the latest with the following commits
c41c823 Fix transceiver module dynamic insertion/removal operations
21a1df6 Fixed pcied process FATAl issue
7fc1fd4 Fix midplane status nokia_cmd
a14ee1c Override get_module() api in chassis
8a457fc SON-326: Watchdog logger changes and file scrubbing
7250eb1 SON-410: Fix missing eeprom access routine
7a70c42 Allow only reboot of self card for OC API test
6ab5d96 Fixed the flake8 compliant issues
807de95 APIs to set thermal threshold to return false
9b38265 SON-382: platform-dump with common techsupport
3f83a67 Add model, base_mac, system_eeprom and serial number support in moduel.py
848d311 SFP: Add get_error_description and fix return status for set_lpmode
1fcb5de PSU check presence of psu instance for APIs
7c68da3 Fixed the eagle and hornet card description
0c01d07 Module support for reboot API
Why I did it
Nvidia platform API does not support set LED to orange
How I did it
Allow user to set LED to orange
How to verify it
Added unit test
Manual test
- Use SfpOptoeBase by default to leverage new `sonic_xcvr` refactor
- Add support for `Woodleaf` product
- Move `libsfp-eeprom.so` to a different `.deb` package
- Add new logrotate configuration for arista logs
- Improve logging mechanism for the drivers (IO loglevel, fix syslog duplicates)
- Initialize chassis cards in parallel
- Refactor of `get_change_event` to fix interrupts treated as presence change
Why I did it
To implement fan control using thermalctld in DellEMC S6000 platform
Requires: Azure/sonic-linux-kernel#241
How I did it
Add thermal policies in 'thermal_policy.json'
Implemented thermal_manager.py and the necessary modules to perform fan control via thermalctld
Removed fancontrol.sh
How to verify it
Verified that the fan speeds are set based on the fan and temperature status.
Logs: S6000_fan_control_test_logs.txt
Why I did it
Adding SSD as part of platform components list.
Introducing platform_fw_au_reboot_handle to use auto-update functionality in fwutil
How I did it
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
modified: platform/broadcom/sonic-platform-modules-dell/debian/platform-modules-s6100.install
new file: platform/broadcom/sonic-platform-modules-dell/s6100/scripts/platform_fw_au_reboot_handle
modified: platform/broadcom/sonic-platform-modules-dell/s6100/sonic_platform/chassis.py
modified: platform/broadcom/sonic-platform-modules-dell/s6100/sonic_platform/component.py
How to verify it
By running fwutil command.
Warning: fwupdate_fwimage_dir: /var/platform/fwpackage/.
Chassis Module Component Firmware Version (Current/Available) Status
--------- -------- ----------- ------------------------------- ----------------------------- ------------------
S6100-ON BIOS S6100-BIOS-3.25.0.2-9-noRP2.bin 3.25.0.2-8 / 3.25.0.2-9 update is required
FPGA smf_firmware_upgrade.tar 2.4 / 2.4 up-to-date
CPLD cpld_firmware_upgrade.tar 4 / 4 up-to-date
SSD ssd_firmware_upgrade.tar S16425cG / S16425cG up-to-date
root@sonic:~#
Why I did it
To support iTCO watchdog using watchdog APIs.
How I did it
Implemented a new watchdog class WatchdogTCO for interfacing with iTCO watchdog.
Updated reboot cause determination logic.
How to verify it
Verified that the watchdog APIs' return values are as expected.
Logs: UT_logs.txt
- Why I did it
Add support for SN2201 platform
- How I did it
Add required content for SN2201 platform
Note: still missing kernel driver support for this system. Once all is upstream will be updated as well.
- How to verify it
Install and basic sanity tests including traffic.
Signed-off-by: liora liora@nvidia.com
Depends on #9358
Why I did it
Adjust LED logical according to hw-mgmt change.
How I did it
Add a trigger to set LED to blink.
How to verify it
Manual test
For broadcom sai, we only need to upgrade the version, not necessary the token part in the url.
Co-authored-by: Ubuntu <xumia@xumia-vm1.jqzc3g5pdlluxln0vevsg3s20h.xx.internal.cloudapp.net>
#### Why I did it
Updated hw-mgmt pointer to updated branch and to include new bugfixes. The hw-mgmt submodule was previously pointing to an orphaned commit which could not be fetched from github, this has now been resolved.
#### How I did it
Updated submodule pointer.
#### How to verify it
Clone down repository and update all submodules.
Signed-off-by: Stephen Sun stephens@nvidia.com
Why I did it
Support zero buffer profiles
Add buffer profiles and pool definition for zero buffer profiles
Support applying zero profiles on INACTIVE PORTS
Enable dynamic buffer manager to load zero pools and profiles from a JSON file
Dependency: It depends on Azure/sonic-swss#1910 and submodule advancing PR once the former merged.
How I did it
Add buffer profiles and pool definition for zero buffer profiles
If the buffer model is static:
Apply normal buffer profiles to admin-up ports
Apply zero buffer profiles to admin-down ports
If the buffer model is dynamic:
Apply normal buffer profiles to all ports
buffer manager will take care when a port is shut down
Update buffers_config.j2 to support INACTIVE PORTS by extending the existing macros to generate the various buffer objects, including PGs, queues, ingress/egress profile lists
Originally, all the macros to generate the above buffer objects took active ports only as an argument
Now that buffer items need to be generated on inactive ports as well, an extra argument representing the inactive ports need to be added
To be backward compatible, a new series of macros are introduced to take both active and inactive ports as arguments
The original version (with active ports only) will be checked first. If it is not defined, then the extended version will be called
Only vendors who support zero profiles need to change their buffer templates
Enable buffer manager to load zero pools and profiles from a JSON file:
The JSON file is provided on a per-platform basis
It is copied from platform/<vendor> folder to /usr/share/sonic/temlates folder in compiling time and rendered when the swss container is being created.
To make code clean and reduce redundant code, extract common macros from buffer_defaults_t{0,1}.j2 of all SKUs to two common files:
One in Mellanox-SN2700-D48C8 for single ingress pool mode
The other in ACS-MSN2700 for double ingress pool mode
Those files of all other SKUs will be symbol link to the above files
Update sonic-cfggen test accordingly:
Adjust example output file of JSON template for unit test
Add unit test in for Mellanox's new buffer templates.
How to verify it
Regression test.
Unit test in sonic-cfggen
Run regression test and manually test.
* Add macsec-xpn-support iproute2 in syncd
Signed-off-by: Ze Gan <ganze718@gmail.com>
* Polish code
Signed-off-by: Ze Gan <ganze718@gmail.com>
* Remove useless files
Signed-off-by: Ze Gan <ganze718@gmail.com>
* Add self-compiled iproute2 to docker sonic vs
Signed-off-by: Ze Gan <ganze718@gmail.com>
* Enhance apt install for iproute2 dependencies
Signed-off-by: Ze Gan <ganze718@gmail.com>
Sonic master has moved to use SAI v1.9.1. For consistency, use the latest credo sai v0.7.2 package, built with
SAI header v1.9.1 too.
How I did it
Update credo sai url for v0.7.2
Add a debug tool crshell
- Why I did it
This is to update the common sonic-buildimage infra for reclaiming buffer.
- How I did it
Render zero_profiles.j2 to zero_profiles.json for vendors that support reclaiming buffer
The zero profiles will be referenced in PR [Reclaim buffer] Reclaim unused buffers by applying zero buffer profiles #8768 on Mellanox platforms and there will be test cases to verify the behavior there.
Rendering is done here for passing azure pipeline.
Load zero_profiles.json when the dynamic buffer manager starts
Generate inactive port list to reclaim buffer
Signed-off-by: Stephen Sun <stephens@nvidia.com>
- Why I did it
When PSU is powered off, the PSU is still on the switch and the air flow is still the same. In this case, it is not necessary to set FAN speed to 100%.
- How I did it
When PSU is powered of, don't treat it as absent.
- How to verify it
Adjust existing unit test case
Add new case in sonic-mgmt