- Why I did it
hw-management renamed PSU temperature related sysfs:
psu1_temp -> psu1_temp1
psu2_temp -> psu2_temp1
psu1_temp_max -> psu1_temp1_max
psu2_temp_max -> psu2_temp1_max
This PR is to align the change in SONiC.
- How I did it
Use new sysfs node for PSU temperature and PSU temperature threshold
- How to verify it
Manual test
sonic-mgmt Regression test
- Why I did it
Adjust the warning threshold implementation according to the latest algorithm update
- How I did it
Modify power warning and critical thresholds methods
- How to verify it
Unit test updated to cover the change
Signed-off-by: Stephen Sun <stephens@nvidia.com>
- Why I did it
Support per PSU slope value for PSU power threshold according to hardware team requirement
- How I did it
Pass the PSU number as a parameter when fetching the slope value of PSU.
- How to verify it
Running regression and manual test
Signed-off-by: Stephen Sun <stephens@nvidia.com>
- Why I did it
Remove TODO comments which are no longer needed
- How I did it
Remove TODO comments which are no longer needed
- How to verify it
Only comment change
* Support power threshold
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* get_psu_power_warning_threshold => get_psu_power_warning_suppress_threshold
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Fix comments
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: maipbui <maibui@microsoft.com>
Dependency: [PR (#12065)](https://github.com/sonic-net/sonic-buildimage/pull/12065) needs to merge first.
#### Why I did it
`subprocess.Popen()` and `subprocess.check_output()` is used with `shell=True`, which is very dangerous for shell injection.
#### How I did it
Disable `shell=True`, enable `shell=False`
#### How to verify it
Tested on DUT, compare and verify the output between the original behavior and the new changes' behavior.
[testresults.zip](https://github.com/sonic-net/sonic-buildimage/files/9550867/testresults.zip)
- Why I did it
Add PSU input voltage and input current to mlnx platform api.
- How I did it
Implement 2 function of getting the psu voltage and psu current input:
Get the values from "power/psu{}_curr_in" , "power/psu{}_volt_in"
- How to verify it
Manual test.
Run sonic-mgmt regression
Signed-off-by: orfar1994 <orfar1994@gmail.com>
- Why I did it
InvalidPsuVolWA.run might raise exception if user power off PSU when it is running. This exception is not caught and will be raised to psud which causes psud failed to update PSU data to DB.
- How I did it
1. Change the log level when WA does not work. This could happen when user power off PSU, hence changing the log level from error to warning is better
2. Change the wait time from 5 to 1 to avoid introduce too much delay in psud. 1 second is usually enough per my test
3. Give a default return value for function get_voltage_low_threshold and get_voltage_high_threshold to avoid exception reach to psud
- How to verify it
Manual test.
Run sonic-mgmt regression
- Why I did it
There is a hardware bug that PSU voltage threshold sysfs returns incorrect value. The workaround is to call "sensor -s" to refresh it.
- How I did it
Call "sensor -s" when the threshold value is not incorrect and PSU is "DELTA 1100"
- How to verify it
Unit test and Manual test
- Why I did it
Update NVIDIA Copyright header to "mellanox" files which were changed since 1.1.2022
- How I did it
Update the copyright header
- How to verify it
Sanity tests and PR checkers.
- Why I did it
Fix issue: psu might use wrong voltage sysfs which causes invalid voltage value. The flow is like:
1. User power off a PSU
2. All sysfs files related to this PSU are removed
3. User did a reboot/config reload
4. PSU will use wrong sysfs as voltage node
- How I did it
Always try find an existing sysfs.
- How to verify it
Manual test
- Why I did it
Support PSU voltage high/low thresholds and power max threshold
1. Add thresholds support for voltage and power.
2. As thresholds are not supported on all platforms, we need to check the capability first and fetch thresholds only if it is supported.
- How I did it
- How to verify it
Run regression test and manual test.
Signed-off-by: Stephen Sun <stephens@nvidia.com>
- Why I did it
* To support systems with dynamic port configuration
* Apply lazy initialization to faster the speed of loading platform API
- How I did it
* Add module.py to implement dynamic port configuration (aka line card model)
* Adjust chassis.py, platform.py, thermal.py, sfp.py to support dynamic port configuration
* Optimize existing code
- How to verify it
Platform regression on MSN4700, MSN3800 and MSN2700, 100% pass
Unit test covers all new changes.
- Why I did it
Add NVIDIA Copyright header to "mellanox" files
- How I did it
Add NVIDIA Copyright header as a comment for Mellanox files
- How to verify it
Sanity tests and PR checkers.
- Why I did it
The methods get_model, get_serial, and get_revision have been implemented by reading relevant information from VPD and then recording the information into relevant fields.
However, there is no VPD data on platforms with fixed PSUs and relevant fields haven't been initialized, which causes the methods to throw exceptions. which in turn prevents psud from inserting fields into PSU table.
Eventually, this causes show platform psustatus doesn't output correct info.
- How I did it
Initialize those fields as N/A on systems with fixed PSUs.
- How to verify it
Manually test.
Signed-off-by: Stephen Sun <stephens@nvidia.com>
- Why I did it
Enhance the Python3 support for platform API. Originally, some platform APIs call SDK API which didn't support Python 3. Now the Python 3 APIs have been supported in SDK 4.4.3XXX, Python3 is completely supported by platform API
- How I did it
Start all platform daemons from python3
1. Remove #/usr/bin/env python at the beginning of each platform API file as the platform API won't be started as daemons but be imported from other daemons.
2. Adjust SDK API calls accordingly
- How to verify it
Manually test and run regression platform test
Signed-off-by: Stephen Sun <stephens@nvidia.com>
#### Why I did it
This pull request allows calls to be made through the platform 2.0 API that retrieve the PSU and Chassis hardware revision on Mellanox platforms. Access to these values will aid customers in determining their hardware revisions for debugging and technical support. These values are intended to be eventually exposed through the CLI.
#### How I did it
For the PSU hardware revision I used the existing VPD function calls implemented in https://github.com/Azure/sonic-buildimage/pull/7382
For the Chassis hardware revision I parsed the SMBIOS / DMI type 2 information to retrieve the information.
#### Why I did it
We want to add the ability for the command `show platform psustatus` to show the serial number and part number of the PSU devices on Mellanox platforms. This will be useful for data-center management of field replaceable units (FRUs) on switches.
#### How I did it
I implemented the platform 2.0 functions `get_model()` and `get_serial()` for the PSU in the mellanox platform API by referencing the sysfs nodes provided by the [hw-management](https://github.com/Azure/sonic-buildimage/tree/master/platform/mellanox/hw-management) module.
- Why I did it
Add support for new 64x200G SN4600 systems
- How I did it
Add all relevant files (w/o platform.json and hwsku.json as they will come later) with default SKU.
- How to verify it
Install image on switch, verify all ports are up and configured properly, run full platform SONiC tests.
In order to support SONiC physical entity mib extension, a few new platform API are added to sonic-platform-common, this PR is to provide an mellanox platform implementation for those new APIs.
As part of consolidating all common Python-based functionality into the new sonic-py-common package, this pull request:
1. Redirects all Python applications/scripts in sonic-buildimage repo which previously imported sonic_device_util or sonic_daemon_base to instead import sonic-py-common, which was added in https://github.com/Azure/sonic-buildimage/pull/5003
2. Replaces all calls to `sonic_device_util.get_platform_info()` to instead call `sonic_py_common.get_platform()` and removes any calls to `sonic_device_util.get_machine_info()` which are no longer necessary (i.e., those which were only used to pass the results to `sonic_device_util.get_platform_info()`.
3. Removes unused imports to the now-deprecated sonic-daemon-base package and sonic_device_util.py module
This is the next step toward resolving https://github.com/Azure/sonic-buildimage/issues/4999
Also reverted my previous change in which device_info.get_platform() would first try obtaining the platform ID string from Config DB and fall back to gathering it from machine.conf upon failure because this function is called by sonic-cfggen before the data is in the DB, in which case, the db_connect() call will hang indefinitely, which was not the behavior I expected. As of now, the function will always reference machine.conf.
* Trigger thermal action log only if thermal condition changes
* test file existence before read file content
* fix error for set psu fan speed
* Remove logs because it print too frequently
* New SKU support for MSN3420
Signed-off-by: Shlomi Bitton <shlomibi@mellanox.com>
Conflicts:
device/mellanox/x86_64-mlnx_msn2700-r0/plugins/sfputil.py
* Add CPLD's
* Symlink fixes and semantics
* Adding new platform at end of lines
* [thermal control] Fix pmon docker stop issue on 3800
* [thermal fix] Fix QA test issue
* [thermal fix] change psu._get_power_available_status to psu.get_power_available_status
* [thermal fix] adjust log for PSU absence and power absence
* [thermal fix] add unit test for loading thermal policy file with duplicate conditions in different policies
* [thermal] fix fan.get_presence for non-removable SKU
* [thermal fix] fix issue: fan direction is based on drawer
* Fix issue: when fan is not present, should not read fan direction from sysfs but directly return N/A
* [thermal fix] add unit test for get_direction for absent FAN
* Unplugable PSU has no FAN, no need add a FAN object for this PSU
* Update submodules
Co-authored-by: Stephen Sun <5379172+stephenxs@users.noreply.github.com>
* add MSN4700 device files
* update ACS-MSN4700 sai profile
* update buffer pool size, headroom, sensor conf, port config and reboot scripts
* fix ident
* update sensor conf and buffer pool
* [sn4700] add sku 4700 to chassis.py
* [Mellanox-4700] Add 4700 info to psu and thermal platform API
* update buffer config file template to the latest.
update SAI profile to use 100G X 4lanes for now
update port_config.ini according to the SAI profile
* [Mellanox]Update the buffer configurations for 4700
* fix alignment in pg_profile_lookup.ini
* add platform components file for new sku
* Update device/mellanox/x86_64-mlnx_msn4700-r0/ACS-MSN4700/pg_profile_lookup.ini
Co-Authored-By: Nazarii Hnydyn <nazariig@mellanox.com>
* remove redundant line
* [Mellanox]Correct type, buffer size
Co-authored-by: Nazarii Hnydyn <nazariig@mellanox.com>
Co-authored-by: junchao <junchao@mellanox.com>
Co-authored-by: Stephen Sun <stephens@mellanox.com>
* support new platform api, thermal and psu part
for psu, all APIs are supported.
for thermal, we support
get_temperature,
get_high_threshold
for the thermal sensors of cpu core, cpu pack, psu and sfp module
and get_temperature for the ambient thermal sensors around the asic, port, fan, comex and board.
* 1. address review comments
2. improve the handling of PSU inserting/removal
3. tolerance diverse psu thermal sensor file name conventions
* 1. adjust thermal code according to the latest version of hw-management
2. check power_good_status rather than whether file existing ahead of reading voltage, current and power of PSU