Commit Graph

24 Commits

Author SHA1 Message Date
Vivek
787dd7221d [Mellanox] Upgrade HW-MGMT to 7.0030.2008 and update platform-api (#17134)
Why I did it
Add platform support for Debian 12 (Bookworm) on Mellanox Platform

How I did it
Update hw-management to v7.0030.2008
Deprecate the sfp_count == module_count approach in favour of asic init completion
Ref: Mellanox/hw-mgmt@bf4f593
Add xxd package to base image which is required by hw-management scripts
Add the non-upstream flag into linux kernel cache options
Update the thermalctl logic based on new sysfs attributes
Fix the integrate-mlnx-hw-mgmt script to not populate the arm64 Kconfig
How to verify it
Build kernel and run platform tests

Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
Co-authored-by: Junchao-Mellanox <junchao@nvidia.com>
Co-authored-by: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com>
2023-11-21 18:53:15 -08:00
Kebo Liu
e286869b24
[Mellanox] Update HW-MGMT package to new version V.7.0030.1011 (#16239)
- Why I did it
1. Update Mellanox HW-MGMT package to newer version V.7.0030.1011
2. Replace the SONiC PMON Thermal control algorithm with the one inside the HW-MGMT package on all Nvidia platforms
3. Support Spectrum-4 systems

- How I did it
1. Update the HW-MGMT package version number and submodule pointer
2. Remove the thermal control algorithm implementation from Mellanox platform API
3. Revise the patch to HW-MGMT package which will disable HW-MGMT from running on SIMX
4. Update the downstream kernel patch list

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-09-06 11:32:08 +03:00
Junchao-Mellanox
7962a5c0fa
[Mellanox] add PSU fan direction support (#14508)
- Why I did it
Add PSU fan direction support

- How I did it
Implement fan.get_direction for PSU fan

- How to verify it
Manual test
Unit test
2023-05-15 21:34:54 +03:00
Mai Bui
648ca075c7
[device/mellanox] Mitigation for security vulnerability (#11877)
Signed-off-by: maipbui <maibui@microsoft.com>
Dependency: [PR (#12065)](https://github.com/sonic-net/sonic-buildimage/pull/12065) needs to merge first.
#### Why I did it
`subprocess.Popen()` and `subprocess.check_output()` is used with `shell=True`, which is very dangerous for shell injection.
#### How I did it
Disable `shell=True`, enable `shell=False`
#### How to verify it
Tested on DUT, compare and verify the output between the original behavior and the new changes' behavior.
[testresults.zip](https://github.com/sonic-net/sonic-buildimage/files/9550867/testresults.zip)
2022-10-06 17:51:31 -04:00
Dror Prital
8bc81206c5
[Mellanox] Update NVIDIA License header for files changed since 1.1.2022 (#10289)
- Why I did it
Update NVIDIA Copyright header to "mellanox" files which were changed since 1.1.2022

- How I did it
Update the copyright header

- How to verify it
Sanity tests and PR checkers.
2022-03-23 13:19:25 +02:00
Junchao-Mellanox
4ae504a813
[Mellanox] Optimize thermal control policies (#9452)
- Why I did it
Optimize thermal control policies to simplify the logic and add more protection code in policies to make sure it works even if kernel algorithm does not work.

- How I did it
Reduce unused thermal policies
Add timely ASIC temperature check in thermal policy to make sure ASIC temperature and fan speed is coordinated
Minimum allowed fan speed now is calculated by max of the expected fan speed among all policies
Move some logic from fan.py to thermal.py to make it more readable

- How to verify it
1. Manual test
2. Regression
2022-01-19 11:44:37 +02:00
Junchao-Mellanox
e8b4c2a1f4
[Mellanox] Refactor Mellanox platform API to support dynamic port configuration (#8422)
- Why I did it
* To support systems with dynamic port configuration
* Apply lazy initialization to faster the speed of loading platform API

- How I did it
* Add module.py to implement dynamic port configuration (aka line card model)
* Adjust chassis.py, platform.py, thermal.py, sfp.py to support dynamic port configuration
* Optimize existing code

- How to verify it
Platform regression on MSN4700, MSN3800 and MSN2700, 100% pass
Unit test covers all new changes.
2021-10-25 07:59:06 +03:00
Dror Prital
5356244e53
[Mellanox] Add NVIDIA Copyright header to "mellanox" files (#8799)
- Why I did it
Add NVIDIA Copyright header to "mellanox" files

- How I did it
Add NVIDIA Copyright header as a comment for Mellanox files

- How to verify it
Sanity tests and PR checkers.
2021-10-17 19:03:02 +03:00
Junchao-Mellanox
ed64eb94d9
[Mellanox] Read PSU fan max/min speed per PSU (#8563)
#### Why I did it
New PSU could install different type of fan, so fan max/min speed should be read per PSU

#### How I did it
The existing implementation read PSU max/min fan speed from a common file, change it to read from per PSU file

#### How to verify it
Manual test
2021-08-26 01:03:55 -07:00
Stephen Sun
80d01f2f9a
[Mellanox] Enhance Python3 support for platform API (#7410)
- Why I did it
Enhance the Python3 support for platform API. Originally, some platform APIs call SDK API which didn't support Python 3. Now the Python 3 APIs have been supported in SDK 4.4.3XXX, Python3 is completely supported by platform API

- How I did it
Start all platform daemons from python3
1. Remove #/usr/bin/env python at the beginning of each platform API file as the platform API won't be started as daemons but be imported from other daemons.
2. Adjust SDK API calls accordingly

- How to verify it
Manually test and run regression platform test

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2021-06-15 17:57:48 +03:00
Stephen Sun
b2286a24dc
[Mellanox] Adopt single way to get fan direction for all ASIC types (#7386)
#### Why I did it
Adopt a single way to get fan direction for all ASIC types.
It depends on hw-mgmt V.7.0010.2000.2303. Depends on https://github.com/Azure/sonic-buildimage/pull/7419

#### How I did it
Originally, the get_direction was implemented by fetching and parsing `/var/run/hw-management/system/fan_dir` on the Spectrum-2 and the Spectrum-3 systems. It isn't supported on the Spectrum system.
Now, it is implemented by fetching `/var/run/hw-management/thermal/fanX_dir` for all the platforms.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2021-05-03 17:10:18 -07:00
Junchao-Mellanox
d9cdf9d14f
[Mellanox] Adjust PSU fan name to align with sysfs file name (#7490)
Change PSU fan name from psu_{psu_index}fan{fan_index} to psu{psu_index}_fan{fan_index}
2021-05-02 08:14:56 -07:00
Junchao-Mellanox
51c77b179f
[Mellanox] Add python3 support for Mellanox platform API (#6175)
python2 is end of life and SONiC is going to support python3. This PR is going to support:

1. Mellanox SONiC platform API python3 support
2. Install both python2 and python3 verson of Mellanox SONiC platform API or pmon and host side
2020-12-11 10:51:31 -08:00
Junchao-Mellanox
b595a6eadf
[Mellanox] Implement new platform API for SONiC physical entity mib extension (#5645)
In order to support SONiC physical entity mib extension, a few new platform API are added to sonic-platform-common, this PR is to provide an mellanox platform implementation for those new APIs.
2020-11-16 18:56:03 -08:00
Junchao-Mellanox
7bee5093f1
[Mellanox] Support max/min speed for PSU fan (#5682)
As new hw-mgmt expose the sysfs for PSU fan max speed, we need support max/min speed for PSU fan in mellanox platform API.
2020-10-26 12:47:12 -07:00
shlomibitton
bbb91715a8
[Mellanox] Change fan tolerance to 50% (#5018)
Mellanox platforms fan tolerance should change to 50%

Signed-off-by: Shlomi Bitton <shlomibi@mellanox.com>
2020-07-23 11:18:26 -07:00
madhanmellanox
2c830f4074
Modified SKU based utils to Platform based utils (#4786)
Co-authored-by: Madhan Babu <madhan@arc-build-server.mtr.labs.mlnx>
2020-06-21 12:15:23 -07:00
Junchao-Mellanox
f277d13cd6
[Mellanox] Adjust log level to avoid too many thermal logs (#4631)
* Trigger thermal action log only if thermal condition changes
* test file existence before read file content
* fix error for set psu fan speed
* Remove logs because it print too frequently
2020-05-26 10:45:25 -07:00
Junchao-Mellanox
5e6c20481d
[Mellanox] Enhancement for fan led management (#4437) 2020-05-13 10:01:32 -07:00
Junchao-Mellanox
c730f3e207
[Mellanox] thermal control enhancement for dynamic minimum fan speed and PSU fan speed policy (#4403) 2020-04-21 08:09:53 -07:00
Junchao-Mellanox
80bf061b37
[Mellanox] Fix thermal control bugs (#4298)
* [thermal control] Fix pmon docker stop issue on 3800
* [thermal fix] Fix QA test issue
* [thermal fix] change psu._get_power_available_status to psu.get_power_available_status
* [thermal fix] adjust log for PSU absence and power absence
* [thermal fix] add unit test for loading thermal policy file with duplicate conditions in different policies
* [thermal] fix fan.get_presence for non-removable SKU
* [thermal fix] fix issue: fan direction is based on drawer
* Fix issue: when fan is not present, should not read fan direction from sysfs but directly return N/A
* [thermal fix] add unit test for get_direction for absent FAN
* Unplugable PSU has no FAN, no need add a FAN object for this PSU
* Update submodules

Co-authored-by: Stephen Sun <5379172+stephenxs@users.noreply.github.com>
2020-03-25 10:54:07 -07:00
Junchao-Mellanox
be549db395
Add thermal control support for SONiC (#3949) 2020-03-09 10:41:10 -07:00
Stephen Sun
d5aa0d4382 [Mellanox]support led for fan/psu and fan's direction (#3795) 2019-12-04 11:40:42 -08:00
Kebo Liu
818ba436a9 [Mellanox] Implement new fan platform API (#2747) 2019-04-21 14:34:28 -07:00