Commit Graph

87 Commits

Author SHA1 Message Date
Junchao-Mellanox
bfae15fb83
[mellaonox]: No need enable thermal zones in thermal_manager.deinitialize since they are enabled by default (#7556)
No need enable thermal zones in thermal_manager.deinitialize since they are enabled by default. And removing this will faster thermalctld exit speed
2021-05-08 10:33:37 -07:00
Stephen Sun
9f0dce0313
[Mellanox] Optimize SFP modules initialization (#7537)
Originally, SFP modules were always accessed from platform daemons, and arbitrary SFP modules can be accessed in the daemon. So all SFP modules were initialized in one shot once one of the following chassis APIs called
- get_all_sfps
- get_sfp_numbers
- get_sfp

Recently, we noticed that SFP modules can also be accessed from CLI, eg. the latest refactor of `sfputil`.

In this case, only one SFP module is accessed in the chassis object's life cycle.
To initialize all SFP modules in one shot is waste of time and causes the CLI to take much more time to finish.
So we would like to optimize the initialization flow by introducing a two-phase initialization approach:
- Partial initialization, which means the `chassis._sfp_list` has been initialized with proper length and all elements being `None`
- Full initialization, which means all elements in `chassis._sfp_list` are created

If the relevant function is called,
- `get_sfp`, only partial initialization will be done, and then the specific SFP module is initialized.
- `get_all_sfps` or `get_num_sfps`, full initialization will be done, which means all SFP modules are initialized.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2021-05-06 10:14:48 -07:00
Alexander Allen
0bc0f98d48
[platform] Add serial number and model number to Mellanox PSU platform implementation (#7382)
#### Why I did it

We want to add the ability for the command `show platform psustatus` to show the serial number and part number of the PSU devices on Mellanox platforms. This will be useful for data-center management of field replaceable units (FRUs) on switches.

#### How I did it

I implemented the platform 2.0 functions `get_model()` and `get_serial()` for the PSU in the mellanox platform API by referencing the sysfs nodes provided by the [hw-management](https://github.com/Azure/sonic-buildimage/tree/master/platform/mellanox/hw-management) module.
2021-05-04 13:07:00 -07:00
Stephen Sun
b2286a24dc
[Mellanox] Adopt single way to get fan direction for all ASIC types (#7386)
#### Why I did it
Adopt a single way to get fan direction for all ASIC types.
It depends on hw-mgmt V.7.0010.2000.2303. Depends on https://github.com/Azure/sonic-buildimage/pull/7419

#### How I did it
Originally, the get_direction was implemented by fetching and parsing `/var/run/hw-management/system/fan_dir` on the Spectrum-2 and the Spectrum-3 systems. It isn't supported on the Spectrum system.
Now, it is implemented by fetching `/var/run/hw-management/thermal/fanX_dir` for all the platforms.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2021-05-03 17:10:18 -07:00
Junchao-Mellanox
d9cdf9d14f
[Mellanox] Adjust PSU fan name to align with sysfs file name (#7490)
Change PSU fan name from psu_{psu_index}fan{fan_index} to psu{psu_index}_fan{fan_index}
2021-05-02 08:14:56 -07:00
Stephen Sun
b3a283366c
Fix issue: exception occurred during chassis object being destroyed (#7446)
The following error message is observed during chassis object being destroyed

"Exception ignored in: <function Chassis.__del__ at 0x7fd22165cd08>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/sonic_platform/chassis.py", line 83, in __del__
ImportError: sys.meta_path is None, Python is likely shutting down
The chassis tries to import deinitialize_sdk_handle during being destroyed for the purpose of releasing the sdk_handle.
However, importing another module during shutting down can cause the error because some of the fundamental infrastructures are no longer available."

This error occurs when a chassis object is created and then destroyed in the Python shell.

- How I did it
To fix it, record the deinitialize_sdk_handle in the chassis object when sdk_handle is being initialized and call the deinitialize handler when the chassis object is being destroyed

- How to verify it
Manually test.
2021-04-29 11:33:39 +03:00
Junchao-Mellanox
93a54450d3
Fix issue: should not initialize led color in __init__ file as platform API will be called by multiple daemons (#7114)
- Why I did it
The existing Fan led and Psu led object initialize itself to green color in init method. However, there are multiple daemons calls sonic platform API and there could be a case that:

A PSU is removed from system
Reboot switch
psud detects that 1 PSU is missing and set PSU led to red
Other daemon just start up and call sonic platform API, the API set PSU led to green by call PsuLed.init
This PR is a partial fix for the issue. As we also need guarantee that the led is initialized with a correct value. I checked existing psud and thermalctld code. psud always initialize the PSU led color on boot up, thermalcltd need some changes to initialize led color on the first run

- How I did it
Remove the led color initialization code from FanLed.init and PsuLed.init

- How to verify it
Manual test
2021-03-25 14:28:33 +02:00
Junchao-Mellanox
8504c72f14
[Mellanox] Initialize PSU API on both host and docker side (#7016)
There was a change to replace platform utils with sonic platform API in psuutil. However, psu API is not initialized on host side. The PR is to fix it.
2021-03-15 12:43:18 -07:00
Junchao-Mellanox
7caa70d2d6
[Mellanox] Fixes issue: CLI sfputil does not work based on sonic platform API (#7018)
#### Why I did it

Recently, CLI sfputil replace the old sonic platform utils with sonic platform API. However, sonic platform API does not support SFP low power mode and reset related operation. The PR is to fix it.

The change to replace platform utils with sonic platform API was reverted on 202012, once this PR is merged, we can cherry-pick these two PRs to 202012 together.

#### How I did it

In low power mode and reset related operation, use "docker exec" if the command is running on host side.
2021-03-11 18:54:33 -08:00
Joe LeVeque
516ff8bfff
[Mellanox] Ensure concrete platform API classes call base class initializer (#6854)
In preparation for the merging of Azure/sonic-platform-common#173, which properly defines class and instance members in the Platform API base classes.

It is proper object-oriented methodology to call the base class initializer, even if it is only the default initializer. This also future-proofs the potential addition of custom initializers in the base classes down the road.
2021-02-25 11:06:22 -08:00
DavidZagury
5aee92e56d
[Mellanox] Add support for SN4600 system (#6879)
- Why I did it
Add support for new 64x200G SN4600 systems

- How I did it
Add all relevant files (w/o platform.json and hwsku.json as they will come later) with default SKU.

- How to verify it
Install image on switch, verify all ports are up and configured properly, run full platform SONiC tests.
2021-02-25 09:30:43 +02:00
Joe LeVeque
7ea0d9e27a
[sonic-platform-common] Update submodule (#6742)
Submodule commits included:

* src/sonic-platform-common 6ad0004...bd4dc03 (1):
  > [sonic_sfp/qsfp_dd.py] Update DOM capability method name to align with other drivers (#163)

Also align all calling function names to match.
2021-02-10 06:12:49 -08:00
Junchao-Mellanox
6d4c20efb1
Fix dynamic minimum fan table issue caused by python3 (#6690)
**- Why I did it**
After migrating to python3, the operator '/' always get a float result, but it gets integer result in python2. Need fix this in thermal_conditions.

**- How I did it**
1. cast float value to int
2. change the unit test case to cover this situation

**- How to verify it**
Manually test and regression test
2021-02-07 11:21:44 +02:00
Joe LeVeque
18f2c5cfdd
[platform] Update QSFP method name 'parse_qsfp_dom_capability' -> 'parse_dom_capability' (#6695)
**- Why I did it**
PR https://github.com/Azure/sonic-platform-common/pull/102 modified the name of the SFF-8436 (QSFP) method to align the method name between all drivers, renaming it from `parse_qsfp_dom_capability` to `parse_dom_capability`. Once the submodule was updated, the callers using the old nomenclature broke. This PR updates all callers to use the new naming convention.

**- How I did it**

Update the name of the function globally for all calls into the SFF-8436 driver.

Note that the QSFP-DD driver still uses the old nomenclature and should be modified similarly. I will open a PR to handle this separately.
2021-02-05 14:41:05 -08:00
Kebo Liu
1b2980540d
[mellanox][platform api] fix a missing import time module (#6458)
“time" module was missed to be imported and will cause an error when the branch hit.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2021-01-15 08:01:11 -08:00
Junchao-Mellanox
0a49edb68e
[Mellanox] Fix issue: need import initialize_sdk_handle in get_sdk_handle (#6435)
Found test_sfp.py failed due to use a method without importing it.
2021-01-13 09:42:04 -08:00
Kebo Liu
015b421e5e
[Mellanox] [platform API] Fix “local variable 'label_port' referenced before assignment” error (#6419)
In rare case can see that xcvrd failed due to "UnboundLocalError: local variable 'label_port' referenced before assignment"

Init "label_port" as None at the beginning of the function, to avoid the case that "label_port" not assigned.
2021-01-12 10:43:57 -08:00
shlomibitton
feb4b04cdc
[Mellanox] PSU led platform API fixes (#6213)
Return 'False' when unsupported led color is requested, preventing an exception.

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
2020-12-22 14:54:40 -08:00
Junchao-Mellanox
6348248138
[Mellanox] Add high threshold and high critical threshold support for gearbox (#6206)
- Why I did it

Add high threshold and high critical threshold support for gearbox

- How I did it

Read gearbox thermal related threshold from sysfs
2020-12-15 16:51:43 -08:00
Junchao-Mellanox
51c77b179f
[Mellanox] Add python3 support for Mellanox platform API (#6175)
python2 is end of life and SONiC is going to support python3. This PR is going to support:

1. Mellanox SONiC platform API python3 support
2. Install both python2 and python3 verson of Mellanox SONiC platform API or pmon and host side
2020-12-11 10:51:31 -08:00
Junchao-Mellanox
63992583ca
[Mellanox] Remove eeprom cache file when first time init eeprom object (#6071)
EEPROM cache file is not refreshed after install a new ONIE version even if the eeprom data is updated. The current Eeprom class always try to read from the cache file when the file exists. The PR is aimed to fix it.
2020-12-01 10:44:44 -08:00
Joe LeVeque
7f4ab8fbd8
[sonic-utilities] Update submodule; Build and install as a Python 3 wheel (#5926)
Submodule updates include the following commits:

* src/sonic-utilities 9dc58ea...f9eb739 (18):
  > Remove unnecessary calls to str.encode() now that the package is Python 3; Fix deprecation warning (#1260)
  > [generate_dump] Ignoring file/directory not found Errors (#1201)
  > Fixed porstat rate and util issues (#1140)
  > fix error: interface counters is mismatch after warm-reboot (#1099)
  > Remove unnecessary calls to str.decode() now that the package is Python 3 (#1255)
  > [acl-loader] Make list sorting compliant with Python 3 (#1257)
  > Replace hard-coded fast-reboot with variable. And some typo corrections (#1254)
  > [configlet][portconfig] Remove calls to dict.has_key() which is not available in Python 3 (#1247)
  > Remove unnecessary conversions to list() and calls to dict.keys() (#1243)
  > Clean up LGTM alerts (#1239)
  > Add 'requests' as install dependency in setup.py (#1240)
  > Convert to Python 3 (#1128)
  > Fix mock SonicV2Connector in python3: use decode_responses mode so caller code will be the same as python2 (#1238)
  > [tests] Do not trim from PATH if we did not append to it; Clean up/fix shebangs in scripts (#1233)
  > Updates to bgp config and show commands with BGP_INTERNAL_NEIGHBOR table (#1224)
  > [cli]: NAT show commands newline issue after migrated to Python3 (#1204)
  > [doc]: Update Command-Reference.md (#1231)
  > Added 'import sys' in feature.py file (#1232)

* src/sonic-py-swsssdk 9d9f0c6...1664be9 (2):
  > Fix: no need to decode() after redis client scan, so it will work for both python2 and python3 (#96)
  > FieldValueMap `contains`(`in`)  will also work when migrated to libswsscommon(C++ with SWIG wrapper) (#94)

- Also fix Python 3-related issues:
    - Use integer (floor) division in config_samples.py (sonic-config-engine)
    - Replace print statement with print function in eeprom.py plugin for x86_64-kvm_x86_64-r0 platform
    - Update all platform plugins to be compatible with both Python 2 and Python 3
    - Remove shebangs from plugins files which are not intended to be executable
    - Replace tabs with spaces in Python plugin files and fix alignment, because Python 3 is more strict
    - Remove trailing whitespace from plugins files
2020-11-25 10:28:36 -08:00
Vadym Hlushko
503873056e
[Mellanox] SN4410 support (#5778)
Add support for Mellanox Spectrum-3 based 100GbE/400GbE 1U. 24 QSFP-DD28 and 8 QSFP-DD ports

Signed-off-by: Vadym Hlushko <vadymh@nvidia.com>
2020-11-24 10:43:48 -08:00
Junchao-Mellanox
b595a6eadf
[Mellanox] Implement new platform API for SONiC physical entity mib extension (#5645)
In order to support SONiC physical entity mib extension, a few new platform API are added to sonic-platform-common, this PR is to provide an mellanox platform implementation for those new APIs.
2020-11-16 18:56:03 -08:00
shlomibitton
fd9bd40188
[Mellanox] Fix for QSFP-DD channel status (#5900)
Wrong object init broke the API. Replace object to the correct type.

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
2020-11-11 11:08:15 -08:00
shlomibitton
bec01ae3bb
[Mellanox] Enhance QSFP-DD DOM information (#5776)
New driver support fetching additional pages from the cable EEPROM.
There are additional information to parse now: RX/TX power, TX bias, TX fault and RX LOS.

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
2020-11-10 14:36:22 -08:00
Junchao-Mellanox
7bee5093f1
[Mellanox] Support max/min speed for PSU fan (#5682)
As new hw-mgmt expose the sysfs for PSU fan max speed, we need support max/min speed for PSU fan in mellanox platform API.
2020-10-26 12:47:12 -07:00
Junchao-Mellanox
15c59e1d8c
[Mellanox] Re-initialize SFP object when detecting a new SFP insertion (#5695)
When detecting a new SFP insertion, read its SFP type and DOM capability from EEPROM again.

SFP object will be initialized to a certain type even if no SFP present. A case could be:

1. A SFP object is initialized to QSFP type by default when there is no SFP present
2. User insert a SFP with an adapter to this QSFP port
3. The SFP object fail to read EEPROM because it still treats itself as QSFP.

This PR fixes this issue.
2020-10-23 12:36:11 -07:00
Junchao-Mellanox
ca7a4a4e3a
[Mellanox] Fix issue: read data from eeprom should trim tail \0 (#5670)
Now we are reading base mac, product name from eeprom data, and the data read from eeprom contains multiple "\0" characters at the end, need trim them to make the string clean and display correct.
2020-10-20 22:08:06 -07:00
Kebo Liu
73f38f6ce9
[Mellanox] Optimize SFP Platform API implementation (#5476)
Each SFP object inside Chassis will open an SDK client, this is not necessary and SDK client can be shared between SFP objects.
2020-10-19 11:30:38 -07:00
Joe LeVeque
8011edc307
[platform] Remove references to deprecated get_serial_number() method in Chassis class (#5649)
The `get_serial_number()` method in the ChassisBase and ModuleBase classes was redundant, as the `get_serial()` method is inherited from the DeviceBase class. This method was removed from the base classes in sonic-platform-common and the submodule was updated in https://github.com/Azure/sonic-buildimage/pull/5625.

This PR aligns the existing vendor platform API implementations to remove the `get_serial_number()` methods and ensure the `get_serial()` methods are implemented, if they weren't previously.

Note that this PR does not modify the Dell platform API implementations, as this will be handled as part of https://github.com/Azure/sonic-buildimage/pull/5609
2020-10-17 22:00:14 -07:00
Junchao-Mellanox
e92061cde9
[Mellanox] Update dynamic minimum table for 4700, 3420 and 4600C (#5388)
Update dynamic minimum fan speed table according to data provided by thermal team.
2020-10-03 10:28:44 -07:00
Kebo Liu
40623681bb
[Mellanox] Fix truncated manufacture date returned from platform API (#5473)
The manufacture date returned from platform API was truncated, time is not included. Revise the regular expression used for matching.
2020-10-03 10:17:13 -07:00
Kebo Liu
0a19cb4de5
[Mellanox] Refactor platform API to remove dependency on database (#5468)
**- Why I did it**
- Platform API implementation using sonic-cfggen to get platform name and SKU name, which will fail when the database is not available.
- Chassis name is not correctly assigned, it shall be assigned with EEPROM TLV "Product Name", instead of SKU name  
- Chassis model is not implemented, it shall be assigned with EEPROM TLV "Part Number"

**- How I did it**

1. Chassis

> - Get platform name from /host/machine.conf
> - Remove get SKU name with sonic-cfggen
> - Get Chassis name and model from EEPROM TLV "Product Name" and "Part Number" 
> - Add function to return model

2. EEPROM

> - Add function to return product name and part number

3. Platform

> - Init EEPROM on the host side, so also can get the Chassis name model from EEPROM on the host side.
2020-09-26 11:20:43 -07:00
Kebo Liu
72ec212fa7
[Mellanox] Refactor SFP related platform API and plugins with new SDK API (#5326)
Refactor SFP reset, low power get/set API, and plugins with new SDK SX APIs. Previously they were calling SDK SXD APIs which have glibc dependency because of shared memory usage.

Remove implementation "set_power_override", "tx_disable_channel", "tx_disable" which using SXD APIs, once related SDK SX API available, will add them back based on new SDK SX APIs.
2020-09-11 13:23:23 -07:00
Kebo Liu
bf3c901c6c
[Mellanox] Update the sfp platform API to get the ext_specification_compliance with new way (#5123)
Update the platform API implementation with calling dedicated parse function which defined in the platform-common as defined by https://github.com/Azure/sonic-platform-common/pull/112
2020-08-13 19:17:01 -07:00
shlomibitton
995bd09486
Add support for 'Extended Specification Compliance' for QSFP cables parser (#5096)
Signed-off-by: Shlomi Bitton <shlomibi@mellanox.com>
2020-08-07 00:16:59 +03:00
Joe LeVeque
3b89e5d467
[Python] Migrate applications/scripts to import sonic-py-common package (#5043)
As part of consolidating all common Python-based functionality into the new sonic-py-common package, this pull request:
1. Redirects all Python applications/scripts in sonic-buildimage repo which previously imported sonic_device_util or sonic_daemon_base to instead import sonic-py-common, which was added in https://github.com/Azure/sonic-buildimage/pull/5003
2. Replaces all calls to `sonic_device_util.get_platform_info()` to instead call `sonic_py_common.get_platform()` and removes any calls to `sonic_device_util.get_machine_info()` which are no longer necessary (i.e., those which were only used to pass the results to `sonic_device_util.get_platform_info()`.
3. Removes unused imports to the now-deprecated sonic-daemon-base package and sonic_device_util.py module

This is the next step toward resolving https://github.com/Azure/sonic-buildimage/issues/4999

Also reverted my previous change in which device_info.get_platform() would first try obtaining the platform ID string from Config DB and fall back to gathering it from machine.conf upon failure because this function is called by sonic-cfggen before the data is in the DB, in which case, the db_connect() call will hang indefinitely, which was not the behavior I expected. As of now, the function will always reference machine.conf.
2020-08-03 11:43:12 -07:00
Nazarii Hnydyn
5c67a3c31d
[Mellanox] Fix SN3700 platform string. (#5036)
Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
2020-07-25 03:05:05 -07:00
shlomibitton
bbb91715a8
[Mellanox] Change fan tolerance to 50% (#5018)
Mellanox platforms fan tolerance should change to 50%

Signed-off-by: Shlomi Bitton <shlomibi@mellanox.com>
2020-07-23 11:18:26 -07:00
Joe LeVeque
9905d9382d
[devices] Update SFP keys to align with new standard (#4975)
Align SFP key names with new standard defined in https://github.com/Azure/sonic-platform-common/pull/97

- hardwarerev -> hardware_rev
- serialnum -> serial
- manufacturename -> manufacturer
- modelname -> model
- Connector -> connector
2020-07-16 13:03:50 -07:00
shlomibitton
545fe3ecd0
Add support for QSFP-DD cables on MLNX platform API (#4965)
Signed-off-by: Shlomi Bitton <shlomibi@mellanox.com>
2020-07-15 11:09:46 -07:00
Junchao-Mellanox
76d68ad1f5
[Mellanox] Add support for set/get system led status (#4829)
System health feature needs to set/get system led status

- Add a led object in chassis class and initialize it when the API is called on host side
- Read/write system led system fs to get/set the status
2020-07-13 10:22:39 -07:00
Junchao-Mellanox
ce391645f2
[Mellanox] add ASIC temperature support to platform API (#4828)
**- Why I did it**

System health feature requires to read ASIC temperature and threshold from platform API

**- How I did it**

Implement Chassis.get_asic_temperature and Chassis.get_asic_temperature_threshold by getting value from system fs.
2020-06-28 17:54:28 -07:00
Junchao-Mellanox
563a0fd21e
[Mellanox] Change port index in port_config.ini to 1-based (#4781)
* Change port index in port_config.ini to 1-based
* Add default port index to port_config.ini, change platform plugins to accept 1-based port index
* fix port index in sfp_event.py
2020-06-23 17:21:36 -07:00
madhanmellanox
2c830f4074
Modified SKU based utils to Platform based utils (#4786)
Co-authored-by: Madhan Babu <madhan@arc-build-server.mtr.labs.mlnx>
2020-06-21 12:15:23 -07:00
Nazarii Hnydyn
1db64a3bc1
[Mellanox] Add ONIE and SSD platform components. (#4758)
Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
2020-06-15 14:25:49 +03:00
Junchao-Mellanox
e25c2d984f
[Mellanox] Never disable kernel thermal algorithm at real-time (#4638) 2020-05-26 10:46:29 -07:00
Junchao-Mellanox
f277d13cd6
[Mellanox] Adjust log level to avoid too many thermal logs (#4631)
* Trigger thermal action log only if thermal condition changes
* test file existence before read file content
* fix error for set psu fan speed
* Remove logs because it print too frequently
2020-05-26 10:45:25 -07:00
Junchao-Mellanox
5e6c20481d
[Mellanox] Enhancement for fan led management (#4437) 2020-05-13 10:01:32 -07:00