Commit Graph

151 Commits

Author SHA1 Message Date
Junchao-Mellanox
552963ab0e
[Mellanox] Change thermal recover threshold from temp_trip_norm to temp_trip_high (#8792)
- Why I did it
Change thermal recover threshold from temp_trip_norm to temp_trip_high, so that thermal algorithm would set fan speed to minimum allowed earlier and save power.

- How I did it
Change thermal recover threshold from temp_trip_norm to temp_trip_high

- How to verify it
Manual test
2021-10-04 20:20:33 +03:00
Junchao-Mellanox
ed64eb94d9
[Mellanox] Read PSU fan max/min speed per PSU (#8563)
#### Why I did it
A new PSU may be fitted with a different type of fan, so the fan max/min speed should be read per PSU.

#### How I did it
The existing implementation read the PSU max/min fan speed from a common file; change it to read from a per-PSU file.

#### How to verify it
Manual test
2021-08-26 01:03:55 -07:00
DavidZagury
d26307d80f
[Mellanox][Pcie] Fix issue on pcied with an id that contains only decimal digits was treated as a decimal number (#8309)
A device id that contains only decimal digits was mistakenly treated as a decimal integer, resulting in a failure to find it in the id-to-bus map.
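
A minimal sketch of this failure mode, assuming the id-to-bus map is keyed by strings and the id arrives from a loader that turns digit-only scalars into integers. The names `id_to_bus` and `find_bus` and the sample ids are hypothetical, not taken from pcied.

```python
# Hypothetical id-to-bus map keyed by hexadecimal id strings.
id_to_bus = {"1013": "03", "cb84": "04"}


def find_bus(device_id):
    """Look up the bus for a device id, normalizing the id to a string first."""
    # A digit-only id such as 1013 may arrive as an int; str() restores the key form.
    return id_to_bus.get(str(device_id))


# Without normalization the integer 1013 does not match the string key "1013".
naive_hit = id_to_bus.get(1013)
fixed_hit = find_bus(1013)
```

Ids containing a hexadecimal letter, such as "cb84", are unaffected because they can never be read as decimal integers.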
2021-08-03 15:25:28 -07:00
DavidZagury
67781abb97
[Mellanox][pcied] Ignore bus on pcie.yaml for Mellanox switches (#8063)
Why I did it
In rare cases, a BIOS upgrade cannot guarantee that the bus value remains the same across BIOS releases. Ignore this field so that pcied does not fail, but still verify the device id in a different way. The solution is future-proof and will not require code changes when a new BIOS version is available.

How I did it
Since the bus is not a fixed value (it is determined by the BIOS version), we ignore this field and instead check whether there is a device that matches on all other fields and, in addition, has a matching device id.

How to verify it
Verify no errors or failures in pcied on different BIOS versions with the same code base.
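
The matching rule described above can be sketched as follows. This is an illustration only: the field names (`bus`, `dev`, `fn`, `id`) and both helper functions are assumptions, not the actual pcied code.

```python
def matches_ignoring_bus(expected, actual):
    """Compare two PCIe device records on every field except 'bus'."""
    keys = (set(expected) | set(actual)) - {"bus"}
    return all(expected.get(k) == actual.get(k) for k in keys)


def device_present(expected, detected_devices):
    """Return True if any detected device matches the expected entry, bus aside."""
    return any(matches_ignoring_bus(expected, dev) for dev in detected_devices)


# The expected entry was recorded with bus "03"; after a BIOS change the
# same device shows up on bus "04" and should still be accepted.
expected = {"bus": "03", "dev": "00", "fn": "0", "id": "cb84"}
detected = [{"bus": "04", "dev": "00", "fn": "0", "id": "cb84"}]
```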
2021-07-26 08:43:42 -07:00
tomer-israel
950c24c5ae
[PMON] [Mellanox] fix syseepromd issue on simx (#8131)
Avoid initializing sfp/thermal/components/fan/psu/leds on simx and create vpd_info file on hw_management when we use mellanox simulator platform

- Why I did it
This is a fix for an issue on Mellanox simulator platforms: syseepromd failed in the pmon docker, and "decode-syseeprom" failed as well.

- How I did it
Before initializing thermal/components/fan/psu/leds, check whether we are running on simx.
Create the vpd_info file in the hw_management folder.

- How to verify it
Check that the syseepromd process was loaded properly in the pmon docker.
Check that decode-syseeprom works without errors/warnings.
2021-07-20 11:56:04 +03:00
tomer-israel
a328fd24c0
[WARM-REBOOT] fix issue of watchdog on simx when executing warm-reboot command (#8132)
- Why I did it
To prevent a Python exception when executing the warm-reboot command on the Mellanox simulator platform.

- How I did it
Return None from the watchdog Python script when the watchdog file does not exist.

- How to verify it
warm-reboot runs without the Python error. An error message will still appear in the log in these cases.
To avoid this error message, we can simulate the watchdog on the Mellanox simulator platform.
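
A rough sketch of the "return None when the watchdog file is missing" behaviour. The function name, the directory argument, and the `watchdog` file-name prefix are assumptions made for illustration.

```python
import os


def get_watchdog(watchdog_dir="/dev"):
    """Return a watchdog device path, or None when no watchdog file exists."""
    if not os.path.isdir(watchdog_dir):
        return None
    for name in sorted(os.listdir(watchdog_dir)):
        if name.startswith("watchdog"):
            return os.path.join(watchdog_dir, name)
    # No watchdog device (e.g. on a simulator): signal absence instead of raising.
    return None
```

Callers then need to check for `None` before using the result, which is what lets warm-reboot continue on a platform without a watchdog.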
2021-07-19 22:08:44 +03:00
Junchao-Mellanox
147bf240f0
[Mellanox] Add bitmap support for SFP error event (#7605)
#### Why I did it

Currently, SONiC uses a single value to represent an SFP error; however, multiple SFP errors can exist at the same time. This PR aims to support that.

#### How I did it

Return a bitmap instead of a single value when an SFP event occurs.
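
To illustrate why a bitmap helps, here is a small sketch using `enum.IntFlag`. The error names and bit positions are hypothetical; the real bit assignments may differ.

```python
from enum import IntFlag


class SfpError(IntFlag):
    """Hypothetical SFP error bits, one bit per error condition."""
    NONE = 0
    POWER_BUDGET_EXCEEDED = 1
    I2C_STUCK = 2
    BAD_EEPROM = 4
    HIGH_TEMPERATURE = 8


def decode_errors(bitmap):
    """Return the names of all error bits set in the bitmap."""
    return [e.name for e in SfpError if e.value and int(bitmap) & e.value]


# Two errors reported at once, which a single-value encoding could not express.
event = SfpError.I2C_STUCK | SfpError.HIGH_TEMPERATURE
```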

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2021-06-25 10:56:47 -07:00
Stephen Sun
fc61ec9dbf
[Mellanox] Return N/A for PSU's model, serial and revision on platforms with fixed PSU (#7927)
- Why I did it
The methods get_model, get_serial, and get_revision have been implemented by reading relevant information from VPD and then recording the information into relevant fields.
However, there is no VPD data on platforms with fixed PSUs and the relevant fields haven't been initialized, which causes the methods to throw exceptions, which in turn prevents psud from inserting fields into the PSU table.
Eventually, this causes show platform psustatus to output incorrect info.

- How I did it
Initialize those fields as N/A on systems with fixed PSUs.
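
A minimal sketch of the idea, assuming a simplified PSU class in which `vpd` is a dict of VPD fields, or `None` on a fixed-PSU platform. The class layout and dict keys are illustrative, not the actual implementation.

```python
class Psu:
    """Hypothetical PSU object; `vpd` is a dict of VPD fields, or None for a fixed PSU."""

    def __init__(self, vpd=None):
        # Set defaults first so the getters always have something to return.
        self.model = "N/A"
        self.serial = "N/A"
        self.revision = "N/A"
        if vpd is not None:
            self.model = vpd.get("model", "N/A")
            self.serial = vpd.get("serial", "N/A")
            self.revision = vpd.get("revision", "N/A")

    def get_model(self):
        return self.model

    def get_serial(self):
        return self.serial

    def get_revision(self):
        return self.revision
```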

- How to verify it
Manually test.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2021-06-23 20:41:28 +03:00
Junchao-Mellanox
f294096eb6
[Mellanox] Read EEPROM data from DB if possible (#7808)
- Why I did it
Remove EEPROM cache file and use DB instead

- How I did it
Read EEPROM data from DB if possible
If data is not ready in DB, read from hardware using a visitor pattern

- How to verify it
Manual test and regression
2021-06-20 17:58:11 +03:00
Alexander Allen
29601366ee
[Mellanox] Implement auto_firmware_update platform API for to support fwutil auto-update (#7721)
Why I did it
The Mellanox platform is required to support the fwutil auto-update feature defined here

This is to allow switches, when performing SONiC upgrades to choose whether to perform firmware upgrades that may interrupt the data plane through a cold boot.

How I did it
Two methods were added to the component implementations for mellanox.

In the base Component class we add a default function that chooses to skip the installation of any firmware unless the cold boot option is provided. This is because the Mellanox platform, by default, does not support installing firmware on ONIE, the CPLD, or the BIOS "on-the-fly".

In the ComponentSSD class we add a function that behaves similarly but uses the Mellanox specific SSD firmware upgrade tool to check if the current SSD supports being upgraded on the fly in order to decide whether to skip or perform the installation.
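
The decision logic in the two paragraphs above might look roughly like this. It is a sketch under stated assumptions: the method name `auto_update_decision`, the `"cold"` boot action string, the `"install"`/`"skip"` results, and the `supports_live_update` flag are all hypothetical stand-ins, and the fallback to the base rule when the SSD cannot be upgraded on the fly is an assumption.

```python
class Component:
    """Hypothetical base component: firmware installs need a cold boot."""

    def auto_update_decision(self, boot_action):
        # Install only when the caller allows a cold boot; otherwise skip.
        return "install" if boot_action == "cold" else "skip"


class ComponentSSD(Component):
    """Hypothetical SSD component that may support on-the-fly upgrades."""

    def __init__(self, supports_live_update):
        # In practice this flag would come from the SSD firmware upgrade tool.
        self._supports_live_update = supports_live_update

    def auto_update_decision(self, boot_action):
        if self._supports_live_update:
            return "install"
        # Assumed fallback: behave like the base class when a reboot is needed.
        return super().auto_update_decision(boot_action)
```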

How to verify it
Unit tests are included with this PR. These tests will run on build of target sonic-mellanox.bin

You may also perform fwutil auto-update ... commands after Azure/sonic-utilities#1242 is merged in.
2021-06-16 14:55:20 -07:00
Stephen Sun
80d01f2f9a
[Mellanox] Enhance Python3 support for platform API (#7410)
- Why I did it
Enhance the Python 3 support for the platform API. Originally, some platform APIs called SDK APIs which didn't support Python 3. Now that Python 3 APIs are supported in SDK 4.4.3XXX, Python 3 is completely supported by the platform API.

- How I did it
Start all platform daemons from python3
1. Remove #/usr/bin/env python at the beginning of each platform API file as the platform API won't be started as daemons but be imported from other daemons.
2. Adjust SDK API calls accordingly

- How to verify it
Manually test and run regression platform test

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2021-06-15 17:57:48 +03:00
Kebo Liu
93534ce0e1
[Mellanox] Align PSU name convention returned from psu.get_name platform API (#7783)
Make PSU name returned from platform API aligned with the convention "PSU {X}" instead of "PSU{X}".
2021-06-04 09:40:23 -07:00
Junchao-Mellanox
3ea3a5c8c1
[Mellanox] clear fan from chassis._fan_list (#7682)
#### Why I did it

According to the thermalctld HLD, each fan must belong to a fan drawer; if the fan drawer does not physically exist, the fan is put into a virtual fan drawer. This PR removes the fans from chassis._fan_list.

#### How I did it

1. Don't put fans into chassis._fan_list
2. Always query fans from fan_drawer
2021-05-24 11:36:39 -07:00
Alexander Allen
6a9d1e584d
[Mellanox] Implement Hardware Revision Platform API Call for Mellanox Chassis and PSU (#7552)
#### Why I did it

This pull request allows calls to be made through the platform 2.0 API that retrieve the PSU and Chassis hardware revision on Mellanox platforms. Access to these values will aid customers in determining their hardware revisions for debugging and technical support. These values are intended to be eventually exposed through the CLI. 

#### How I did it

For the PSU hardware revision I used the existing VPD function calls implemented in https://github.com/Azure/sonic-buildimage/pull/7382

For the Chassis hardware revision I parsed the SMBIOS / DMI type 2 information to retrieve the information.
2021-05-24 09:37:59 -07:00
Junchao-Mellanox
bfae15fb83
[mellaonox]: No need enable thermal zones in thermal_manager.deinitialize since they are enabled by default (#7556)
There is no need to enable thermal zones in thermal_manager.deinitialize since they are enabled by default. Removing this also speeds up thermalctld exit.
2021-05-08 10:33:37 -07:00
Stephen Sun
9f0dce0313
[Mellanox] Optimize SFP modules initialization (#7537)
Originally, SFP modules were always accessed from platform daemons, and arbitrary SFP modules could be accessed in a daemon. So all SFP modules were initialized in one shot once one of the following chassis APIs was called:
- get_all_sfps
- get_num_sfps
- get_sfp

Recently, we noticed that SFP modules can also be accessed from the CLI, e.g. the latest refactor of `sfputil`.

In this case, only one SFP module is accessed in the chassis object's life cycle.
Initializing all SFP modules in one shot is a waste of time and causes the CLI to take much longer to finish.
So we would like to optimize the initialization flow by introducing a two-phase initialization approach:
- Partial initialization, which means the `chassis._sfp_list` has been initialized with proper length and all elements being `None`
- Full initialization, which means all elements in `chassis._sfp_list` are created

If the relevant function is called,
- `get_sfp`, only partial initialization will be done, and then the specific SFP module is initialized.
- `get_all_sfps` or `get_num_sfps`, full initialization will be done, which means all SFP modules are initialized.
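
The two-phase approach can be sketched as below. This is a simplified illustration: the `Sfp` stand-in, the private helper names, the `_fully_initialized` flag, and 0-based indexing are assumptions; only `_sfp_list`, `get_sfp`, `get_all_sfps`, and `get_num_sfps` come from the description above.

```python
class Sfp:
    """Stand-in for a per-port SFP object; construction is the costly step."""

    def __init__(self, index):
        self.index = index


class Chassis:
    def __init__(self, port_count):
        self._port_count = port_count
        self._sfp_list = None
        self._fully_initialized = False

    def _initialize_partial(self):
        # Phase 1: allocate the list with the right length, all slots empty.
        if self._sfp_list is None:
            self._sfp_list = [None] * self._port_count

    def _initialize_full(self):
        # Phase 2: create every SFP object that does not exist yet.
        if not self._fully_initialized:
            self._initialize_partial()
            for i, sfp in enumerate(self._sfp_list):
                if sfp is None:
                    self._sfp_list[i] = Sfp(i)
            self._fully_initialized = True

    def get_sfp(self, index):
        # Only the requested module is created (0-based index in this sketch).
        self._initialize_partial()
        if self._sfp_list[index] is None:
            self._sfp_list[index] = Sfp(index)
        return self._sfp_list[index]

    def get_all_sfps(self):
        self._initialize_full()
        return self._sfp_list

    def get_num_sfps(self):
        self._initialize_full()
        return len(self._sfp_list)
```

A CLI that touches a single port pays for one `Sfp` construction, while a daemon that iterates over all ports still gets a fully populated list.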

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2021-05-06 10:14:48 -07:00
Alexander Allen
0bc0f98d48
[platform] Add serial number and model number to Mellanox PSU platform implementation (#7382)
#### Why I did it

We want to add the ability for the command `show platform psustatus` to show the serial number and part number of the PSU devices on Mellanox platforms. This will be useful for data-center management of field replaceable units (FRUs) on switches.

#### How I did it

I implemented the platform 2.0 functions `get_model()` and `get_serial()` for the PSU in the mellanox platform API by referencing the sysfs nodes provided by the [hw-management](https://github.com/Azure/sonic-buildimage/tree/master/platform/mellanox/hw-management) module.
2021-05-04 13:07:00 -07:00
Stephen Sun
b2286a24dc
[Mellanox] Adopt single way to get fan direction for all ASIC types (#7386)
#### Why I did it
Adopt a single way to get fan direction for all ASIC types.
It depends on hw-mgmt V.7.0010.2000.2303. Depends on https://github.com/Azure/sonic-buildimage/pull/7419

#### How I did it
Originally, the get_direction was implemented by fetching and parsing `/var/run/hw-management/system/fan_dir` on the Spectrum-2 and the Spectrum-3 systems. It isn't supported on the Spectrum system.
Now, it is implemented by fetching `/var/run/hw-management/thermal/fanX_dir` for all the platforms.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2021-05-03 17:10:18 -07:00
Junchao-Mellanox
d9cdf9d14f
[Mellanox] Adjust PSU fan name to align with sysfs file name (#7490)
Change PSU fan name from psu_{psu_index}fan{fan_index} to psu{psu_index}_fan{fan_index}
2021-05-02 08:14:56 -07:00
Stephen Sun
b3a283366c
Fix issue: exception occurred during chassis object being destroyed (#7446)
The following error message is observed while the chassis object is being destroyed:

"Exception ignored in: <function Chassis.__del__ at 0x7fd22165cd08>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/sonic_platform/chassis.py", line 83, in __del__
ImportError: sys.meta_path is None, Python is likely shutting down"

The chassis tries to import deinitialize_sdk_handle while being destroyed in order to release the sdk_handle.
However, importing another module during shutdown can cause the error because some of the fundamental infrastructure is no longer available.

This error occurs when a chassis object is created and then destroyed in the Python shell.

- How I did it
To fix it, record deinitialize_sdk_handle in the chassis object when sdk_handle is being initialized, and call the recorded handler when the chassis object is being destroyed.
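
A simplified sketch of that pattern. The dict-based handle and the local `deinitialize_sdk_handle` stand-in are assumptions for illustration; the real handle and release function come from the SDK.

```python
def deinitialize_sdk_handle(handle):
    """Stand-in for the real SDK release function (hypothetical)."""
    handle["open"] = False


class Chassis:
    def __init__(self):
        self.sdk_handle = None
        self._deinit_handler = None

    def get_sdk_handle(self):
        if self.sdk_handle is None:
            self.sdk_handle = {"open": True}
            # Record the handler now, while the interpreter is fully alive.
            self._deinit_handler = deinitialize_sdk_handle
        return self.sdk_handle

    def __del__(self):
        # No import here: the handler was recorded at initialization time.
        if self.sdk_handle is not None and self._deinit_handler is not None:
            self._deinit_handler(self.sdk_handle)
```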

- How to verify it
Manually test.
2021-04-29 11:33:39 +03:00
Junchao-Mellanox
93a54450d3
Fix issue: should not initialize led color in __init__ file as platform API will be called by multiple daemons (#7114)
- Why I did it
The existing Fan LED and PSU LED objects initialize themselves to green in their init methods. However, multiple daemons call the sonic platform API, and the following case can occur:

1. A PSU is removed from the system
2. The switch is rebooted
3. psud detects that one PSU is missing and sets the PSU LED to red
4. Another daemon starts up and calls the sonic platform API; the API sets the PSU LED to green by calling PsuLed.init

This PR is a partial fix for the issue, as we also need to guarantee that the LED is initialized with a correct value. I checked the existing psud and thermalctld code: psud always initializes the PSU LED color on boot-up, while thermalctld needs some changes to initialize the LED color on the first run.

- How I did it
Remove the led color initialization code from FanLed.init and PsuLed.init

- How to verify it
Manual test
2021-03-25 14:28:33 +02:00
Junchao-Mellanox
8504c72f14
[Mellanox] Initialize PSU API on both host and docker side (#7016)
There was a change to replace platform utils with the sonic platform API in psuutil. However, the PSU API is not initialized on the host side. This PR fixes that.
2021-03-15 12:43:18 -07:00
Junchao-Mellanox
7caa70d2d6
[Mellanox] Fixes issue: CLI sfputil does not work based on sonic platform API (#7018)
#### Why I did it

Recently, the sfputil CLI replaced the old sonic platform utils with the sonic platform API. However, the sonic platform API does not support SFP low power mode and reset related operations. This PR fixes that.

The change to replace platform utils with sonic platform API was reverted on 202012, once this PR is merged, we can cherry-pick these two PRs to 202012 together.

#### How I did it

For low power mode and reset related operations, use "docker exec" if the command is running on the host side.
2021-03-11 18:54:33 -08:00
Joe LeVeque
516ff8bfff
[Mellanox] Ensure concrete platform API classes call base class initializer (#6854)
In preparation for the merging of Azure/sonic-platform-common#173, which properly defines class and instance members in the Platform API base classes.

It is proper object-oriented methodology to call the base class initializer, even if it is only the default initializer. This also future-proofs the potential addition of custom initializers in the base classes down the road.
2021-02-25 11:06:22 -08:00
DavidZagury
5aee92e56d
[Mellanox] Add support for SN4600 system (#6879)
- Why I did it
Add support for new 64x200G SN4600 systems

- How I did it
Add all relevant files (w/o platform.json and hwsku.json as they will come later) with default SKU.

- How to verify it
Install image on switch, verify all ports are up and configured properly, run full platform SONiC tests.
2021-02-25 09:30:43 +02:00
Joe LeVeque
7ea0d9e27a
[sonic-platform-common] Update submodule (#6742)
Submodule commits included:

* src/sonic-platform-common 6ad0004...bd4dc03 (1):
  > [sonic_sfp/qsfp_dd.py] Update DOM capability method name to align with other drivers (#163)

Also align all calling function names to match.
2021-02-10 06:12:49 -08:00
Junchao-Mellanox
6d4c20efb1
Fix dynamic minimum fan table issue caused by python3 (#6690)
**- Why I did it**
After migrating to Python 3, the operator '/' always returns a float result, whereas it returns an integer result for integer operands in Python 2. This needs to be fixed in thermal_conditions.

**- How I did it**
1. cast float value to int
2. change the unit test case to cover this situation
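
The difference can be shown with a small sketch. The table values and function names are hypothetical and only illustrate why a float quotient breaks a table lookup.

```python
# Hypothetical minimum-speed table indexed by temperature bucket.
MIN_SPEED_TABLE = [30, 40, 50, 60]


def bucket_float(temperature, step):
    """Python 3 true division: returns a float, unusable as a list index."""
    return temperature / step


def bucket_int(temperature, step):
    """Cast the quotient to int, as the fix describes."""
    return int(temperature / step)


def min_speed(temperature, step=10):
    """Look up the minimum speed for a temperature using an integer index."""
    return MIN_SPEED_TABLE[bucket_int(temperature, step)]
```

`temperature // step` would also yield an integer for integer operands, though it floors rather than truncates for negative values.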

**- How to verify it**
Manually test and regression test
2021-02-07 11:21:44 +02:00
Joe LeVeque
18f2c5cfdd
[platform] Update QSFP method name 'parse_qsfp_dom_capability' -> 'parse_dom_capability' (#6695)
**- Why I did it**
PR https://github.com/Azure/sonic-platform-common/pull/102 modified the name of the SFF-8436 (QSFP) method to align the method name between all drivers, renaming it from `parse_qsfp_dom_capability` to `parse_dom_capability`. Once the submodule was updated, the callers using the old nomenclature broke. This PR updates all callers to use the new naming convention.

**- How I did it**

Update the name of the function globally for all calls into the SFF-8436 driver.

Note that the QSFP-DD driver still uses the old nomenclature and should be modified similarly. I will open a PR to handle this separately.
2021-02-05 14:41:05 -08:00
Kebo Liu
1b2980540d
[mellanox][platform api] fix a missing import time module (#6458)
The "time" module was not imported, which causes an error when that branch is hit.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2021-01-15 08:01:11 -08:00
Junchao-Mellanox
0a49edb68e
[Mellanox] Fix issue: need import initialize_sdk_handle in get_sdk_handle (#6435)
Found that test_sfp.py failed because a method was used without being imported.
2021-01-13 09:42:04 -08:00
Kebo Liu
015b421e5e
[Mellanox] [platform API] Fix “local variable 'label_port' referenced before assignment” error (#6419)
In rare cases, xcvrd fails with "UnboundLocalError: local variable 'label_port' referenced before assignment".

Initialize "label_port" to None at the beginning of the function to cover the case where "label_port" is never assigned.
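
A minimal sketch of the pattern, with a hypothetical function and event format; only the variable name `label_port` comes from the message above.

```python
def find_label_port(events, target):
    """Return the label of the target port, or None if it is not in events."""
    label_port = None  # defined on every path, even if the loop never assigns it
    for port, label in events:
        if port == target:
            label_port = label
            break
    # Without the initial assignment, this line would raise UnboundLocalError
    # whenever the loop finished without a match.
    return label_port
```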
2021-01-12 10:43:57 -08:00
shlomibitton
feb4b04cdc
[Mellanox] PSU led platform API fixes (#6213)
Return 'False' when unsupported led color is requested, preventing an exception.
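
As an illustration, a setter along these lines returns `False` rather than raising. The class shape, method name, and colour list are assumptions, not the actual Mellanox implementation.

```python
class PsuLed:
    """Hypothetical PSU LED wrapper with a fixed set of supported colours."""

    SUPPORTED_COLORS = ("green", "red", "off")

    def __init__(self):
        self._color = "off"

    def set_status(self, color):
        """Set the LED colour; return False for an unsupported colour."""
        if color not in self.SUPPORTED_COLORS:
            # Report failure to the caller instead of raising an exception.
            return False
        self._color = color
        return True

    def get_status(self):
        return self._color
```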

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
2020-12-22 14:54:40 -08:00
Junchao-Mellanox
6348248138
[Mellanox] Add high threshold and high critical threshold support for gearbox (#6206)
- Why I did it

Add high threshold and high critical threshold support for gearbox

- How I did it

Read gearbox thermal related threshold from sysfs
2020-12-15 16:51:43 -08:00
Junchao-Mellanox
51c77b179f
[Mellanox] Add python3 support for Mellanox platform API (#6175)
python2 is end of life and SONiC is going to support python3. This PR is going to support:

1. Mellanox SONiC platform API python3 support
2. Install both the python2 and python3 versions of the Mellanox SONiC platform API for the pmon and host side
2020-12-11 10:51:31 -08:00
Junchao-Mellanox
63992583ca
[Mellanox] Remove eeprom cache file when first time init eeprom object (#6071)
The EEPROM cache file is not refreshed after installing a new ONIE version, even if the EEPROM data is updated. The current Eeprom class always tries to read from the cache file when the file exists. This PR aims to fix that.
2020-12-01 10:44:44 -08:00
Joe LeVeque
7f4ab8fbd8
[sonic-utilities] Update submodule; Build and install as a Python 3 wheel (#5926)
Submodule updates include the following commits:

* src/sonic-utilities 9dc58ea...f9eb739 (18):
  > Remove unnecessary calls to str.encode() now that the package is Python 3; Fix deprecation warning (#1260)
  > [generate_dump] Ignoring file/directory not found Errors (#1201)
  > Fixed porstat rate and util issues (#1140)
  > fix error: interface counters is mismatch after warm-reboot (#1099)
  > Remove unnecessary calls to str.decode() now that the package is Python 3 (#1255)
  > [acl-loader] Make list sorting compliant with Python 3 (#1257)
  > Replace hard-coded fast-reboot with variable. And some typo corrections (#1254)
  > [configlet][portconfig] Remove calls to dict.has_key() which is not available in Python 3 (#1247)
  > Remove unnecessary conversions to list() and calls to dict.keys() (#1243)
  > Clean up LGTM alerts (#1239)
  > Add 'requests' as install dependency in setup.py (#1240)
  > Convert to Python 3 (#1128)
  > Fix mock SonicV2Connector in python3: use decode_responses mode so caller code will be the same as python2 (#1238)
  > [tests] Do not trim from PATH if we did not append to it; Clean up/fix shebangs in scripts (#1233)
  > Updates to bgp config and show commands with BGP_INTERNAL_NEIGHBOR table (#1224)
  > [cli]: NAT show commands newline issue after migrated to Python3 (#1204)
  > [doc]: Update Command-Reference.md (#1231)
  > Added 'import sys' in feature.py file (#1232)

* src/sonic-py-swsssdk 9d9f0c6...1664be9 (2):
  > Fix: no need to decode() after redis client scan, so it will work for both python2 and python3 (#96)
  > FieldValueMap `contains`(`in`)  will also work when migrated to libswsscommon(C++ with SWIG wrapper) (#94)

- Also fix Python 3-related issues:
    - Use integer (floor) division in config_samples.py (sonic-config-engine)
    - Replace print statement with print function in eeprom.py plugin for x86_64-kvm_x86_64-r0 platform
    - Update all platform plugins to be compatible with both Python 2 and Python 3
    - Remove shebangs from plugins files which are not intended to be executable
    - Replace tabs with spaces in Python plugin files and fix alignment, because Python 3 is more strict
    - Remove trailing whitespace from plugins files
2020-11-25 10:28:36 -08:00
Vadym Hlushko
503873056e
[Mellanox] SN4410 support (#5778)
Add support for Mellanox Spectrum-3 based 100GbE/400GbE 1U. 24 QSFP-DD28 and 8 QSFP-DD ports

Signed-off-by: Vadym Hlushko <vadymh@nvidia.com>
2020-11-24 10:43:48 -08:00
Junchao-Mellanox
b595a6eadf
[Mellanox] Implement new platform API for SONiC physical entity mib extension (#5645)
In order to support the SONiC physical entity MIB extension, a few new platform APIs were added to sonic-platform-common; this PR provides a Mellanox platform implementation for those new APIs.
2020-11-16 18:56:03 -08:00
shlomibitton
fd9bd40188
[Mellanox] Fix for QSFP-DD channel status (#5900)
A wrong object init broke the API. Replace the object with the correct type.

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
2020-11-11 11:08:15 -08:00
shlomibitton
bec01ae3bb
[Mellanox] Enhance QSFP-DD DOM information (#5776)
The new driver supports fetching additional pages from the cable EEPROM.
There is additional information to parse now: RX/TX power, TX bias, TX fault and RX LOS.

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
2020-11-10 14:36:22 -08:00
Junchao-Mellanox
7bee5093f1
[Mellanox] Support max/min speed for PSU fan (#5682)
As the new hw-mgmt exposes the sysfs for PSU fan max speed, we need to support max/min speed for the PSU fan in the Mellanox platform API.
2020-10-26 12:47:12 -07:00
Junchao-Mellanox
15c59e1d8c
[Mellanox] Re-initialize SFP object when detecting a new SFP insertion (#5695)
When detecting a new SFP insertion, read its SFP type and DOM capability from EEPROM again.

An SFP object is initialized to a certain type even if no SFP is present. A possible case:

1. An SFP object is initialized to the QSFP type by default when no SFP is present
2. The user inserts an SFP with an adapter into this QSFP port
3. The SFP object fails to read the EEPROM because it still treats itself as QSFP.

This PR fixes this issue.
2020-10-23 12:36:11 -07:00
Junchao-Mellanox
ca7a4a4e3a
[Mellanox] Fix issue: read data from eeprom should trim tail \0 (#5670)
We now read the base MAC and product name from EEPROM data. The data read from EEPROM contains multiple "\0" characters at the end, which need to be trimmed so that the strings are clean and display correctly.
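
Trimming the trailing NUL characters can be as simple as the following sketch; the helper name and sample value are hypothetical.

```python
def clean_eeprom_string(raw):
    """Strip trailing NUL padding from a string read out of EEPROM data."""
    return raw.rstrip("\0")


# A padded product name as it might come back from the EEPROM.
padded = "MSN2700\0\0\0"
cleaned = clean_eeprom_string(padded)
```

`rstrip` only removes NULs at the end, so any NUL in the middle of a value is left untouched.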
2020-10-20 22:08:06 -07:00
Kebo Liu
73f38f6ce9
[Mellanox] Optimize SFP Platform API implementation (#5476)
Each SFP object inside Chassis opens an SDK client; this is not necessary, and the SDK client can be shared between SFP objects.
2020-10-19 11:30:38 -07:00
Joe LeVeque
8011edc307
[platform] Remove references to deprecated get_serial_number() method in Chassis class (#5649)
The `get_serial_number()` method in the ChassisBase and ModuleBase classes was redundant, as the `get_serial()` method is inherited from the DeviceBase class. This method was removed from the base classes in sonic-platform-common and the submodule was updated in https://github.com/Azure/sonic-buildimage/pull/5625.

This PR aligns the existing vendor platform API implementations to remove the `get_serial_number()` methods and ensure the `get_serial()` methods are implemented, if they weren't previously.

Note that this PR does not modify the Dell platform API implementations, as this will be handled as part of https://github.com/Azure/sonic-buildimage/pull/5609
2020-10-17 22:00:14 -07:00
Junchao-Mellanox
e92061cde9
[Mellanox] Update dynamic minimum table for 4700, 3420 and 4600C (#5388)
Update dynamic minimum fan speed table according to data provided by thermal team.
2020-10-03 10:28:44 -07:00
Kebo Liu
40623681bb
[Mellanox] Fix truncated manufacture date returned from platform API (#5473)
The manufacture date returned from the platform API was truncated: the time was not included. Revise the regular expression used for matching.
2020-10-03 10:17:13 -07:00
Kebo Liu
0a19cb4de5
[Mellanox] Refactor platform API to remove dependency on database (#5468)
**- Why I did it**
- The platform API implementation uses sonic-cfggen to get the platform name and SKU name, which fails when the database is not available.
- The chassis name is not correctly assigned; it should be assigned from EEPROM TLV "Product Name" instead of the SKU name
- The chassis model is not implemented; it should be assigned from EEPROM TLV "Part Number"

**- How I did it**

1. Chassis

> - Get platform name from /host/machine.conf
> - Remove get SKU name with sonic-cfggen
> - Get Chassis name and model from EEPROM TLV "Product Name" and "Part Number" 
> - Add function to return model

2. EEPROM

> - Add function to return product name and part number

3. Platform

> - Init EEPROM on the host side, so that the Chassis name and model can also be obtained from EEPROM on the host side.
2020-09-26 11:20:43 -07:00
Kebo Liu
72ec212fa7
[Mellanox] Refactor SFP related platform API and plugins with new SDK API (#5326)
Refactor SFP reset, low power get/set API, and plugins with the new SDK SX APIs. Previously they called SDK SXD APIs, which have a glibc dependency because of shared memory usage.

Remove the implementations of "set_power_override", "tx_disable_channel", and "tx_disable", which use SXD APIs; once the related SDK SX APIs are available, they will be added back based on the new SDK SX APIs.
2020-09-11 13:23:23 -07:00
Kebo Liu
bf3c901c6c
[Mellanox] Update the sfp platform API to get the ext_specification_compliance with new way (#5123)
Update the platform API implementation to call the dedicated parse function defined in platform-common, as introduced by https://github.com/Azure/sonic-platform-common/pull/112
2020-08-13 19:17:01 -07:00
shlomibitton
995bd09486
Add support for 'Extended Specification Compliance' for QSFP cables parser (#5096)
Signed-off-by: Shlomi Bitton <shlomibi@mellanox.com>
2020-08-07 00:16:59 +03:00
Joe LeVeque
3b89e5d467
[Python] Migrate applications/scripts to import sonic-py-common package (#5043)
As part of consolidating all common Python-based functionality into the new sonic-py-common package, this pull request:
1. Redirects all Python applications/scripts in sonic-buildimage repo which previously imported sonic_device_util or sonic_daemon_base to instead import sonic-py-common, which was added in https://github.com/Azure/sonic-buildimage/pull/5003
2. Replaces all calls to `sonic_device_util.get_platform_info()` to instead call `sonic_py_common.get_platform()` and removes any calls to `sonic_device_util.get_machine_info()` which are no longer necessary (i.e., those which were only used to pass the results to `sonic_device_util.get_platform_info()`).
3. Removes unused imports to the now-deprecated sonic-daemon-base package and sonic_device_util.py module

This is the next step toward resolving https://github.com/Azure/sonic-buildimage/issues/4999

Also reverted my previous change in which device_info.get_platform() would first try obtaining the platform ID string from Config DB and fall back to gathering it from machine.conf upon failure because this function is called by sonic-cfggen before the data is in the DB, in which case, the db_connect() call will hang indefinitely, which was not the behavior I expected. As of now, the function will always reference machine.conf.
2020-08-03 11:43:12 -07:00
Nazarii Hnydyn
5c67a3c31d
[Mellanox] Fix SN3700 platform string. (#5036)
Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
2020-07-25 03:05:05 -07:00
shlomibitton
bbb91715a8
[Mellanox] Change fan tolerance to 50% (#5018)
The fan tolerance on Mellanox platforms should change to 50%.

Signed-off-by: Shlomi Bitton <shlomibi@mellanox.com>
2020-07-23 11:18:26 -07:00
Joe LeVeque
9905d9382d
[devices] Update SFP keys to align with new standard (#4975)
Align SFP key names with new standard defined in https://github.com/Azure/sonic-platform-common/pull/97

- hardwarerev -> hardware_rev
- serialnum -> serial
- manufacturename -> manufacturer
- modelname -> model
- Connector -> connector
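
For code that still produces the old key names, a small translation step along these lines could be used. The mapping is taken from the list above; the helper function and sample data are hypothetical.

```python
# Old SFP info key -> new standard key, as listed above.
KEY_RENAMES = {
    "hardwarerev": "hardware_rev",
    "serialnum": "serial",
    "manufacturename": "manufacturer",
    "modelname": "model",
    "Connector": "connector",
}


def rename_sfp_keys(info):
    """Return a copy of an SFP info dict with old keys mapped to new ones."""
    return {KEY_RENAMES.get(key, key): value for key, value in info.items()}


old_info = {"serialnum": "MT1234", "Connector": "No separable connector", "type": "QSFP28"}
new_info = rename_sfp_keys(old_info)
```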
2020-07-16 13:03:50 -07:00
shlomibitton
545fe3ecd0
Add support for QSFP-DD cables on MLNX platform API (#4965)
Signed-off-by: Shlomi Bitton <shlomibi@mellanox.com>
2020-07-15 11:09:46 -07:00
Junchao-Mellanox
76d68ad1f5
[Mellanox] Add support for set/get system led status (#4829)
System health feature needs to set/get system led status

- Add a led object in chassis class and initialize it when the API is called on host side
- Read/write the system LED sysfs node to get/set the status
2020-07-13 10:22:39 -07:00
Junchao-Mellanox
ce391645f2
[Mellanox] add ASIC temperature support to platform API (#4828)
**- Why I did it**

System health feature requires to read ASIC temperature and threshold from platform API

**- How I did it**

Implement Chassis.get_asic_temperature and Chassis.get_asic_temperature_threshold by reading the values from sysfs.
2020-06-28 17:54:28 -07:00
Junchao-Mellanox
563a0fd21e
[Mellanox] Change port index in port_config.ini to 1-based (#4781)
* Change port index in port_config.ini to 1-based
* Add default port index to port_config.ini, change platform plugins to accept 1-based port index
* fix port index in sfp_event.py
2020-06-23 17:21:36 -07:00
madhanmellanox
2c830f4074
Modified SKU based utils to Platform based utils (#4786)
Co-authored-by: Madhan Babu <madhan@arc-build-server.mtr.labs.mlnx>
2020-06-21 12:15:23 -07:00
Nazarii Hnydyn
1db64a3bc1
[Mellanox] Add ONIE and SSD platform components. (#4758)
Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
2020-06-15 14:25:49 +03:00
Junchao-Mellanox
e25c2d984f
[Mellanox] Never disable kernel thermal algorithm at real-time (#4638)
2020-05-26 10:46:29 -07:00
Junchao-Mellanox
f277d13cd6
[Mellanox] Adjust log level to avoid too many thermal logs (#4631)
* Trigger thermal action log only if thermal condition changes
* test file existence before read file content
* fix error for set psu fan speed
* Remove logs because it print too frequently
2020-05-26 10:45:25 -07:00
Junchao-Mellanox
5e6c20481d
[Mellanox] Enhancement for fan led management (#4437)
2020-05-13 10:01:32 -07:00
Junchao-Mellanox
4c210f0d02
[Mellanox] Enhancement for support PSU LED management (#4467)
2020-04-30 12:42:01 -07:00
shlomibitton
b6291372d9
[Mellanox] Add a new Mellanox platform x86_64-mlnx_msn4600c and new SKU ACS-MSN4600C (#4483)
* New SKU support for MSN4600C

Signed-off-by: Shlomi Bitton <shlomibi@mellanox.com>
2020-04-30 00:30:11 -07:00
Nazarii Hnydyn
0409a32abe
[mellanox]: Align CPLD component with latest hw-mgmt. (#4485)
Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
2020-04-28 18:15:19 +03:00
Junchao-Mellanox
b26814f643
[Mellanox] Adjust dynamic minimum fan speed algorithm (#4476)
* remove air flow direction from dynamic minimum algorithm
* adjust minimum table according to thermal data
2020-04-27 20:52:57 -07:00
shlomibitton
ac6cfb115f
[Mellanox] Add a new Mellanox platform x86_64-mlnx_msn3420 and new SKU ACS-MSN3420 (#4436)
* New SKU support for MSN3420

Signed-off-by: Shlomi Bitton <shlomibi@mellanox.com>

Conflicts:
	device/mellanox/x86_64-mlnx_msn2700-r0/plugins/sfputil.py

* Add CPLD's

* Symlink fixes and semantics

* Adding new platform at end of lines
2020-04-26 14:39:55 +03:00
Junchao-Mellanox
c730f3e207
[Mellanox] thermal control enhancement for dynamic minimum fan speed and PSU fan speed policy (#4403) 2020-04-21 08:09:53 -07:00
Kebo Liu
cfa112ace8
[Mellanox] Extend mellanox platform API to report SFP error event (#4365)
* extend mellanox platform API to report SFP error event
* remove unnecessary loop code
* install enum34 to pmon to support using Enum
2020-04-14 10:20:06 -07:00
Nazarii Hnydyn
3c4f3116a0
[mellanox]: Enable CPLD update progress bar (#4363)
Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
2020-04-14 09:55:08 -07:00
Nazarii Hnydyn
1b8897eec0
[mellanox]: Add SSD FW update tool (#4351)
* [mellanox]: Add SSD FW update tool.

Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>

* [mellanox]: Align Platform API.

Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>

* [mellanox]: Fix firmware description.

Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>

* [mellanox]: Update SSD tool.

Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
2020-04-13 18:13:19 +03:00
Junchao-Mellanox
80bf061b37
[Mellanox] Fix thermal control bugs (#4298)
* [thermal control] Fix pmon docker stop issue on 3800
* [thermal fix] Fix QA test issue
* [thermal fix] change psu._get_power_available_status to psu.get_power_available_status
* [thermal fix] adjust log for PSU absence and power absence
* [thermal fix] add unit test for loading thermal policy file with duplicate conditions in different policies
* [thermal] fix fan.get_presence for non-removable SKU
* [thermal fix] fix issue: fan direction is based on drawer
* Fix issue: when fan is not present, should not read fan direction from sysfs but directly return N/A
* [thermal fix] add unit test for get_direction for absent FAN
* Unpluggable PSU has no FAN, so there is no need to add a FAN object for this PSU
* Update submodules

Co-authored-by: Stephen Sun <5379172+stephenxs@users.noreply.github.com>
2020-03-25 10:54:07 -07:00
Kebo Liu
f4ed88297d
[Mellanox] Add a new Mellanox platform x86_64-mlnx_msn4700 and new SKU ACS-MSN4700 (#3901)
* add MSN4700 device files

* update ACS-MSN4700 sai profile

* update buffer pool size, headroom, sensor conf, port config and reboot scripts

* fix indent

* update sensor conf and buffer pool

* [sn4700] add sku 4700 to chassis.py

* [Mellanox-4700] Add 4700 info to psu and thermal platform API

* update buffer config file template to the latest.
update SAI profile to use 100G X 4lanes for now
update port_config.ini according to the SAI profile

* [Mellanox]Update the buffer configurations for 4700

* fix alignment in pg_profile_lookup.ini

* add platform components file for new sku

* Update device/mellanox/x86_64-mlnx_msn4700-r0/ACS-MSN4700/pg_profile_lookup.ini

Co-Authored-By: Nazarii Hnydyn <nazariig@mellanox.com>

* remove redundant line

* [Mellanox]Correct type, buffer size

Co-authored-by: Nazarii Hnydyn <nazariig@mellanox.com>
Co-authored-by: junchao <junchao@mellanox.com>
Co-authored-by: Stephen Sun <stephens@mellanox.com>
2020-03-24 14:32:52 +02:00
Nazarii Hnydyn
4d22cd405f
[mellanox]: Align platform API: change CPLD version representation (#4221) 2020-03-23 09:04:11 -07:00
Junchao-Mellanox
be549db395
Add thermal control support for SONiC (#3949) 2020-03-09 10:41:10 -07:00
Nazarii Hnydyn
fc101b6ceb
[mellanox]: Add new Mellanox-SN3800-D112C8 sku. (#4085)
Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
2020-01-30 18:54:09 -08:00
Stephen Sun
33e918f7ff
[Mellanox] platform api support firmware install (#3931)
support firmware install, including CPLD and BIOS.

CPLD: cpldupdate
BIOS: boot to onie and update BIOS in onie and then boot to SONiC
2020-01-28 21:55:50 -08:00
Stephen Sun
1886bdf7ad [Mellanox] fix gearbox ambient thermal name (#4005) 2020-01-17 14:05:35 -08:00
Stephen Sun
90ddad48d1 [mellanox] improve the method by which the type of an sfp module is detected (#3846)
Fix the issue when an SFP module is plugged into a QSFP port via an adapter.

- How I did it
Originally the type of an SFP module was determined according to the SKU dictionary. However, it's possible that an SFP module is plugged into a QSFP port via an adapter. In this case, the EEPROM content will be parsed in the wrong format.
To address that we fetch the identifier value of an xSFP module and then get the type by parsing it.
2019-12-07 11:28:49 -08:00
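The identifier-based detection described in #3846 above can be sketched roughly as follows. This is an illustration only, not the code from that commit: the helper name `detect_module_type` and the returned labels are hypothetical, and the only assumption about the data is that byte 0 of the module EEPROM holds the SFF-8024 identifier.

```python
# Illustrative sketch, not the code from the commit.
# Identifier values (byte 0 of the module EEPROM) as listed in SFF-8024.
SFF8024_IDENTIFIER_TO_TYPE = {
    0x03: "SFP",   # SFP / SFP+ / SFP28
    0x0C: "QSFP",  # QSFP
    0x0D: "QSFP",  # QSFP+
    0x11: "QSFP",  # QSFP28
}


def detect_module_type(eeprom: bytes) -> str:
    """Return the module family based on the identifier byte.

    Hypothetical helper: the real platform code reads the EEPROM itself,
    while this sketch takes the raw bytes as an argument.
    """
    if not eeprom:
        raise ValueError("empty EEPROM data")
    return SFF8024_IDENTIFIER_TO_TYPE.get(eeprom[0], "UNKNOWN")
```

With this approach an SFP module seated in a QSFP port through an adapter still reports identifier `0x03`, so its EEPROM would be parsed with the SFP layout regardless of what the SKU dictionary says about the port.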
Stephen Sun
d5aa0d4382 [Mellanox]support led for fan/psu and fan's direction (#3795) 2019-12-04 11:40:42 -08:00
Stephen Sun
0c9040dec9 [Mellanox] support get_transceiver_threshold_info (#3777)
* [sonic_platform.sfp] support get_transceiver_dom_threshold_info_dict

* [platform/sfp]qsfp threshold and beautify code
1. qsfp threshold: tx power
2. beautify code, removing some magic numbers
3. optimize get_present by only reading one byte.
2019-11-20 16:30:05 -08:00
Stephen Sun
249e858926 [Mellanox] Support SN3800 in platform api (#3593)
* [sonic_platform]Support 3800
1. add port position tuple for 3800

* [sonic_platform/chassis] address comments
2019-10-16 18:31:27 +03:00
Stephen Sun
2e3fb905a3 [Mellanox]Correct reboot cause when reboot via power cycle (#3597)
* [sonic_platform]remove the handling of reset_sw_reset which indicates rebooted by software.

* [sonic_platform]Check "reset_sw_reset"
Also check reboot cause file "reset_sw_reset" which indicates the system was rebooted due to software requesting.
2019-10-15 11:46:09 -07:00
Stephen Sun
aea09ba1da [sonic_platform] Correct the wrong log identifiers (#3596) 2019-10-15 11:29:45 -07:00
Stephen Sun
576f0982d2 [Mellanox] Resolve chassis breakage due to inconsistency with the latest sonic_platform_common (#3569)
* Currently get_firmware_version is implemented using chassis.get_firmware_version and chassis._component_name_list, which are not supported in the latest sonic_platform_common, causing the chassis to break. Update this part so that it aligns with the latest sonic_platform_common
* Support component API
2019-10-09 11:07:30 -07:00
Stephen Sun
350d2c5d2b [chassis.py] Fix issue in get_change_event: the returned dictionary doesn't contain 'sfp' key. (#3568) 2019-10-08 09:26:35 -07:00
Stephen Sun
362a6855ec [Mellanox] enhance the initialization flow of sfp part of new platform api (#3319)
* [sonic_platform.sfp_event]enhance the initialization flow of sfp_event

* [sonic_platform.sfp_event] replace "retry = retry + 1" with "retry += 1"

* [sonic_platform] fix typo in sfp_event

* [sfp_event] remove unused variables

* [sonic_platform/sfp_event.py]remove unnecessary statements
2019-09-25 11:41:07 -07:00
Stephen Sun
5c2d71138b [Mellanox] optimize new platform api (#3289)
optimize SFP module operations and fix issues.

- split initialization of the various categories of devices and initialize each category of devices only when needed, so that unnecessary dependencies can be avoided.
- update watchdog logic, only initializing watchdog when referenced.
- support platform.py and enable initializing different devices on a host/docker basis
- update init so that sonic_platform can be imported as a whole.
2019-08-28 11:59:37 -07:00
Stephen Sun
97b43f96bb [mlnx_platform_api.thermal]align thermal sensor names with hw-management v2.0.0191 (#3371)
temp_xxxx_module{} => module{}_temp_xxxx
2019-08-23 11:58:03 -07:00
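The renaming pattern in #3371 above (`temp_xxxx_module{}` to `module{}_temp_xxxx`) can be expressed as a small mapping function. This is a sketch under assumptions: the function name is hypothetical, the attribute names in the usage note are illustrative placeholders for `xxxx`, and the actual commit may simply have edited the file name templates directly.

```python
import re

# Old convention: temp_<attr>_module<index>; new convention: module<index>_temp_<attr>.
_OLD_NAME = re.compile(r"^temp_(?P<attr>.+)_module(?P<index>\d+)$")


def to_new_sensor_name(old_name: str) -> str:
    """Map an old-style module thermal file name to the new style.

    Hypothetical helper; names that do not match the old pattern
    are returned unchanged.
    """
    match = _OLD_NAME.match(old_name)
    if match is None:
        return old_name
    return "module{}_temp_{}".format(match.group("index"), match.group("attr"))
```

For example, under this mapping a name such as `temp_input_module1` would become `module1_temp_input`, while an unrelated name is passed through untouched.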
Stephen Sun
a5de31bf43 [Mellanox]new platform api -- support get_change_event (#3142)
* [Mellanox] refactor the sfp event change notification logic for new platform api
remove the standalone daemon which is in charge of polling sfp change event through sdk interface
and move the polling stuff to the event in the chassis daemon.

* rephrase some comments

* fix typo in sfp_event.sfp_event.initialize
2019-07-28 15:18:39 +03:00
Stephen Sun
1d15022df7 [Mellanox] support new platform api, thermal and psu part (#3175)
* support new platform api, thermal and psu part
for psu, all APIs are supported.
for thermal, we support
  get_temperature,
  get_high_threshold
for the thermal sensors of cpu core, cpu pack, psu and sfp module
and get_temperature for the ambient thermal sensors around the asic, port, fan, comex and board.

* 1. address review comments
2. improve the handling of PSU inserting/removal
3. tolerate diverse psu thermal sensor file name conventions

* 1. adjust thermal code according to the latest version of hw-management
2. check power_good_status rather than whether the file exists before reading voltage, current and power of PSU
2019-07-22 07:59:48 -07:00
Stephen Sun
20e4547dbc [Mellanox] Fix typo "xSFP_VLOT_OFFSET" (#3118)
Variables SFP_VLOT_OFFSET and QSFP_VLOT_OFFSET containing the typo were originally defined in repo sonic-platform-common. The typo has been fixed in PR #33. However, some Mellanox-specific code hasn't been updated correspondingly, which causes xcvrd to fail to start.
This PR updates the variable name in Mellanox-specific code correspondingly to fix that.
2019-07-05 14:06:18 -07:00
Stephen Sun
82fb3a099d [Mellanox]New platform api -- chassis part (#3082)
* new platform api, chassis part

* Inject mlnx mlx libs to platform monitor

* address the review comments

* remove some confusing naming.

* Adjust the minor cause to a more human-readable way when rebooted by firmware

* address review comments

* expose host dir /host/reboot-cause to pmon docker so that the reboot causing by user command can be identified

* 1. Revert "expose host dir /host/reboot-cause to pmon docker so that the reboot causing by user command can be identified"
Since the only hardware-causing reboot should be handled by get_reboot_cause and the logic of handling reboot cause is about to move to the host side, no need to mount this dir to pmon docker.
This reverts commit 3feb96869d.
2. adjust log output by using sonic_daemon_base.daemon_base.Logger.
3. remove the logic of verifying /host/reboot-cause/ files.
4. fix typo.

* implement get_firmware_version and adjust the interfaces regarding components' version retrieving according to the Azure/sonic-platform-common#34
2019-07-04 14:29:58 +03:00
Stephen Sun
86495a15a2 [Mellanox] Support new platform api sfp part (#3101)
Implement new platform api sfp part, including the following APIs:
- get_reset_status
- get_tx_disable_channel
- get_lpmode
- get_power_override
- reset
- set_lpmode
- tx_disable
- tx_disable_channel
- set_power_override
2019-07-02 14:50:20 -07:00
txj36
22c0f4d877 [devices]: fix SFP initialization in the Chassis for mlnx-platform-api (#3012) 2019-07-02 11:39:24 -07:00
Kebo Liu
89ee636b99 [Mellanox] SFP new platform API implementation (#2944)
* add sfp new api

* fix get presence
2019-05-29 09:46:20 +03:00
Kebo Liu
818ba436a9 [Mellanox] Implement new fan platform API (#2747) 2019-04-21 14:34:28 -07:00
Stepan Blyshchak
f06c67b456 [mellanox] Implement Watchdog API based on the new platform API (#2607)
Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
2019-02-28 15:57:38 -08:00