In order to support SONiC physical entity mib extension, a few new platform API are added to sonic-platform-common, this PR is to provide an mellanox platform implementation for those new APIs.
New driver support fetching additional pages from the cable EEPROM.
There are additional information to parse now: RX/TX power, TX bias, TX fault and RX LOS.
Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
When detecting a new SFP insertion, read its SFP type and DOM capability from EEPROM again.
SFP object will be initialized to a certain type even if no SFP present. A case could be:
1. A SFP object is initialized to QSFP type by default when there is no SFP present
2. User insert a SFP with an adapter to this QSFP port
3. The SFP object fail to read EEPROM because it still treats itself as QSFP.
This PR fixes this issue.
Now we are reading base mac, product name from eeprom data, and the data read from eeprom contains multiple "\0" characters at the end, need trim them to make the string clean and display correct.
The `get_serial_number()` method in the ChassisBase and ModuleBase classes was redundant, as the `get_serial()` method is inherited from the DeviceBase class. This method was removed from the base classes in sonic-platform-common and the submodule was updated in https://github.com/Azure/sonic-buildimage/pull/5625.
This PR aligns the existing vendor platform API implementations to remove the `get_serial_number()` methods and ensure the `get_serial()` methods are implemented, if they weren't previously.
Note that this PR does not modify the Dell platform API implementations, as this will be handled as part of https://github.com/Azure/sonic-buildimage/pull/5609
**- Why I did it**
- Platform API implementation using sonic-cfggen to get platform name and SKU name, which will fail when the database is not available.
- Chassis name is not correctly assigned, it shall be assigned with EEPROM TLV "Product Name", instead of SKU name
- Chassis model is not implemented, it shall be assigned with EEPROM TLV "Part Number"
**- How I did it**
1. Chassis
> - Get platform name from /host/machine.conf
> - Remove get SKU name with sonic-cfggen
> - Get Chassis name and model from EEPROM TLV "Product Name" and "Part Number"
> - Add function to return model
2. EEPROM
> - Add function to return product name and part number
3. Platform
> - Init EEPROM on the host side, so also can get the Chassis name model from EEPROM on the host side.
Refactor SFP reset, low power get/set API, and plugins with new SDK SX APIs. Previously they were calling SDK SXD APIs which have glibc dependency because of shared memory usage.
Remove implementation "set_power_override", "tx_disable_channel", "tx_disable" which using SXD APIs, once related SDK SX API available, will add them back based on new SDK SX APIs.
As part of consolidating all common Python-based functionality into the new sonic-py-common package, this pull request:
1. Redirects all Python applications/scripts in sonic-buildimage repo which previously imported sonic_device_util or sonic_daemon_base to instead import sonic-py-common, which was added in https://github.com/Azure/sonic-buildimage/pull/5003
2. Replaces all calls to `sonic_device_util.get_platform_info()` to instead call `sonic_py_common.get_platform()` and removes any calls to `sonic_device_util.get_machine_info()` which are no longer necessary (i.e., those which were only used to pass the results to `sonic_device_util.get_platform_info()`.
3. Removes unused imports to the now-deprecated sonic-daemon-base package and sonic_device_util.py module
This is the next step toward resolving https://github.com/Azure/sonic-buildimage/issues/4999
Also reverted my previous change in which device_info.get_platform() would first try obtaining the platform ID string from Config DB and fall back to gathering it from machine.conf upon failure because this function is called by sonic-cfggen before the data is in the DB, in which case, the db_connect() call will hang indefinitely, which was not the behavior I expected. As of now, the function will always reference machine.conf.
Align SFP key names with new standard defined in https://github.com/Azure/sonic-platform-common/pull/97
- hardwarerev -> hardware_rev
- serialnum -> serial
- manufacturename -> manufacturer
- modelname -> model
- Connector -> connector
System health feature needs to set/get system led status
- Add a led object in chassis class and initialize it when the API is called on host side
- Read/write system led system fs to get/set the status
**- Why I did it**
System health feature requires to read ASIC temperature and threshold from platform API
**- How I did it**
Implement Chassis.get_asic_temperature and Chassis.get_asic_temperature_threshold by getting value from system fs.
* Change port index in port_config.ini to 1-based
* Add default port index to port_config.ini, change platform plugins to accept 1-based port index
* fix port index in sfp_event.py
* Trigger thermal action log only if thermal condition changes
* test file existence before read file content
* fix error for set psu fan speed
* Remove logs because it print too frequently
* New SKU support for MSN3420
Signed-off-by: Shlomi Bitton <shlomibi@mellanox.com>
Conflicts:
device/mellanox/x86_64-mlnx_msn2700-r0/plugins/sfputil.py
* Add CPLD's
* Symlink fixes and semantics
* Adding new platform at end of lines
* [thermal control] Fix pmon docker stop issue on 3800
* [thermal fix] Fix QA test issue
* [thermal fix] change psu._get_power_available_status to psu.get_power_available_status
* [thermal fix] adjust log for PSU absence and power absence
* [thermal fix] add unit test for loading thermal policy file with duplicate conditions in different policies
* [thermal] fix fan.get_presence for non-removable SKU
* [thermal fix] fix issue: fan direction is based on drawer
* Fix issue: when fan is not present, should not read fan direction from sysfs but directly return N/A
* [thermal fix] add unit test for get_direction for absent FAN
* Unplugable PSU has no FAN, no need add a FAN object for this PSU
* Update submodules
Co-authored-by: Stephen Sun <5379172+stephenxs@users.noreply.github.com>
* add MSN4700 device files
* update ACS-MSN4700 sai profile
* update buffer pool size, headroom, sensor conf, port config and reboot scripts
* fix ident
* update sensor conf and buffer pool
* [sn4700] add sku 4700 to chassis.py
* [Mellanox-4700] Add 4700 info to psu and thermal platform API
* update buffer config file template to the latest.
update SAI profile to use 100G X 4lanes for now
update port_config.ini according to the SAI profile
* [Mellanox]Update the buffer configurations for 4700
* fix alignment in pg_profile_lookup.ini
* add platform components file for new sku
* Update device/mellanox/x86_64-mlnx_msn4700-r0/ACS-MSN4700/pg_profile_lookup.ini
Co-Authored-By: Nazarii Hnydyn <nazariig@mellanox.com>
* remove redundant line
* [Mellanox]Correct type, buffer size
Co-authored-by: Nazarii Hnydyn <nazariig@mellanox.com>
Co-authored-by: junchao <junchao@mellanox.com>
Co-authored-by: Stephen Sun <stephens@mellanox.com>
Fix the issue when an SFP module is plugged into a QSFP port via an adapter.
- How I did it
Originally the type of an SFP module is determined according to the SKU dictionary. However, it's possible that as SFP module is plugged into a QSFP port via an adapter. In this case, the EEPROM content will be parsed in the wrong format.
To address that we fetch the identifier value of an xSFP module and then get the type by parsing it.
* [sonic_platform.sfp] support get_transceiver_dom_threshold_info_dict
* [platform/sfp]qsfp threshold and beautify code
1. qsfp threshold: tx power
2. beautify code, removing some magic numbers
3. optimize get_present by only reading one byte.
* [sonic_platform]remove the handling of reset_sw_reset which indicates rebooted by software.
* [sonic_platform]Check "reset_sw_reset"
Also check reboot cause file "reset_sw_reset" which indicates the system was rebooted due to software requesting.
*Currently get_firmware_version implementated by using chassis.get_firmware_version and chassis._component_name_list which are not supported in the latest sonic_platform_common, causing chassis broken. Update this part so that it aligns to the latest sonic_platform_common
*Support component API
optimize SFP module operations and fix issues.
- split initialization of variant categories of devices and initialize each category of devices only when needed, so that unnecessary dependencies can be avoided.
- update watchdog logic, only initializing watchdog when referenced.
- support platform.py and enable to initialize variant devices on a host/docker basis
- update init so that sonic_platform can be imported as a whole.
* [Mellanox]refractor the sfp event change notification logic for new platform api
remove the standalong daemon which is in charge of polling sfp change event through sdk interface
and move the polling stuff to the event in the chassis daemon.
* rephase some comment
* fix typo in sfp_event.sfp_event.initialize
* support new platform api, thermal and psu part
for psu, all APIs are supported.
for thermal, we support
get_temperature,
get_high_threshold
for the thermal sensors of cpu core, cpu pack, psu and sfp module
and get_temperature for the ambient thermal sensors around the asic, port, fan, comex and board.
* 1. address review comments
2. improve the handling of PSU inserting/removal
3. tolerance diverse psu thermal sensor file name conventions
* 1. adjust thermal code according to the latest version of hw-management
2. check power_good_status rather than whether file existing ahead of reading voltage, current and power of PSU
Variables SFP_VLOT_OFFSET and QSFP_VLOT_OFFSET containing the typo are originally defined in repo sonic-platform-common. The typo has been fixed in PR #33. However, some Mellanox-specific code hasn't updated correspondingly, which results in xcvrd fail to start.
This PR updates the variable name in Mellanox-specific code correspondingly to fix that.
* new platform api, chassis part
* Inject mlnx mlx libs to platform monitor
* address the review comments
* remove some confusing naming.
* Adjust the minor cause to a more human-readable way when rebooted by firmware
* address review comments
* expose host dir /host/reboot-cause to pmon docker so that the reboot causing by user command can be identified
* 1. Revert "expose host dir /host/reboot-cause to pmon docker so that the reboot causing by user command can be identified"
Since the only hardware-causing reboot should be handled by get_reboot_cause and the logic of handling reboot cause is about to move to the host side, no need to mount this dir to pmon docker.
This reverts commit 3feb96869d.
2. adjust log output by using sonic_daemon_base.daemon_base.Logger.
3. remove the logic of verifying /host/reboot-cause/ files.
4. fix typo.
* implement get_firmware_version and adjust the interfaces regarding components' version retrieving according to the Azure/sonic-platform-common#34