sonic-buildimage

Author	SHA1	Message	Date
Vivek	787dd7221d	[Mellanox] Upgrade HW-MGMT to 7.0030.2008 and update platform-api (#17134 ) Why I did it Add platform support for Debian 12 (Bookworm) on Mellanox Platform How I did it Update hw-management to v7.0030.2008 Deprecate the sfp_count == module_count approach in favour of asic init completion Ref: Mellanox/hw-mgmt@bf4f593 Add xxd package to base image which is required by hw-management scripts Add the non-upstream flag into linux kernel cache options Update the thermalctl logic based on new sysfs attributes Fix the integrate-mlnx-hw-mgmt script to not populate the arm64 Kconfig How to verify it Build kernel and run platform tests Signed-off-by: Vivek Reddy <vkarri@nvidia.com> Co-authored-by: Junchao-Mellanox <junchao@nvidia.com> Co-authored-by: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com>	2023-11-21 18:53:15 -08:00
Stephen Sun	b93852d53d	[Mellanox] Support running hw-management service on MSN4700 emulation platform (#16584 ) - Why I did it Support running hw-management service on MSN4700 emulation platform. - How I did it Use physical EEPROM instead of the fake one Do not skip PSUd, PCId, thermal control daemon Adjust PCIe and thermal configuration files Adjust platform.json for different chassis names and thermals Remove a patch to hw-management in order to enable it - How to verify it Run Nvidia simulation on SN4700 (ASIC and Platform) Signed-off-by: Stephen Sun <stephens@nvidia.com>	2023-11-19 11:03:46 +02:00
Junchao-Mellanox	c2edc6f9d5	Revert "[Mellanox] Align PSU temperature sysfs node name with hw-management change (#16820 )" (#16956 ) This reverts commit `0846322e9a`.	2023-10-23 11:55:27 +03:00
Junchao-Mellanox	0846322e9a	[Mellanox] Align PSU temperature sysfs node name with hw-management change (#16820 ) - Why I did it hw-management renamed PSU temperature related sysfs: psu1_temp -> psu1_temp1 psu2_temp -> psu2_temp1 psu1_temp_max -> psu1_temp1_max psu2_temp_max -> psu2_temp1_max This PR is to align the change in SONiC. - How I did it Use new sysfs node for PSU temperature and PSU temperature threshold - How to verify it Manual test sonic-mgmt Regression test	2023-10-10 19:21:27 +03:00
Junchao-Mellanox	aedffd333b	[Mellanox] wait reset cause ready (#16722 ) Why I did it SONiC service determine-reboot-cause might run before driver creating reset cause files. In that case, the reset cause will be "Unknown". This PR introduces a wait mechanism to wait for reset cause sysfs files ready. How I did it /run/hw-management/config/reset_attr_ready is the file to indicate all reset cause files are ready. In chassis.get_reboot_cause function, it waits /run/hw-management/config/reset_attr_ready for up to 45 seconds. How to verify it Manual test on master/202211/202205	2023-10-03 18:58:31 -07:00
Vivek	456a90e1ab	[Nvidia] Remove the dependency on python_sdk_api for sfp api (#16545 ) Sfp api can now be called from the host which doesn't have the python_sdk_api installed. Also, sfp api has been migrated to use sysfs instead of sdk handle. Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>	2023-09-23 00:19:27 -07:00
Junchao-Mellanox	5138afe4e7	[Mellanox] add new platform 2700 a1 (#16515 ) - new pcie.yaml - new sensors.conf - new thermal support - new platform.json file - adjust test code	2023-09-23 00:15:17 -07:00
Kebo Liu	e286869b24	[Mellanox] Update HW-MGMT package to new version V.7.0030.1011 (#16239 ) - Why I did it 1. Update Mellanox HW-MGMT package to newer version V.7.0030.1011 2. Replace the SONiC PMON Thermal control algorithm with the one inside the HW-MGMT package on all Nvidia platforms 3. Support Spectrum-4 systems - How I did it 1. Update the HW-MGMT package version number and submodule pointer 2. Remove the thermal control algorithm implementation from Mellanox platform API 3. Revise the patch to HW-MGMT package which will disable HW-MGMT from running on SIMX 4. Update the downstream kernel patch list Signed-off-by: Kebo Liu <kebol@nvidia.com>	2023-09-06 11:32:08 +03:00
Vadym Hlushko	9e3fdded69	[Mellanox][SFP] Remove unused function parameter (#16318 ) Why I did it To avoid errors when the sfputil show error-status -hw is called from the host OS (not from the pmon docker). How I did it Remove the self.sdk_handle parameter from the _get_module_info() function. How to verify it Execute the sfputil show error-status -hw Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>	2023-09-01 23:06:04 -07:00
Junchao-Mellanox	95f317a5e2	[Mellanox] Fix issue: watchdogutil command does not work (#16091) - Why I did it watchdogutil uses the platform API watchdog instance to control and query watchdog status. The Nvidia watchdog implementation caches the "armed" status in an object member, "WatchdogImplBase.armed". This does not work for the CLI infrastructure because each CLI invocation creates a new watchdog instance, so the status cached in the previous instance is lost. Consider the following commands: admin@sonic:~$ sudo watchdogutil arm -s 100 =====> watchdog instance1, armed=True Watchdog armed for 100 seconds admin@sonic:~$ sudo watchdogutil status ======> watchdog instance2, armed=False Status: Unarmed admin@sonic:~$ sudo watchdogutil disarm =======> watchdog instance3, armed=False Failed to disarm Watchdog - How I did it Use sysfs to query watchdog status - How to verify it Manual test Unit test	2023-08-23 09:30:58 +03:00
Vadym Hlushko	b214a8a8b6	[Mellanox] Change SDK API sx_mgmt_phy_module_info_get() to sysfs (#15963 ) - Why I did it Change Mellanox platform API implementation to use ASIC driver sysfs for the module operational state and status error fields. - How I did it Modify the platform/mellanox/mlnx-platform-api/sonic_platform/sfp.py file by change the call of sx_mgmt_phy_module_info_get() SDK API to sysfs - How to verify it Simulate the unplug cable event Check the CLI output sfputil show presence sfputil show error-status -hw Simulate the plug cable event Repeat 2 step Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>	2023-08-21 20:54:13 +03:00
Junchao-Mellanox	91f3da018e	[Mellanox] Add more unit test coverage for platform API (#15842 ) - Why I did it Increase UT coverage for Nvidia platform API code Work item tracking Microsoft ADO (number only): - How I did it Focus on low coverage file: 1. component.py 2. watchdog.py 3. pcie.py - How to verify it Run the unit test, the coverage has been changed from 70% to 90%	2023-08-03 13:54:31 +03:00
Junchao-Mellanox	ed21266ff4	[Mellanox] Remove reset_from_comex from reboot cause mapping (#15793 ) - Why I did it The reset cause "reset_from_comex" has been removed by hw-management, hence removing it from platform API code - How I did it Remove reset_from_comex from reboot cause mapping - How to verify it Manual test	2023-07-15 01:02:46 +03:00
DavidZagury	b06a856fba	[Mellanox] Add support for BIOS update on Spectrum-4 (#15795 ) - Why I did it BIOS on new generation switch can come with a file type of cap or cab. Needs to add support to these file type. Also ONIE version on new devices can have a suffix of 'dev'. - How I did it Added cap & cab as possible component extensions for ComponentBIOS. Update the ONIE version regex to include dev signed versions. - How to verify it Update BIOS.	2023-07-15 00:59:55 +03:00
Stephen Sun	238e6ffcc1	[Mellanox] Adjust warning threshold implementation according to the latest algorithm update (#15092 ) - Why I did it Adjust the warning threshold implementation according to the latest algorithm update - How I did it Modify power warning and critical thresholds methods - How to verify it Unit test updated to cover the change Signed-off-by: Stephen Sun <stephens@nvidia.com>	2023-06-13 15:14:10 +03:00
Junchao-Mellanox	18cf719d6a	[Mellanox] Use sysfs for sfp reset/LPM/presence (#14130 ) - Why I did it The current implementation of SFP reset, LPM, present relies on SDK API. This PR moves the implementation to SDK sysfs. By this PR, it gains following benefit: 1. SDK sysfs provides better performance. 2. Host side and container side share the same code. 3. Code is much cleaner. - How I did it Use SDK sysfs to implement SFP reset, LPM, present. - How to verify it 1. Manual test. 2. Unit test.	2023-05-24 17:24:34 +03:00
daxia16	1175143af1	[Mellanox] Support UID LED in platform API (#11592 ) - Why I did it As a LED indicator to help user to find switch location in the lab, UID LED is a useful LED in Mellanox switch. - How I did it I add a new member _led_uid in Mellanox/Chassis.py, and extend Mellanox/led.py to support blue color. Relevant platform-common PR sonic-net/sonic-platform-common#369 - How to verify it Add unit test cases in test.py, and do manual test including turn-on/off/show uid led. Signed-off-by: David Xia <daxia@nvidia.com>	2023-05-16 08:24:39 +03:00
Junchao-Mellanox	7962a5c0fa	[Mellanox] add PSU fan direction support (#14508 ) - Why I did it Add PSU fan direction support - How I did it Implement fan.get_direction for PSU fan - How to verify it Manual test Unit test	2023-05-15 21:34:54 +03:00
Junchao-Mellanox	9deca05f9d	[Mellanox] get LED capability from capability file (#14584 ) - Why I did it Currently, LED sysfs path is hardcoded. We will need change LED code if new LED color is supported for new platforms. This PR is aimed to improve this. By this PR, LED sysfs path is deduced from LED capability file. - How I did it Improve LED management on Nvidia platform: get LED capability from capability file and deduce sysfs name according to the capability - How to verify it Unit test Manual test	2023-05-10 20:53:50 +03:00
Sudharsan Dhamal Gopalarathnam	8d82a86134	[Mellanox]Fix lpmode set when logical port is larger than 64 (#14138 ) - Why I did it In sfplpm API, the number of logical ports is hardcoded as 64. When a system contains more port than this, the SDK APIs would fail with a syslog as below Mar 7 03:53:58.105980 r-leopard-58 ERR syncd#SDK: [MGMT_LIB.ERR] Slot [0] Module [0] has logport [0x00010069] in enabled state Mar 7 03:53:58.105980 r-leopard-58 ERR syncd#SDK: [SDK_MGMT_LIB.ERR] Failed in __sdk_mgmt_phy_module_pwr_attr_set, error: Internal Error Mar 7 03:53:58.106118 r-leopard-58 ERR pmon#-c: Error occurred when setting power mode for SFP module 0, slot 0, error code 1 - How I did it Remove the hardcoded value of 64. Obtained the number of logical ports from SDK - How to verify it Manual testing	2023-03-09 00:02:55 +02:00
Junchao-Mellanox	f6d3615bb9	[Mellanox] Check system eeprom existence in a retry manner (#13884) - Why I did it On Mellanox platforms, the system EEPROM is a soft link provided by hw-management. The config-setup service may access the EEPROM before hw-management creates it, which causes errors. This PR aims to fix that. - How I did it Wait up to 10 seconds in the platform API for the EEPROM to be created. - How to verify it Manual test	2023-02-21 19:40:16 +02:00
Junchao-Mellanox	331b97e2aa	[Mellanox] Fix issue: cannot find label port for logical port when logical port number is larger than 64 (#13710 ) - Why I did it sfp_event.py gets a PMPE message when a cable event is available. In PMPE message, there is no label port available. Current sfp_event.py is using sx_api_port_device_get to get 64 logical ports attributes, and find the label port from those 64 attributes. However, if there are more than 64 ports, sfp_event.py might not be able to find the label port and drop the PMPE message. - How I did it Don't use hardcoded 64, get logical port number instead. - How to verify it Manual test	2023-02-21 08:14:29 +02:00
Stephen Sun	71b5bb6f37	[Mellanox] Support per PSU slope value for PSU power threshold (#13757 ) - Why I did it Support per PSU slope value for PSU power threshold according to hardware team requirement - How I did it Pass the PSU number as a parameter when fetching the slope value of PSU. - How to verify it Running regression and manual test Signed-off-by: Stephen Sun <stephens@nvidia.com>	2023-02-14 08:55:28 +02:00
Junchao-Mellanox	389b279ba9	[Mellanox] set select timeout to no more than 1 sec to make sure fast shutdown (#13611 ) - Why I did it Commit sonic-net/sonic-platform-daemons@153ea47 changed SfpStateUpdateTask from Process to Thread. In this commit, it raises an exception in SfpStateUpdateTask to make shutdown flow fast. But it does not work on Nvidia platform as Nvidia platform is passing timeout parameter of get_change_event to select. Linux select function can not be interrupted by a Python exception. There is no such issue on Nvidia platform before that commit. However, in order to comply with the commit and make shutdown flow fast, we decided to change Nvidia platform API implementation. To fix issue #13591. - How I did it The select call in get_change_event should use no more than 1 second as timeout parameter. Outside the select call, add a while loop to make sure timeout parameter of get_change_event work as expected - How to verify it Manual test	2023-02-14 08:26:25 +02:00
Kebo Liu	7873a9131d	[Mellanox] Skip the leftover hardware reboot cause in case of last boot is warm/fast reboot (#13246 ) - Why I did it In case of warm/fast reboot, the hardware reboot cause will NOT be cleared because CPLD will not be touched in this flow. To not confuse the reboot cause determine logic, the leftover hardware reboot cause shall be skipped by the platform API, platform API will return the 'REBOOT_CAUSE_NON_HARDWARE' instead of the "hardware" reboot cause. - How I did it Check the proc cmdline to see whether the last reboot is a warm or fast reboot, if yes skip checking the leftover hardware reboot cause. - How to verify it a. Manual test: - Perform a power loss - Perform a warm/fast reboot - Check the reboot cause should be "warm-reboot" or "fast-reboot" instead of "power loss" b. Run reboot cause related regression test. Signed-off-by: Kebo Liu <kebol@nvidia.com>	2023-01-11 16:50:46 +02:00
Vadym Hlushko	1a5889ade7	[SFP] Change logging severity when failed to read EEPROM (#13011) - Why I did it To prevent the sonic-mgmt/tests/platform_tests/sfp/test_sfputil.py test from failing at the log analyzer step. The test runs sfputil reset EthernetX for every interface on the SONiC switch, which flaps the SFP device status (INSERTED -> REMOVED -> INSERTED). The SONiC XCVRD daemon catches this status change because it monitors the presence status of the cable. Cable presence is currently still judged by reading the first bytes of the EEPROM, and the EEPROM may not be ready at that moment, so the SONiC XCVRD daemon prints an error log to syslog: ERR pmon#xcvrd: Error! Unable to read data for 'xx' port, page 'xx' offset 128, rc = 1, err msg: Sending access register - How I did it Change logging severity from ERR to WARNING - How to verify it Run sonic-mgmt/tests/platform_tests/sfp/test_sfputil.py or, much faster, run the following script on the switch: #!/bin/bash START=0 END=248 for (( intf=$START; intf<=$END; intf+=8)) do sfputil reset Ethernet"${intf}" done sfputil show presence	2022-12-20 10:05:45 +02:00
Kebo Liu	d6ee7f08c2	[Mellanox] change the implementation of is_host() to fix a stuck issue on simx platform (#13100 ) - Why I did it Following code to judge whether a process is running inside a docker could get stuck on the simx platform subprocess.Popen(["docker", "--version"], stdout=subprocess.PIPE, stderr=subprocess.STDOUT, universal_newlines=True) When it gets stuck, the config-chassisdb service can not be successfully started, thus the system can not be booted up. root@sonic:/# service config-chassisdb status config-chassisdb.service - Config chassis_db Loaded: loaded (/lib/systemd/system/config-chassisdb.service; enabled; vendor preset: enabled) Active: activating (start) since Thu 2022-12-15 09:23:02 UTC; 29min ago Main PID: 571 (config-chassisd) Tasks: 14 (limit: 9501) Memory: 132.4M CGroup: /system.slice/config-chassisdb.service ├─571 /bin/bash /usr/bin/config-chassisdb ├─575 /usr/bin/python3 /usr/local/bin/sonic-cfggen -H -v DEVICE_METADATA.localhost.platform ├─602 /bin/sh -c sudo decode-syseeprom -m ├─603 sudo decode-syseeprom -m ├─607 /usr/bin/python3 /usr/local/bin/decode-syseeprom -m ├─616 /bin/sh -c docker --version 2>/dev/null └─617 docker --version - How I did it Use an alternative way to implement this function and issue can be avoided: docker_env_file = '/.dockerenv' return os.path.exists(docker_env_file) is False - How to verify it run regression on real hardware and simx platform.	2022-12-20 10:00:11 +02:00
Junchao-Mellanox	9590339d69	[Mellanox] Remove TODO comments which are no longer needed (#13023 ) - Why I did it Remove TODO comments which are no longer needed - How I did it Remove TODO comments which are no longer needed - How to verify it Only comment change	2022-12-14 09:57:48 +02:00
Stephen Sun	5d457596ba	[Mellanox] Support PSU power threshold checking (#11863 ) * Support power threshold Signed-off-by: Stephen Sun <stephens@nvidia.com> * get_psu_power_warning_threshold => get_psu_power_warning_suppress_threshold Signed-off-by: Stephen Sun <stephens@nvidia.com> * Fix comments Signed-off-by: Stephen Sun <stephens@nvidia.com> Signed-off-by: Stephen Sun <stephens@nvidia.com>	2022-11-21 14:47:43 -08:00
Junchao-Mellanox	20d885dbc2	[Mellanox] Add new thermal sensors for SN5600 (#12671 ) - Why I did it Add new thermal sensors for SN5600 - How I did it Add new thermal sensors for SN5600: PCH and SODIMM - How to verify it Manual test	2022-11-14 11:10:33 -08:00
Kebo Liu	c8c2b7fc45	[Mellanox] [Platform API] Update SN2201 dynamic minimum fan speed table (#12602 ) - Why I did it Update SN2201 dynamic minimum fan speed table according to data provided by the thermal team. - How I did it Update the thermal table in device_data.py - How to verify it Run platform related regression Signed-off-by: Kebo Liu <kebol@nvidia.com>	2022-11-08 13:37:10 +02:00
Junchao-Mellanox	830b7d8cb4	[Mellanox] Use sdk sysfs instead of ethtool (#12480 )	2022-11-03 11:17:44 -07:00
Stephen Sun	8c73e68468	Remove \n from the end of fs_path in ONIEUpdater (#12465 ) This fixes the following error ``` admin@sonic:~$ sudo fwutil show status mount: /mnt/onie-fs: special device /dev/sda2 does not exist. Error: Command '['mount', '-n', '-r', '-t', 'ext4', '/dev/sda2\n', '/mnt/onie-fs']' returned non-zero exit status 32.. Aborting... Aborted! admin@sonic:~$ sudo vi /usr/local/lib/python3.9/dist-packages/sonic_platform/ ``` Seems like #11877 the rstrip('\n') was removed. Probably by mistake. Signed-off-by: Stephen Sun <stephens@nvidia.com>	2022-10-23 09:59:20 +03:00
Mai Bui	648ca075c7	[device/mellanox] Mitigation for security vulnerability (#11877 ) Signed-off-by: maipbui <maibui@microsoft.com> Dependency: [PR (#12065)](https://github.com/sonic-net/sonic-buildimage/pull/12065) needs to merge first. #### Why I did it `subprocess.Popen()` and `subprocess.check_output()` is used with `shell=True`, which is very dangerous for shell injection. #### How I did it Disable `shell=True`, enable `shell=False` #### How to verify it Tested on DUT, compare and verify the output between the original behavior and the new changes' behavior. [testresults.zip](https://github.com/sonic-net/sonic-buildimage/files/9550867/testresults.zip)	2022-10-06 17:51:31 -04:00
Dror Prital	44356fa8d7	[Mellanox] Add NVIDIA copyright header for NVIDIA added files (#12130 ) - Why I did it Add NVIDIA Copyright header for new "NVIDIA" files - How I did it Add the copyright header as remark at the head of the file	2022-10-02 11:34:24 +03:00
Junchao-Mellanox	1d69f0916e	[Mellanox] Provide dummy implementation for get_rx_los and get_tx_fault (#12231) - Why I did it get_rx_los and get_tx_fault are not supported via the existing interface, so a dummy implementation is needed for them. NOTE: later releases will restore them via a different interface. - How I did it Return False * lane_num for get_rx_los and get_tx_fault - How to verify it Added unit test	2022-09-30 09:38:05 +03:00
Stephen Sun	4d317aff94	[Mellanox] Fix typo in platform API (#12136 ) - Why I did it Fix a typo in chassis platform API which causes the following error >>> import sonic_platform as P >>> c = P.platform.Platform().get_chassis() >>> sl = c.get_all_sfps() >>> sl[0].get_lpmode() Sep 28 07:48:33 INFO LOG: Initializing SX log with STDOUT as output file. False >>> del c Exception ignored in: <function Chassis.__del__ at 0x7f1d166ef8b0> Traceback (most recent call last): File "/usr/local/lib/python3.9/dist-packages/sonic_platform/chassis.py", line 126, in __del__ self.sfp_module.deinitialize_sdk_handle(sfp_module.SFP.shared_sdk_handle) NameError: name 'sfp_module' is not defined - How I did it Use self while using the SDK handle - How to verify it Manual test Signed-off-by: Stephen Sun <stephens@nvidia.com>	2022-09-28 11:09:18 +03:00
Junchao-Mellanox	f890606d82	Revert "[Mellanox] Redirect ethtool stderr to subprocess for better error log (#12038 )" (#12183 ) This reverts commit `9750cb4`. There is a PR to handle 202205 branch revert: #12184 - Why I did it The PR to be reverted introduced many notice logs every 1 minute if SFP is not plugged: Cannot get module EEPROM information: Input/output error Before the "bad" PR, the message format is like this: INFO pmon#supervisord: xcvrd Cannot get module EEPROM information: Input/output error It was truncated by rsyslog because every message is the same. However, the "bad" PR introduces SFP index to the message: NOTICE pmon#xcvrd: Failed to get EEPROM data for sfp 39: Cannot get module EEPROM information: Input/output error Rsyslog no longer truncate such log and many such messages are flooded to syslog. - How I did it Revert the PR - How to verify it Manual test	2022-09-28 10:15:26 +03:00
Junchao-Mellanox	9750cb48c6	[Mellanox] Redirect ethtool stderr to subprocess for better error log (#12038) - Why I did it ethtool prints error logs when the EEPROM of an SFP is not available, like this: INFO pmon#/supervisord: xcvrd Cannot get module EEPROM information: Input/output error INFO pmon#/supervisord: xcvrd Cannot get Module EEPROM data: Invalid argument However, this log does not contain the relevant SFP index, which makes it hard for developers and QA to find the exact SFP. - How I did it Redirect ethtool stderr to subprocess and log it better - How to verify it Manual test	2022-09-14 20:41:43 +03:00
Junchao-Mellanox	46ebd06403	[Mellanox] Fix issue: set lpmode by platform API does not work (#11732 ) - Why I did it Fix issue: set lpmode by platform API does not work - How I did it Fix miss return value in code - How to verify it Manual test	2022-08-18 13:07:38 +03:00
orfarfara	aec1248258	[Mellanox] add PSU input voltage and current (#11510 ) - Why I did it Add PSU input voltage and input current to mlnx platform api. - How I did it Implement 2 function of getting the psu voltage and psu current input: Get the values from "power/psu{}_curr_in" , "power/psu{}_volt_in" - How to verify it Manual test. Run sonic-mgmt regression Signed-off-by: orfar1994 <orfar1994@gmail.com>	2022-08-10 18:10:55 +03:00
Junchao-Mellanox	4c1c0c1852	[Mellanox] add more log while doing sysfs reading (#11556 ) - Why I did it Add more log while doing sysfs reading to increase the debug capability - How I did it Log the relevant file path and error number while sysfs reading return None - How to verify it Manual test	2022-08-08 15:06:52 +03:00
Stephen Sun	8282d427e4	Fix chassis test issue (#11460 ) Signed-off-by: Stephen Sun <stephens@nvidia.com>	2022-07-16 19:34:45 -07:00
Stephen Sun	81600fafe9	[Mellanox] Support new platform API get_port_or_cage_type for RJ45 ports (#11336 ) - Why I did it Support get_port_or_cage_type for RJ45 ports - How I did it Implement the new platform API get_port_or_cage_type Fix the issue: unable to import SFP when chassis object is destructed - How to verify it Manually test and regression test Signed-off-by: Stephen Sun <stephens@nvidia.com>	2022-07-14 12:20:16 +03:00
Junchao-Mellanox	2863945f7c	[Mellanox] Fix issue: failed to decode Json while there is no hwsku.json (#11436) - Why I did it Fix bug: pmon reports an error on startup because some SKUs do not have hwsku.json - How I did it If hwsku.json does not exist, do not extract RJ45 port information - How to verify it Manual test. Unit test.	2022-07-14 09:24:39 +03:00
Sudharsan Dhamal Gopalarathnam	23d68883f5	[Mellanox]Check dmi file permission before access (#11309 ) Signed-off-by: Sudharsan Dhamal Gopalarathnam sudharsand@nvidia.com Why I did it During the system boot up when 'show platform status' or 'show version' command is executed before STATE_DB CHASSIS_INFO table is populated, the show will try to fallback to use the platform API. The DMI file in mellanox platforms require root permission for access. So if the show commands are executed as admin or any other user, the following error log will appear in the syslog Jun 28 17:21:25.612123 sonic ERR show: Fail to decode DMI /sys/firmware/dmi/entries/2-0/raw due to PermissionError(13, 'Permission denied') How I did it Check the file permission before accessing it. How to verify it Added UT to verify. Manually verified if the error log is not thrown.	2022-07-01 17:29:07 -07:00
Kebo Liu	7ac590b5c5	[Mellanox] Enhance Platform API to support SN2201 - RJ45 ports and new components mgmt. (#10377) * Support new platform SN2201 and RJ45 port Signed-off-by: Kebo Liu <kebol@nvidia.com> * remove unused import and redundant function Signed-off-by: Kebo Liu <kebol@nvidia.com> * fix error introduced by rebase Signed-off-by: Kebo Liu <kebol@nvidia.com> * Revert the special handling of RJ45 ports (#56) * Revert the special handling of RJ45 ports sfp.py sfp_event.py chassis.py Signed-off-by: Stephen Sun <stephens@nvidia.com> * Remove dead code Signed-off-by: Stephen Sun <stephens@nvidia.com> * Support CPLD update for SN2201 A new class is introduced, deriving from ComponentCPLD and overloading _install_firmware Change _install_firmware from private (starting with __) to protected, making it overloadable Signed-off-by: Stephen Sun <stephens@nvidia.com> * Initialize component BIOS/CPLD Signed-off-by: Stephen Sun <stephens@nvidia.com> * Remove swb_amb which doesn't exist on DVT board any more Signed-off-by: Stephen Sun <stephens@nvidia.com> * Remove the nonexistent sensor - switch board ambient - from platform.json Signed-off-by: Stephen Sun <stephens@nvidia.com> * Do not report error on receiving unknown status on RJ45 ports Translate it to disconnect for RJ45 ports Report error for xSFP ports Signed-off-by: Stephen Sun <stephens@nvidia.com> * Add reinit for RJ45 to avoid exception Signed-off-by: Stephen Sun <stephens@nvidia.com> Co-authored-by: Stephen Sun <5379172+stephenxs@users.noreply.github.com> Co-authored-by: Stephen Sun <stephens@nvidia.com>	2022-06-20 19:12:20 -07:00
Junchao-Mellanox	f135f37a50	[Mellanox] optimize platform API import time (#10815) - Why I did it "import sonic_platform" takes about 600 ms to 1000 ms, which is slow. After this optimization, it takes about 100 ms. CLIs that do not need the slow imports become faster than before. - How I did it Find the slow imports and perform them only when needed. - How to verify it Measure the import time.	2022-06-07 15:13:16 +03:00
Andriy Yurkiv	70d71f99f5	[Mellanox] Credo Y-cable | add more log info, checks, fix exception message (#10779) - Why I did it Script fails when there is an exception while reading - How I did it Add more logs and checks. Fix wrong variable naming and messages. - How to verify it Provoke exception while read_eeprom() and check that it is handled properly	2022-05-19 17:36:02 +03:00
Junchao-Mellanox	af5e5c4c94	[Mellanox] Adjust PSU voltage WA (#10619) - Why I did it InvalidPsuVolWA.run might raise an exception if the user powers off the PSU while it is running. This exception is not caught and propagates to psud, which causes psud to fail to update PSU data in the DB. - How I did it 1. Change the log level when the WA does not work. This can happen when the user powers off the PSU, so warning is a better level than error 2. Change the wait time from 5 to 1 second to avoid introducing too much delay in psud. 1 second is usually enough per my test 3. Give a default return value for get_voltage_low_threshold and get_voltage_high_threshold so that exceptions do not reach psud - How to verify it Manual test. Run sonic-mgmt regression	2022-04-22 11:02:30 +03:00
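
Commit d6ee7f08c2 in the list above replaces a `docker --version` subprocess call with a check for the `/.dockerenv` marker file. A minimal Python sketch of that check follows. The `docker_env_file` parameter is an assumption added here so the function can be tried against any path; the commit message itself only shows the two-line body.

```python
import os

# Marker file that Docker creates in the root of a container filesystem.
DOCKER_ENV_FILE = "/.dockerenv"


def is_host(docker_env_file: str = DOCKER_ENV_FILE) -> bool:
    """Return True when the marker file is absent, i.e. we run on the host.

    The parameter is hypothetical (added for illustration and testing);
    the commit hardcodes the path.
    """
    return not os.path.exists(docker_env_file)
```

Because this only stats a file, it cannot block the way a spawned `docker` process did on the simx platform.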
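
Commits aedffd333b and f6d3615bb9 above both wait for a file that hw-management creates: up to 45 seconds for /run/hw-management/config/reset_attr_ready, and up to 10 seconds for the system EEPROM link. The polling pattern might look roughly like the sketch below. The helper name `wait_for_file` and the 0.5 second polling interval are assumptions for illustration, not taken from the platform API code.

```python
import os
import time


def wait_for_file(path: str, timeout: float = 45.0, interval: float = 0.5) -> bool:
    """Poll until `path` exists or `timeout` seconds elapse.

    Hypothetical helper: returns True as soon as the file is seen,
    False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller in the spirit of the reboot-cause commit would fall back to its existing "Unknown" handling when the function returns False.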
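
Commit 389b279ba9 above caps each select call at 1 second and adds an outer loop so that the overall timeout is still honoured. The sketch below shows one way to structure such a loop. The function name, the `should_stop` callback, and the `max_slice` parameter are assumptions made for illustration; the commit does not describe how the shutdown request is signalled.

```python
import select
import time


def wait_readable(fds, timeout, should_stop=lambda: False, max_slice=1.0):
    """Wait up to `timeout` seconds for any fd in `fds` to become readable.

    Each select() call blocks for at most `max_slice` seconds, so a stop
    request (hypothetical `should_stop` callback) is noticed quickly.
    Returns the list of readable fds, or [] on timeout or stop.
    """
    deadline = time.monotonic() + timeout
    while True:
        remaining = max(0.0, deadline - time.monotonic())
        readable, _, _ = select.select(fds, [], [], min(remaining, max_slice))
        if readable:
            return readable
        if remaining == 0.0 or should_stop():
            return []
```

With the default `max_slice` of 1 second, the caller regains control at least once per second regardless of how long the overall timeout is.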


179 Commits