Commit Graph

614 Commits

Author SHA1 Message Date
Nazarii Hnydyn
085765a45d
[202311][Mellanox][MFT]: Update MFT to 4.27.0-83: bash login fix (#18462)
* Revert "[mellanox]: Disable MFT bash autocompletion. (#17543)"

This reverts commit 49e96c3daa.

* [mellanox][mft]: Update MFT to 4.27.0-83: bash login fix.

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>

---------

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2024-03-25 09:29:33 -07:00
Sudharsan Dhamal Gopalarathnam
8deb493673 [Mellanox]Adding dependency of libnl-route-3 for Mellanox SAI library (#18197)
- Why I did it
Adding explicit dependency of libnl-route-3 for Mellanox SAI library. This is required for the latest SAI library.

- How I did it
Modifying Make files

- How to verify it
Building with the changes.
2024-03-22 07:01:10 +08:00
dbarashinvd
ac35a0fafb add unit tests for CMIS host management feature (#18211)
* add unit tests for CMIS host management feature
2024-03-21 07:00:43 +08:00
noaOrMlnx
86613b8dcc [Mellanox] Fix timing issue in lpmode change (#18223)
- Why I did it
Changing LPMODE timing is different between cables.
We want to add functionality to make sure LPMODE has changed.
For that, the wait_until utility is used and every 1 second (until timeout), it will check with lower-layers what is the current Lpmode.
Once it is the expected mode, set_lpmode() functino will return True.
If after seconds, Lpmode is still not in the expected mode, set_lpmode() function will return False.

- How I did it
Add use of wait_until function to make sure lpmode was changed.

- How to verify it
sfputil lpmode on
sfputil lpmode off
2024-03-06 07:00:47 +08:00
Kebo Liu
865d2ba44b [Mellanox] Extend the time to wait for EEPROM VPD file creation (#18146)
- Why I did it
The creation of system EEPROM VPD file "/var/run/hw-management/eeprom/vpd_info" is triggered by the udev event during the system boot up, in case the CPU is busy during the bootup, the udev event handling can be delayed, and need to wait for some more time for the file creation.

- How I did it
Extend the waiting time from 10s to 20s to overcome some extreme case.

- How to verify it
continuously run reboot case and verify whether still can see error msg "ERR decode-syseeprom: Nowhere to read syseeprom from! No symlink found"

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2024-03-05 07:00:49 +08:00
noaOrMlnx
b10abaf435 [Mellanox] Update Nvidia sai.profile SKU files to have common file (#18074)
* Update Nvidia sai.profile SKU files to have common file

* Remove SAI_DUMP_MFT_CFG_PATH from sai-common.profile as it is not in use
2024-02-29 13:01:28 +08:00
Yevhen Fastiuk
491cf9a3f8 [Mellanox] Fix uninitialized variable on module plug event (#17011)
- Why I did it
To fix uninitialized variable

- How I did it
Add initial value

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>
2024-02-17 12:34:35 +08:00
dbarashinvd
b967cf0b99 [Mellanox] fix sysfs reading that gets garbage end of line using strip (#17830)
- Why I did it
when reading sysfs fd upon python poller events, there's end of line garbage like "# 012" (without space between the 2 parts) trailing the real value of 1 or 0

- How I did it
using python strip() to remove end of line

- How to verify it
run the CMIS host management feature on a switch
wait few minutes until switch completes boot up sequence including CMIS host manager
then disconnect or reconnect a port to create a poller event
2024-02-17 12:34:31 +08:00
dbarashinvd
dcc5a162ec [Mellanox] fix code for warm reboot to work with FW controlled ports (#18065)
- Why I did it
Fix the code to work also after warm reboot to work with FW controlled ports.
In warm reboot the control state sysfs of each port does not change unlike reboot or fast boot.

- How I did it
1. Check procfs cmdline if warm reboot done this is due to the fact pmon don't recognize warm reboot when it's taking place since pmon is loaded after warm reboot is finished.
2. If warm reboot done, check in static detection part for each port if it's FW controlled. If so, leave it this way and stop the state machine flow (set it to final state).

- How to verify it
1. Boot a switch with CMIS host management with at least one FW controlled port (non active cables or non cmis cables) then run warm reboot.
2. Verify no errors of sysfs reading appears for control sysfs
2024-02-16 09:29:06 +08:00
Sudharsan Dhamal Gopalarathnam
fcef7d1095
[202311][Mellanox]Update SAI to 2311.26.0.28, SDK/FW to 4.6.2202/2012.2202 (#17975) 2024-02-01 17:53:49 -08:00
Junchao-Mellanox
d53fba12cb Fix error log while creating PSU thermal object (#17789)
- Why I did it
If a PSU is not present, there could be error log while restarting psud or thermalctld:

Jan  8 17:15:52.689616 sonic ERR pmon#psud: Thermal sysfs /run/hw-management/thermal/psu2_temp1_max does not exist

Jan  8 17:15:57.747723 sonic ERR pmon#thermalctld: Thermal sysfs /run/hw-management/thermal/psu2_temp1 does not exist

- How I did it
if a PSU is not present, we should not check the PSU temperature sysfs.
2024-02-01 09:34:05 +08:00
Lior Avramov
4599f7aeaf [Nvidia] Update syncd docker to use python version 3 (#17735)
* Remove python2 from compilation of python-sdk-api

* Upgrade Python version in syncd RPC docker image to Python3
2024-01-31 10:32:20 +08:00
dbarashinvd
d7a77601e4 fix low polarity wrong value for hw_reset deassert and seek(0) before reading sysfs upon poll event (#17627)
* fix hw_reset low polarity (reverse values)

* move seek to beginning of sysfs fd before reading to resolve power_good
sysfs returns empty upon plug out cable
2024-01-23 09:38:50 +08:00
Junchao-Mellanox
8d65e2c517 [Mellanox] Fix issues found for CMIS host management (#17637)
- Why I did it
1. Thermal updater should wait more time for module to be initialized
2. sfp should get temperature threshold from EEPROM because SDK sysfs is not yet supported
3. Rename sfp function to fix typo
4. sfp.get_presence should return False if module is under initialization

- How I did it
1. Thermal updater should wait more time for module to be initialized
2. sfp should get temperature threshold from EEPROM because SDK sysfs is not yet supported
3. Rename sfp function to fix typo
4. sfp.get_presence should return False if module is under initialization

- How to verify it
Manual test
Unit test
2024-01-20 06:32:58 +08:00
Junchao-Mellanox
0fbdc2b8ed [Mellanox] wait until hw-management watchdog files ready (#17618)
- Why I did it
watchdog-control service always disarm watchdog during system startup stage. It could be the case that watchdog is not fully initialized while the watchdog-control service is accessing it. This PR adds a wait to make sure watchdog has been fully initialized.

- How I did it
adds a wait to make sure watchdog has been fully initialized.

- How to verify it
Manual test
sonic regression
2024-01-19 04:32:53 +08:00
Junchao-Mellanox
56ba5b10b4 [Mellanox] implement sfp.reset for CMIS management (#16862)
- Why I did it
For CMIS host management module, we need a different implementation for sfp.reset. This PR is to implement it

- How I did it
For SW control modules, do reset from hw_reset
For FW control modules, do reset as the original way

- How to verify it
Manual test
sonic-mgmt platform test
2024-01-17 06:33:11 +08:00
Kebo Liu
cacf46ff86
[202311][Mellanox] Integrate HW-MGMT Version 7.0030.2008 (#17659)
* Intgerate HW-MGMT 7.0030.2008 Changes

 ## Patch List
* 0285-UBUNTU-SAUCE-mlxbf-gige-Fix-intermittent-no-ip-issue.patch :
* 0286-pinctrl-Introduce-struct-pinfunction-and-PINCTRL_PIN.patch :
* 0287-pinctrl-mlxbf3-Add-pinctrl-driver-support.patch :
* 0288-UBUNTU-SAUCE-gpio-mmio-handle-ngpios-properly-in-bgp.patch :
* 0289-UBUNTU-SAUCE-gpio-mlxbf3-Add-gpio-driver-support.patch :
* 0291-mlxsw-core_hwmon-Align-modules-label-name-assignment.patch :
* 0292-mlxsw-i2c-Limit-single-transaction-buffer-size.patch :
* 0293-mlxsw-reg-Limit-MTBR-register-records-buffer-by-one-.patch :
* 0296-UBUNTU-SAUCE-mmc-sdhci-of-dwcmshc-Add-runtime-PM-ope.patch :
* 0298-UBUNTU-SAUCE-mlxbf-ptm-use-0444-instead-of-S_IRUGO.patch :
* 0299-UBUNTU-SAUCE-mlxbf-ptm-add-atx-debugfs-nodes.patch :
* 0300-UBUNTU-SAUCE-mlxbf-ptm-update-module-version.patch :
* 0301-UBUNTU-SAUCE-mlxbf-gige-Fix-kernel-panic-at-shutdown.patch :
* 0302-UBUNTU-SAUCE-mlxbf-bootctl-support-SMC-call-for-sett.patch :
* 0303-UBUNTU-SAUCE-Add-BF3-related-ACPI-config-and-Ring-de.patch :
* 0306-dt-bindings-trivial-devices-Add-infineon-xdpe1a2g7.patch :
* 0307-leds-mlxreg-Add-support-for-new-flavour-of-capabilit.patch :
* 0308-leds-mlxreg-Remove-code-for-amber-LED-colour.patch :
* 0308-platform_data-mlxreg-Add-capability-bit-and-mask-fie.patch :
* 0309-hwmon-mlxreg-fan-Add-support-for-new-flavour-of-capa.patch :
* 0310-hwmon-mlxreg-fan-Extend-number-of-supporetd-fans.patch :
* 0317-platform-mellanox-Introduce-support-for-switches-equ.patch :
* 0318-mellanox-Relocate-mlx-platform-driver.patch :
* 0319-UBUNTU-SAUCE-mlxbf-tmfifo-fix-potential-race.patch :
* 0320-UBUNTU-SAUCE-mlxbf-tmfifo-Drop-the-Rx-packet-if-no-m.patch :
* 0321-UBUNTU-SAUCE-mlxbf-tmfifo-Drop-jumbo-frames.patch :
* 0322-UBUNTU-SAUCE-mlxbf-tmfifo.c-Amend-previous-tmfifo-pa.patch :
* 0323-mlxbf_gige-add-set_link_ksettings-ethtool-callback.patch :
* 0324-mlxbf_gige-fix-white-space-in-mlxbf_gige_eth_ioctl.patch :
* 0325-UBUNTU-SAUCE-mlxbf-bootctl-Fix-kernel-panic-due-to-b.patch :
* 0326-platform-mellanox-mlxreg-hotplug-Add-support-for-new.patch :
* 0327-platform-mellanox-mlx-platform-Change-register-name.patch :
* 0328-platform-mellanox-mlx-platform-Add-support-for-new-X.patch :

* [Mellanox] Don't populate arm64 Kconfig when integrating hw-mgmt

Signed-off-by: Vivek Reddy <vkarri@nvidia.com>

* [Mellanox] Remove thermal zone related code and replace with new one

* Revert "Revert "[Mellanox] Align PSU temperature sysfs node name with hw-management change (#16820)" (#16956)"

This reverts commit c2edc6f9d5.

* Update copyright header

Signed-off-by: Kebo Liu <kebol@nvidia.com>

---------

Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Co-authored-by: Vivek Reddy <vkarri@nvidia.com>
Co-authored-by: Junchao-Mellanox <junchao@nvidia.com>
Co-authored-by: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com>
2024-01-16 08:33:50 -08:00
Junchao-Mellanox
0b511986ae
[202311][Mellanox] implement platform wait in python code (#17398) (#17719)
- Why I did it
New implementation of Nvidia platform_wait due to:
1. sysfs deprecated by hw-mgmt
2. new dependencies to SDK
3. For CMIS host management mode

- How I did it
wait hw-management ready
wait SDK sysfs nodes ready

- How to verify it
manual test
unit test
sonic-mgmt regression
2024-01-16 08:31:33 -08:00
Junchao-Mellanox
767944d7da [Mellanox] Fix race condition while creating SFP (#17441)
- Why I did it
Fix issue xcvrd crashes due to cannot import name 'initialize_sfp_thermal':

Nov 27 09:47:16.388639 sonic ERR pmon#xcvrd: Exception occured at CmisManagerTask thread due to ImportError("cannot import name 'initialize_sfp_thermal' from partially initialized module 'sonic_platform.thermal' (most likely due to a circular import) (/usr/local/lib/python3.9/dist-packages/sonic_platform/thermal.py)")

- How I did it
Add lock for creating SFP object

- How to verify it
Unit test
Manual Test
2024-01-09 14:34:47 +08:00
Junchao-Mellanox
8de7cb5988
[202311] [Mellanox] update asic and module temperature in a thread for CMIS management (#16955) (#17699)
- Why I did it
When module is totally under software control, driver cannot get module temperature/temperature threshold from firmware. In this case, sonic needs to get temperature/temperature threshold from EEPROM. In this PR, a thread thermal updater is created to update module temperature/temperature threshold while software control is enabled.

- How I did it
Query ASIC temperature from SDK sysfs and update hw-management-tc periodically
Query Module temperature from EEPROM and update hw-management-tc periodically

- How to verify it
Manual test
New Unit tests
2024-01-08 10:50:59 -08:00
mssonicbld
4060f5ce5b
[Mellanox] Remove EEPROM write limitation if it is software control (#17030) (#17694) 2024-01-07 13:16:25 +08:00
mssonicbld
fb7bad2d11
[Mellanox] Implement low power mode for cmis host management (#17159) (#17693) 2024-01-06 07:55:41 +08:00
Junchao-Mellanox
7368df7839
[Mellanox] Enable CMIS host management (#16846) (#17684)
- Why I did it
Enable CMIS host management for Mellanox devices which are expected to support the feature

- How I did it
new thread in a new file and changing logic in platform code in chassis.py which is calling this thread from get_change_event()
this thread in the new file handles the state machine per port.
first the static detection takes place once the thread is up (during switch bootup sequence), until final decision if it's FW control or SW control module.
After it ends, the dynamic detection takes place, listening to changes in the sysfs fds, per port,
so it will be able to detect plug in or out events of a cable.

- How to verify it
Enhanced unit tests
run sonic mgmt on Nvidia SN4700 with CMIS host management enabled

Co-authored-by: dbarashinvd <105214075+dbarashinvd@users.noreply.github.com>
2024-01-05 12:07:30 -08:00
Junchao-Mellanox
6d43d2f636 [Mellanox] Provide default implementation for sfp error description when CMIS host management is enabled (#17294)
- Why I did it
Provide a dummy implementation for SFP error description when CMIS host management is enabled. A future feature shall be raised to implement SFP error description for such mode.

- How I did it
if SFP is under software control, provide "Not supported" as error description
if SFP is under initialization, provide "Initializing" as error description

- How to verify it
unit test
2024-01-04 10:38:38 +08:00
Nazarii Hnydyn
49e96c3daa
[mellanox]: Disable MFT bash autocompletion. (#17543)
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2023-12-21 09:45:42 -08:00
Kebo Liu
f96742fb98 [Mellanox] Revert LPM implementation to the old way (#17096)
- Why I did it
The current low power mode setting implementation requests the user to set the port to admin down first before toggling LP mode, this is not backward compatible, now revert it to the old way so that the user can toggle the LP mode regardless of the port admin status.

- How I did it
Revert the recent changes related to LPM in PR #14130 and #16545

- How to verify it
Run all sfputil and SFP platform API related tests on all the Mellanox platforms.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-12-04 22:14:02 +00:00
Stephen Sun
b93852d53d
[Mellanox] Support running hw-management service on MSN4700 emulation platform (#16584)
- Why I did it
Support running hw-management service on MSN4700 emulation platform.

- How I did it
Use physical EEPROM instead of the fake one
Do not skip PSUd, PCId, thermal control daemon
Adjust PCIe and thermal configuration files
Adjust platform.json for different chassis names and thermals
Remove a patch to hw-management in order to enable it

- How to verify it
Run Nvidia simulation on SN4700 (ASIC and Platform)

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2023-11-19 11:03:46 +02:00
Volodymyr Samotiy
672781e24a
[mlnx-fw-upgrade] Add FW reactivation in case 2 FW upgrades were done without reboot (#17092)
- Why I did it
In order to activate FW after it was upgraded need to perform reboot.
If reboot wasn't performed and user need to upgrade to another SONiC image then it will fail.
The reason for that is that during SONiC upgrade new FW should be installed but it will fail because previously installed FW wasn't activated.
In order to allow 2nd FW upgrade without reboot in-between need to reactivate FW image.
This change handles such flow.

Example of issue scenario:

User installed SONiC image on the switch
Then for some reason FW was upgraded by user or script but reboot was not performed to activate it.
After that upgrade to new SONiC image will fail because new image need to install FW but it fails due to previous one wasn't activated.

- How I did it
In "mlnx-fw-upgrade" script check if FW upgrade failed with the error that FW was already installed but reboot was not performed.
If so then perform FW image reactivation and try to upgrade FW again.

- How to verify it
Install SONiC image on the switch
Then upgrade FW but don't perform reboot.
After that upgrade to new SONiC image and check that upgrade was successfull.

Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2023-11-19 11:01:31 +02:00
Junchao-Mellanox
c2edc6f9d5
Revert "[Mellanox] Align PSU temperature sysfs node name with hw-management change (#16820)" (#16956)
This reverts commit 0846322e9a.
2023-10-23 11:55:27 +03:00
Vivek
6410e66f35
[Mellanox] Enhance the processing of Kconfig in the hw-mgmt integration (#16752)
- Why I did it
Add an ability to add arm64 mellanox specific kconfig using the integration tool
Fix the existing duplicate kconfig problem by using the vanilla .config
Add an ability to patch kconfig-inclusions file. Renamed series.patch to external-changes.patch to reflect the behavior
NOTE: Min hw-mgmt version to use with these changes: V.7.0030.2000 not yet upstream but required prio to it.
This option will be enabled one the new hw mgmt will be upstream.

Depends on sonic-net/sonic-linux-kernel#336

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
2023-10-18 19:32:59 +03:00
Junchao-Mellanox
0846322e9a
[Mellanox] Align PSU temperature sysfs node name with hw-management change (#16820)
- Why I did it
hw-management renamed PSU temperature related sysfs:

psu1_temp -> psu1_temp1
psu2_temp -> psu2_temp1
psu1_temp_max -> psu1_temp1_max
psu2_temp_max -> psu2_temp1_max
This PR is to align the change in SONiC.

- How I did it
Use new sysfs node for PSU temperature and PSU temperature threshold

- How to verify it
Manual test
sonic-mgmt Regression test
2023-10-10 19:21:27 +03:00
Junchao-Mellanox
aedffd333b
[Mellanox] wait reset cause ready (#16722)
Why I did it
SONiC service determine-reboot-cause might run before driver creating reset cause files. In that case, the reset cause will be "Unknown". This PR introduces a wait mechanism to wait for reset cause sysfs files ready.

How I did it
/run/hw-management/config/reset_attr_ready is the file to indicate all reset cause files are ready. In chassis.get_reboot_cause function, it waits /run/hw-management/config/reset_attr_ready for up to 45 seconds.

How to verify it
Manual test on master/202211/202205
2023-10-03 18:58:31 -07:00
Vivek
456a90e1ab
[Nvidia] Remove the dependency on python_sdk_api for sfp api (#16545)
Sfp api can now be called from the host which doesn't have the python_sdk_api installed. Also, sfp api has been migrated to use sysfs instead of sdk handle.

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
2023-09-23 00:19:27 -07:00
Junchao-Mellanox
5138afe4e7
[Mellanox] add new platform 2700 a1 (#16515)
- new pcie.yaml
- new sensors.conf
- new thermal support
- new platform.json file
- adjust test code
2023-09-23 00:15:17 -07:00
Saikrishna Arcot
d62ad707bc
Update to Linux 5.10.179 (#15926)
## How I did it

Depends on sonic-net/sonic-linux-kernel#328 and sonic-net/saibcm-modules#12.

#### How to verify it

Verified that the image boots up, BGP comes up, and a basic warm-reboot works on VS, broadcom, and mellanox.
2023-09-20 15:24:39 -07:00
Dror Prital
d7b85af18b
[Mellanox] Update SDK/FW to 4.6.1062/2012.1062 Update SDK/FW/SAI to 4.6.1062/2012.1062/SAIBuild2211.25.1.4 (#16478)
- Why I did it
SAI bug Fixes
1. When creating an ACL rule with SAI_ACL_ENTRY_ATTR_FIELD_SRC_IP/SAI_ACL_ENTRY_ATTR_FIELD_DST_IP enabled, and then disabling the field by setting enable=false, a match on L3_type=IPv4 will remain programmed for the rule Issue resolved after the fix
2. Allow the max scale of virtual routers to be configure for SPC-1, SPC-2, SPC-3 which is 255 when fastboot enable and 511 when fastboot disable
3. Remove default hash key of SRC_MAC, DST_MAC and ETH_TYPE

SAI features
1. Port init profile
2. Dual ToR Active-Standby | Additional MAC support

SDK/FW bug fixes
1. When preforming fast boot from an old SDK version (currently installed) to a newer one (target version), and the system was initially loaded with a new SDK version (past version), and the system has not been wiped, under specific conditions, the fast boot would use the past version's data and may fail.

- How I did it
Update SAI version to SAIBuild2211.25.1.4
Update SDK/FW version to 4.6.1062/2012.1062
2023-09-07 14:05:33 +03:00
Kebo Liu
e286869b24
[Mellanox] Update HW-MGMT package to new version V.7.0030.1011 (#16239)
- Why I did it
1. Update Mellanox HW-MGMT package to newer version V.7.0030.1011
2. Replace the SONiC PMON Thermal control algorithm with the one inside the HW-MGMT package on all Nvidia platforms
3. Support Spectrum-4 systems

- How I did it
1. Update the HW-MGMT package version number and submodule pointer
2. Remove the thermal control algorithm implementation from Mellanox platform API
3. Revise the patch to HW-MGMT package which will disable HW-MGMT from running on SIMX
4. Update the downstream kernel patch list

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-09-06 11:32:08 +03:00
Stephen Sun
b5e8c16134
[Mellanox] Enhance FW upgrade mechanism (#16090)
### Why I did it

1. Enhance the diagnosis information collecting mechanism
   - If the option `-v` is fed, it will pass additional diagnosis flags to mlxfwmanager
   - Collect all the output from mlxfwmanager and print them to syslog if it fails
2. Abort syncd in case waiting for device or upgrading firmware fails

Signed-off-by: Stephen Sun <stephens@nvidia.com>

### How I did it

#### How to verify it

Regression and manual test
2023-09-04 11:28:53 -07:00
Vadym Hlushko
78587cedc3
[Mellanox] Remove mlxtrace support for SPC4 (#16373)
- Why I did it
Because the Spectrum4 devices don't support mlxtrace utility.

- How I did it
Edit sai.profile and remove mlxtrace_spectrum4_itrace_*.cfg.ext files

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
2023-09-04 10:53:20 +03:00
Vadym Hlushko
9e3fdded69
[Mellanox][SFP] Remove unused function parameter (#16318)
Why I did it
To avoid errors when the sfputil show error-status -hw is called from the host OS (not from the pmon docker).

How I did it
Remove the self.sdk_handle parameter from the _get_module_info() function.

How to verify it
Execute the sfputil show error-status -hw

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
2023-09-01 23:06:04 -07:00
Stephen Sun
be8843b166
Fix issue: unprintable character is rendered when handling comments in j2 (#16287)
Use "{#-" and "-#}" to mark comments in jinja template

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2023-08-28 15:03:39 +03:00
Junchao-Mellanox
95f317a5e2
[Mellanox] Fix issue: watchdogutil command does not work (#16091)
- Why I did it
watchdogutil uses platform API watchdog instance to control/query watchdog status. In Nvidia watchdog status, it caches "armed" status in a object member "WatchdogImplBase.armed". This is not working for CLI infrastructure because each CLI will create a new watchdog instance, the status cached in previous instance will totally lose. Consider following commands:

admin@sonic:~$ sudo watchdogutil arm -s 100      =====> watchdog instance1, armed=True
Watchdog armed for 100 seconds
admin@sonic:~$ sudo watchdogutil status             ======> watchdog instance2, armed=False
Status: Unarmed
admin@sonic:~$ sudo watchdogutil disarm            =======> watchdog instance3, armed=False
Failed to disarm Watchdog

- How I did it
Use sysfs to query watchdog status

- How to verify it
Manual test
Unit test
2023-08-23 09:30:58 +03:00
Vadym Hlushko
b214a8a8b6
[Mellanox] Change SDK API sx_mgmt_phy_module_info_get() to sysfs (#15963)
- Why I did it
Change Mellanox platform API implementation to use ASIC driver sysfs for the module operational state and status error fields.

- How I did it
Modify the platform/mellanox/mlnx-platform-api/sonic_platform/sfp.py file by change the call of sx_mgmt_phy_module_info_get() SDK API to sysfs

- How to verify it
Simulate the unplug cable event
Check the CLI output
sfputil show presence
sfputil show error-status -hw
Simulate the plug cable event
Repeat 2 step

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
2023-08-21 20:54:13 +03:00
Kebo Liu
1626e198a8
[Mellanox] Update SDK/FW/SAI to 4.6.1020/2012.1020/SAIBuild2305.25.0.3 (#16096)
SONiC changes:
1. Support Spectrum4 ASIC FW binary building.
2. Support new SDK sx-obj-desc lib building since new SAI need it.
3. Remove SX_SCEW debian package from Mellanox SDK build since we are no longer using it (we use libxml2 instead).
4. Update SAI, SDK, FW to version 4.6.1020/2012.1020/SAIBuild2305.25.0.3

SDK/FW bug fixes
1. In SPC-1 platforms: Fastboot mode is not operational for Split port with Force mode in 50G speed
SFP modules are kept in disabled state after set LPM (low power mode) on/off for at least 3 minutes.
2. When preforming fast boot from an old SDK version (currently installed) to a newer one (target version), and the system was initially loaded with a new SDK version (past version), and the system has not been wiped, under specific conditions, the fast boot would use the past version's data and may fail.

SDK/FW Features
1. On SN2700 all ports can support y cable by credo

SAI bug Fixes
1. When creating an ACL rule with SAI_ACL_ENTRY_ATTR_FIELD_SRC_IP/SAI_ACL_ENTRY_ATTR_FIELD_DST_IP enabled, and then disabling the field by setting enable=false, a match on L3_type=IPv4 will remain programmed for the rule Issue resolved after the fix
2. Allow the max scale of virtual routers to be configure for SPC-1, SPC-2, SPC-3 when fastboot enable 
3. Remove default hash key of SRC_MAC, DST_MAC and ETH_TYPE

SAI features
1. Port init profile

- How I did it
Update SDK/FW/SAI make files

- How to verify it
Run full sonic-mgmt regression on Mellanox platform

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-08-15 15:32:52 +03:00
Kebo Liu
5aa2417c71
[Mellanox] Update MFT to newer version 4.25.0-62 (#16149)
- Why I did it
Update Mellanox MFT tool to version 4.25.0-62

- How I did it
Update the MFT tool make file

- How to verify it
Run full sonic-mgmt regression.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-08-15 09:49:19 +03:00
Junchao-Mellanox
91f3da018e
[Mellanox] Add more unit test coverage for platform API (#15842)
- Why I did it
Increase UT coverage for Nvidia platform API code

Work item tracking
Microsoft ADO (number only):

- How I did it
Focus on low coverage file:
1. component.py
2. watchdog.py
3. pcie.py

- How to verify it
Run the unit test, the coverage has been changed from 70% to 90%
2023-08-03 13:54:31 +03:00
Kebo Liu
380898f3a1
[Mellanox] Remove unnecessary file manipulation in the SAI Make file (#15993)
Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-08-03 13:39:27 +03:00
Vadym Hlushko
521a86b2de
[Mellanox] Add mlxtrace to techsupport (#15961)
- Why I did it
Added the fwtrace config files in order to be able to call the mlxstrace utility during the show techsupport dump.

Work item tracking
Microsoft ADO (number only):

- How I did it
Added fwtrace config files. Added path to these files to sai.profile for each mlnx device.

- How to verify it
Execute the show techsupport command and check if mlxstrace output is in system dump.

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
2023-08-03 11:36:58 +03:00
Kebo Liu
b6986ffd68
[Mellanox] Update SAI build procedure (#15728)
= Why I did it
To optimize Mellanox platform SAI build

- How I did it
SAI debs are now downloaded as Spectrum-SDK-Drivers-SONiC-Bins release.

- How to verify it
Configure/build for Mellanox platform, check the image and ensure that correct SAI debs are included.
2023-07-15 01:03:33 +03:00
Junchao-Mellanox
ed21266ff4
[Mellanox] Remove reset_from_comex from reboot cause mapping (#15793)
- Why I did it
The reset cause "reset_from_comex" has been removed by hw-management, hence removing it from platform API code

- How I did it
Remove reset_from_comex from reboot cause mapping

- How to verify it
Manual test
2023-07-15 01:02:46 +03:00