- Why I did it
If a PSU is not present, there could be error log while restarting psud or thermalctld:
Jan 8 17:15:52.689616 sonic ERR pmon#psud: Thermal sysfs /run/hw-management/thermal/psu2_temp1_max does not exist
Jan 8 17:15:57.747723 sonic ERR pmon#thermalctld: Thermal sysfs /run/hw-management/thermal/psu2_temp1 does not exist
- How I did it
if a PSU is not present, we should not check the PSU temperature sysfs.
Why I did it
Fix issue xcvrd crashes due to cannot import name 'initialize_sfp_thermal':
Nov 27 09:47:16.388639 sonic ERR pmon#xcvrd: Exception occured at CmisManagerTask thread due to ImportError("cannot import name 'initialize_sfp_thermal' from partially initialized module 'sonic_platform.thermal' (most likely due to a circular import) (/usr/local/lib/python3.9/dist-packages/sonic_platform/thermal.py)")
Nov 27 09:47:16.392544 sonic ERR pmon#xcvrd: Traceback (most recent call last):
Nov 27 09:47:16.392643 sonic ERR pmon#xcvrd: File "/usr/local/lib/python3.9/dist-packages/xcvrd/xcvrd.py", line 1518, in run
Nov 27 09:47:16.392757 sonic ERR pmon#xcvrd: self.task_worker()
Nov 27 09:47:16.392757 sonic ERR pmon#xcvrd: File "/usr/local/lib/python3.9/dist-packages/xcvrd/xcvrd.py", line 1240, in task_worker
Nov 27 09:47:16.392757 sonic ERR pmon#xcvrd: sfp = platform_chassis.get_sfp(pport)
Nov 27 09:47:16.392793 sonic ERR pmon#xcvrd: File "/usr/local/lib/python3.9/dist-packages/sonic_platform/chassis.py", line 346, in get_sfp
Nov 27 09:47:16.392830 sonic ERR pmon#xcvrd: self.initialize_single_sfp(index)
Nov 27 09:47:16.392830 sonic ERR pmon#xcvrd: File "/usr/local/lib/python3.9/dist-packages/sonic_platform/chassis.py", line 288, in initialize_single_sfp
Nov 27 09:47:16.392830 sonic ERR pmon#xcvrd: self._sfp_list[index] = sfp_module.SFP(index)
Nov 27 09:47:16.392830 sonic ERR pmon#xcvrd: File "/usr/local/lib/python3.9/dist-packages/sonic_platform/sfp.py", line 272, in __init__
Nov 27 09:47:16.392866 sonic ERR pmon#xcvrd: from .thermal import initialize_sfp_thermal
Nov 27 09:47:16.392918 sonic ERR pmon#xcvrd: ImportError: cannot import name 'initialize_sfp_thermal' from partially initialized module 'sonic_platform.thermal' (most likely due to a circular import) (/usr/local/lib/python3.9/dist-packages/sonic_platform/thermal.py)
Nov 27 09:47:16.393103 sonic ERR pmon#xcvrd: Xcvrd: exception found at child thread CmisManagerTask due to ImportError("cannot import name 'initialize_sfp_thermal' from partially initialized module 'sonic_platform.thermal' (most likely due to a circular import) (/usr/local/lib/python3.9/dist-packages/sonic_platform/thermal.py)")
Nov 27 09:47:16.393103 sonic ERR pmon#xcvrd: Exiting main loop as child thread raised exception!
Work item tracking
Microsoft ADO (number only):
How I did it
Add lock for creating SFP object
How to verify it
UNIT TEST
Manual Test
Why I did it
Update SAI version to SAIBuild2305.26.0.16
Update SDK/FW to 4.6.2134/2012.2134
Fixed issues:
Updated SN3700C to enable limit to 100G speed.
Recovering from Low power mode might ends with port down.
Work item tracking
Microsoft ADO (number only):
How I did it
Updating the versions in makefile
How to verify it
Confirm issues fixed and run sonic-mgmt tests
A W/A to overcome delay of about 20 sec on login due to MFT bash autocompletion bug.
Should be reverted once a formal solution will be available in future MFT release.
Why I did it
To overcome SN2700 20 sec delay on login
Work item tracking
N/A
How I did it
Removed MFT bash autocompletion part
How to verify it
make configure PLATFORM=mellanox
make target/sonic-mellanox.bin
Why I did it
Update SAI to SAIBuild2305.26.0.9 for Mellanox platforms.
Fixed issues:
When working with SAI_DEFAULT_SWITCHING_MODE_STORE_FORWARD key/value enabled, trying to add a LAG member to a LAG which is created after warm boot initial configuration phase ended, will fail.
Creating BFD session for non default VRF fails (SAI_BFD_SESSION_ATTR_VIRTUAL_ROUTER != SAI_SWITCH_ATTR_DEFAULT_VIRTUAL_ROUTER_ID).
Work item tracking
Microsoft ADO (number only):
How I did it
Updated SAI version in "mlnx-sai.mk" Makefile.
How to verify it
Run "sonic-mgmt" regression testing.
Why I did it
Update SAI version to SAIBuild2305.26.0.0
New features
FDB entries are now restored after warmboot to prevent temporary system flooding.
Update SDK/FW to 4.6.2102/2012.2102
Fixed Issues:
Some of the Warmboot related files which were created by SDK during switch create are now generated during pre shutdown flow
Work item tracking
Microsoft ADO (number only):
How I did it
Updating the versions in make file.
How to verify it
Running sonic-mgmt regression.
Why I did it
SONiC service determine-reboot-cause might run before driver creating reset cause files. In that case, the reset cause will be "Unknown". This PR introduces a wait mechanism to wait for reset cause sysfs files ready.
How I did it
/run/hw-management/config/reset_attr_ready is the file to indicate all reset cause files are ready. In chassis.get_reboot_cause function, it waits /run/hw-management/config/reset_attr_ready for up to 45 seconds.
How to verify it
Manual test on master/202211/202205
Sfp api can now be called from the host which doesn't have the python_sdk_api installed. Also, sfp api has been migrated to use sysfs instead of sdk handle.
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
- Why I did it
SAI bug Fixes
1. When creating an ACL rule with SAI_ACL_ENTRY_ATTR_FIELD_SRC_IP/SAI_ACL_ENTRY_ATTR_FIELD_DST_IP enabled, and then disabling the field by setting enable=false, a match on L3_type=IPv4 will remain programmed for the rule Issue resolved after the fix
2. Allow the max scale of virtual routers to be configure for SPC-1, SPC-2, SPC-3 which is 255 when fastboot enable and 511 when fastboot disable
3. Remove default hash key of SRC_MAC, DST_MAC and ETH_TYPE
SAI features
1. Port init profile
2. Dual ToR Active-Standby | Additional MAC support
SDK/FW bug fixes
1. When preforming fast boot from an old SDK version (currently installed) to a newer one (target version), and the system was initially loaded with a new SDK version (past version), and the system has not been wiped, under specific conditions, the fast boot would use the past version's data and may fail.
- How I did it
Update SAI version to SAIBuild2211.25.1.4
Update SDK/FW version to 4.6.1062/2012.1062
- Why I did it
Because the Spectrum4 devices don't support mlxtrace utility.
- How I did it
Edit sai.profile and remove mlxtrace_spectrum4_itrace_*.cfg.ext files
Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
Co-authored-by: Vadym Hlushko <62022266+vadymhlushko-mlnx@users.noreply.github.com>
- Why I did it
1. Update Mellanox HW-MGMT package to newer version V.7.0030.1011
2. Replace the SONiC PMON Thermal control algorithm with the one inside the HW-MGMT package on all Nvidia platforms
3. Support Spectrum-4 systems
- How I did it
1. Update the HW-MGMT package version number and submodule pointer
2. Remove the thermal control algorithm implementation from Mellanox platform API
3. Revise the patch to HW-MGMT package which will disable HW-MGMT from running on SIMX
4. Update the downstream kernel patch list
Signed-off-by: Kebo Liu <kebol@nvidia.com>
* [Mellanox] Update SDK/FW/SAI to 4.6.1020/2012.1020/SAIBuild2305.25.0.3 (#16096)
SONiC changes:
1. Support Spectrum4 ASIC FW binary building.
2. Support new SDK sx-obj-desc lib building since new SAI need it.
3. Remove SX_SCEW debian package from Mellanox SDK build since we are no longer using it (we use libxml2 instead).
4. Update SAI, SDK, FW to version 4.6.1020/2012.1020/SAIBuild2305.25.0.3
SDK/FW bug fixes
1. In SPC-1 platforms: Fastboot mode is not operational for Split port with Force mode in 50G speed
SFP modules are kept in disabled state after set LPM (low power mode) on/off for at least 3 minutes.
2. When preforming fast boot from an old SDK version (currently installed) to a newer one (target version), and the system was initially loaded with a new SDK version (past version), and the system has not been wiped, under specific conditions, the fast boot would use the past version's data and may fail.
SDK/FW Features
1. On SN2700 all ports can support y cable by credo
SAI bug Fixes
1. When creating an ACL rule with SAI_ACL_ENTRY_ATTR_FIELD_SRC_IP/SAI_ACL_ENTRY_ATTR_FIELD_DST_IP enabled, and then disabling the field by setting enable=false, a match on L3_type=IPv4 will remain programmed for the rule Issue resolved after the fix
2. Allow the max scale of virtual routers to be configure for SPC-1, SPC-2, SPC-3 when fastboot enable
3. Remove default hash key of SRC_MAC, DST_MAC and ETH_TYPE
SAI features
1. Port init profile
- How I did it
Update SDK/FW/SAI make files
- How to verify it
Run full sonic-mgmt regression on Mellanox platform
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Conflicts:
platform/mellanox/mlnx-sai.mk
* Fix issue: unprintable character is rendered when handling comments in j2
Use "{#-" and "-#}" to mark comments in jinja template
Signed-off-by: Stephen Sun <stephens@nvidia.com>
---------
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Co-authored-by: Stephen Sun <stephens@nvidia.com>
- Why I did it
watchdogutil uses platform API watchdog instance to control/query watchdog status. In Nvidia watchdog status, it caches "armed" status in a object member "WatchdogImplBase.armed". This is not working for CLI infrastructure because each CLI will create a new watchdog instance, the status cached in previous instance will totally lose. Consider following commands:
admin@sonic:~$ sudo watchdogutil arm -s 100 =====> watchdog instance1, armed=True
Watchdog armed for 100 seconds
admin@sonic:~$ sudo watchdogutil status ======> watchdog instance2, armed=False
Status: Unarmed
admin@sonic:~$ sudo watchdogutil disarm =======> watchdog instance3, armed=False
Failed to disarm Watchdog
- How I did it
Use sysfs to query watchdog status
- How to verify it
Manual test
Unit test
- Why I did it
Update Mellanox MFT tool to version 4.25.0-62
- How I did it
Update the MFT tool make file
- How to verify it
Run full sonic-mgmt regression.
Signed-off-by: Kebo Liu <kebol@nvidia.com>
= Why I did it
To optimize Mellanox platform SAI build
- How I did it
SAI debs are now downloaded as Spectrum-SDK-Drivers-SONiC-Bins release.
- How to verify it
Configure/build for Mellanox platform, check the image and ensure that correct SAI debs are included.
- Why I did it
Change Mellanox platform API implementation to use ASIC driver sysfs for the module operational state and status error fields.
- How I did it
Modify the platform/mellanox/mlnx-platform-api/sonic_platform/sfp.py file by change the call of sx_mgmt_phy_module_info_get() SDK API to sysfs
- How to verify it
Simulate the unplug cable event
Check the CLI output
sfputil show presence
sfputil show error-status -hw
Simulate the plug cable event
Repeat 2 step
Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
- Why I did it
Increase UT coverage for Nvidia platform API code
Work item tracking
Microsoft ADO (number only):
- How I did it
Focus on low coverage file:
1. component.py
2. watchdog.py
3. pcie.py
- How to verify it
Run the unit test, the coverage has been changed from 70% to 90%
Co-authored-by: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com>
Why I did it
Sharing the storage of syncd with other proprietary application extensions allows them to communicate with syncd in differnt ways.
If one container wants to pass some information to syncd then shared storage can be used. However, today the shared storage isn't cleaned on restarts making it possible for syncd to read out-of-date information generated in the past.
NOTE: No plans to use it for standard SONIC dockers and we are working on removing the SDK dependency from PMON docker
How I did it
Implemented new service to clean the shared storage.
How to verify it
Do reboot/fast-reboot/warm-reboot/config-reload/systemctl restart swss and verify /tmp/ is cleaned after each restart in syncd container.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Co-authored-by: Stepan Blyshchak <38952541+stepanblyschak@users.noreply.github.com>
- Why I did it
SDK patches for iproute2 were added to SONiC tree as a temporary solution.
Now that SDK with the patches is available, I have removed the patches from SONiC tree and we consume them from SDK github during compilation.
- How I did it
During build we download SDK iproute2 patches from SDK github (or from the URL provided by user if compiling SDK from sources) and apply them before compilation.
- How to verify it
Compile and load on switch, verify interfaces network devices created successfully.
Verify LLDP shows connections to neighbors.
Verify ping between 2 hosts over 2 router ports is successful.
- Why I did it
Adjust the warning threshold implementation according to the latest algorithm update
- How I did it
Modify power warning and critical thresholds methods
- How to verify it
Unit test updated to cover the change
Signed-off-by: Stephen Sun <stephens@nvidia.com>
- Why I did it
Add the patchwork link to the commit description for non-upstream patches if present
- How I did it
Parse the patchwork/<patch_name>.txt file from hw-mgmt
- Why I did it
Bug fix:
- * I2C bus is stuck - Unable to probe I2C bus 2-0048, which causes /var/run/hw-management/config/sfp_counter, module_counter to be zero and pmon docker unable to start.
- How I did it
Update HW-MGMT package version in the make file
Update HW-MGMT submodule pointer
-How to verify it
Run full sonic-mgmt regression
Signed-off-by: Kebo Liu <kebol@nvidia.com>
#### Why I did it
Facilitate Automatic integration of sdk kernel patches into SONiC.
**Inputs to the Script:**
1) `MLNX_SDK_VERSION` Eg: `4.5.4206`
2) `MLNX_SDK_ISSU_VERSION` Eg: `101`
**Note: If nothing is provided the one already present in the sdk.mk file is used**
3) `MLNX_SDK_SOURCE_BASE_URL:`
**Note: If nothing is provided the upstream sdk drivers url is used**
4) `CREATE_BRANCH: (y|n)` Creates a branch instead of a commit (optional, default: n)
5) `BRANCH_SONIC`: Only relevant when CREATE_BRANCH is y. `Default: master`.
Note: These should be provided through `SONIC_OVERRIDE_BUILD_VARS ` parameter
**Output:**
1) Script creates a commit in sonic-linux-kernel with any updates to sdk-kernel patches in sonic in accordance with the version provided by `MLNX_SDK_VERSION`
**Note: Script Doesn't commit anything to linux-kernel when there aren't any changes required..**
#### How I did it
1) Added a new make target which can be invoked by calling `make integrate-mlnx-sdk`
```
user@server:/sonic-buildimage/src/sonic-linux-kernel$ git rev-parse --abbrev-ref HEAD
master_6f38dca_integrate_4.5.4206
user@server:/sonic-buildimage/src/sonic-linux-kernel$ git log --oneline -n 1
d64d1e7 (HEAD -> master_6f38dca_integrate_4.5.4206) Intgerate MLNX SDK 4.5.4206 Kernel Patches
```
Changes made will be summarized under `sonic-buildimage/integrate-mlnx-sdk_user.out` file. Debugging and troubleshooting output is written to `sonic-buildimage/integrate-mlnx-sdk.log` files
[log_files.zip](https://github.com/sonic-net/sonic-buildimage/files/11226441/log_files.zip)
#### Limitations:
1) Assumes that the sdk kernel patches are always upstreamed
#### How to verify it
Build the Kernel and test
Why I did it
Align with SAI headers v1.12.0
Work item tracking
Microsoft ADO (number only):
How I did it
Update Mellanox SAI submodule
How to verify it
Compile SONiC image
- Why I did it
The current implementation of SFP reset, LPM, present relies on SDK API. This PR moves the implementation to SDK sysfs. By this PR, it gains following benefit:
1. SDK sysfs provides better performance.
2. Host side and container side share the same code.
3. Code is much cleaner.
- How I did it
Use SDK sysfs to implement SFP reset, LPM, present.
- How to verify it
1. Manual test.
2. Unit test.
SDK/FW Fixed Issues:
• When a system has more than 256 ACL entries, on rare occasion, removing/adding entries may cause some ACL entries not to work.
• When using mirror session policer on spectrum-2, spectrum-3, the actual CIR was 1.28 times more than the configured CIR value
• After warm boot process, when enabling ECN marking and the port is in split mode, traffic sent to the port under congestion (for example, when connecting two ports with a total speed of 50GbE to a single 25GbE port) is not marked.
• Warm boot might fail if the key value SAI_KEY_ACCUMULATED_FLOW_COUNTER_UNITS_IN_KB is set
• If counters are bound to an next hop group, there is a probability the next API calls that modify the next-hop group members will fail.
• In Spectrum platforms Fastboot mode is not operational for Split port with Force mode in 50G speed
• When fine grain next hop group has a size of 2K or 4K members, and group is removed, FW will remove only (size % 2048) members, resulting in leakage of KVD resources
• When reading some port statistics, or bulk reading some Queue or PG statistics, and in parallel reading or writing other counters, FW may, in rare cases, get stuck
• SN2201 Module 1 is considered to be present/linked while no cable/module is plugged
• On Spectrum-3 when port configure to 400G FW might stuck after running mlxlink while 400G interface connected and swap between upper and lower 4 lanes
SAI New features:
• ACL: Added support for an ACL match on the AETH field (SAI_ACL_TABLE_ATTR_FIELD_AETH_SYNDROME, SAI_ACL_ENTRY_ATTR_FIELD_AETH_SYNDROME) to count RoCE NAK and CNP packets.
• PLL Status: Added a new logging entry that alerts the user upon a PLL lock loss event.
• Dual ToR - Additional MAC Address: Added support for setting a MAC address for the router interface which is not part of the 10 bit MAC address available for RIFs on Spectrum-1, as part of the Dual ToR scenario.
• Dual ToR: DSCP Remapping Added support for tunnel QoS maps as part of the Dual TOR scenario.
SAI Fixed issues:
• When setting a WRED profile attribute for a color that was not enabled during the profile create time, an error would be returned. After the fix, a default profile is create on such scenario and the set attribute is applied on top of it
• When calling the flush FDB by using the SAI_FDB_FLUSH_ATTR_BRIDGE_PORT_ID attribute, the bridge bv_id value was filled on the notification callback where it should have been left empty.
Signed-off-by: Kebo Liu <kebol@nvidia.com>
- Why I did it
As a LED indicator to help user to find switch location in the lab, UID LED is a useful LED in Mellanox switch.
- How I did it
I add a new member _led_uid in Mellanox/Chassis.py, and extend Mellanox/led.py to support blue color.
Relevant platform-common PR sonic-net/sonic-platform-common#369
- How to verify it
Add unit test cases in test.py, and do manual test including turn-on/off/show uid led.
Signed-off-by: David Xia <daxia@nvidia.com>
- Why I did it
Add the commit-id patch map in the commit message.
- How I did it
By parsing the patch DB from hw-mgmt
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
- Why I did it
Currently, LED sysfs path is hardcoded. We will need change LED code if new LED color is supported for new platforms. This PR is aimed to improve this. By this PR, LED sysfs path is deduced from LED capability file.
- How I did it
Improve LED management on Nvidia platform:
get LED capability from capability file and deduce sysfs name according to the capability
- How to verify it
Unit test
Manual test
- Why I did it
To improve ASAN backtrace output when the call stack contains a code that is not compiled with -fno-omit-frame-pointer.
- How I did it
Added fast_unwind_on_malloc=0 to the ASAN_OPTIONS
- How to verify it
Build and test docker-syncd-mlnx.gz with ENABLE_ASAN=y
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
- Why I did it
Add NVIDIA Copyright header to NVIDIA files added lately
- How I did it
Add NVIDIA Copyright header for the relevant files
- How to verify it
N/A (only commented text was added).
- Why I did it
Mellanox syncd container will be based on Debian iproute2 plus patches instead of Nvidia internal version of iproute2
- How I did it
Download iproute2 from Debian repository, apply patches and compile to create a new target.
The target is then deployed in syncd container of Mellanox switches only.
The new target is called IPROUTE2_MLNX.
- How to verify it
Compile and load on switch, verify interfaces network devices created successfully.
Verify LLDP shows connections to neighbors.
Verify ping between 2 hosts over 2 router ports is successful.
- Why I did it
Facilitate Automatic integration of new hw-mgmt version into SONiC.
Inputs to the Script:
MLNX_HW_MANAGEMENT_VERSION Eg: 7.0040.5202
CREATE_BRANCH: (y|n) Creates a branch instead of a commit (optional, default: n)
BRANCH_SONIC: Only relevant when CREATE_BRANCH is y. Default: master.
Note: These should be provided through SONIC_OVERRIDE_BUILD_VARS parameter
Output:
Script creates a commit (in each of sonic-buildimage, sonic-linux-kernel) with all the changes required for upgrading the hw-management version to a version provided by MLNX_HW_MANAGEMENT_VERSION
Brief Summary of the changes made:
MLNX_HW_MANAGEMENT_VERSION flag in the hw-management.mk file
hw-mgmt submodule is updated to the corresponding version
Updates are made to non-upstream-patches/patches and series.patch file
series, kconfig-inclusion and kconfig-exclusion files can be updated in the sonic-linux-kernel repo
sonic-linux-kernel/patches folder is updated with the corresponding upstream patches
Based on the inputs, there could be a branch seen in the local for each of the repo's. Branch is named as <branch>_<parent_commit>_integrate_<hw_mgmt_version>
- How I did it
Added a new make target which can be invoked by calling make integrate-mlnx-hw-mgmt
user@server:/sonic-buildimage$ git rev-parse --abbrev-ref HEAD
master_23193446a_integrate_7.0020.5052
user@server:/sonic-buildimage$ git log --oneline -n 2
f66e01867 (HEAD -> master_23193446a_integrate_V.7.0020.5052, show) Intgerate HW-MGMT V.7.0020.5052 Changes
23193446a (master_intg_hw_mgmt) Update logic
user@server:/sonic-buildimage/src/sonic-linux-kernel$ git rev-parse --abbrev-ref HEAD
master_6847319_integrate_7.0020.4104
user@server:/sonic-buildimage/src/sonic-linux-kernel$ git log --oneline -n 2
6094f71 (HEAD -> master_6847319_integrate_V.7.0020.5052) Intgerate HW-MGMT V.7.0020.5052 Changes
6847319 (origin/master, origin/HEAD) Read ID register for optoe1 to find pageable bit in optoe driver (#308)
Changes made will be summarized under sonic-buildimage/integrate-mlnx-hw-mgmt_user.out file. Debugging and troubleshooting output is written to sonic-buildimage/integrate-mlnx-hw-mgmt.log files
User output file & stdout file:
log_files.tar.gz
Limitations:
Assumes the changes would only work for amd64
Assumes the non-upstream patches in mellanox only belong to hw-mgmt
- How to verify it
Build the Kernel
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
- Why I did it
Currently, non upstream patches are applied only after upstream patches.
Depends on sonic-net/sonic-linux-kernel#313. Can be merged in any order, preferably together
- What I did it
Non upstream Patches that reside in the sonic repo will not be saved in a tar file bur rather in a folder pointed out by EXTERNAL_KERNEL_PATCH_LOC. This is to make changes to the non upstream patches easily traceable.
The build variable name is also updated to INCLUDE_EXTERNAL_PATCHES
Files/folders expected under EXTERNAL_KERNEL_PATCH_LOC
EXTERNAL_KERNEL_PATCH_LOC/
├──── patches/
├── 0001-xxxxx.patch
├── 0001-yyyyyyyy.patch
├── .............
├──── series.patch
series.patch should contain a diff that is applied on the sonic-linux-kernel/patch/series file. The diff should include all the non-upstream patches.
How to verify it
Build the Kernel and verified if all the patches are applied properly
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
- Why I did it
Sometimes Nvidia watchdog device isn't ready when watchdog-control service is up after first installation from ONIE
need to delay watchdog control service to go up after hw-mgmt which gets devices up and ready
- How I did it
Delay Nvidia watchdog-control service before hw-mgmt has started on Mellanox platform in order to avoid missing or not ready watchdog device.
- How to verify it
verification test of ONIE installation of image in a loop
making sure watchdog service is always up (not failed) after first installation from ONIE