Commit Graph

581 Commits

Author SHA1 Message Date
mssonicbld
da90d5624d
[Mellanox] Enhance the processing of Kconfig in the hw-mgmt integration (#16752) (#17009) 2023-10-26 00:46:54 +08:00
mssonicbld
c3ea44a522
[Mellanox] add new platform 2700 a1 (#16515) (#16795) 2023-10-08 03:06:03 +08:00
Junchao-Mellanox
648c94dd59 [Mellanox] wait reset cause ready (#16722)
Why I did it
SONiC service determine-reboot-cause might run before driver creating reset cause files. In that case, the reset cause will be "Unknown". This PR introduces a wait mechanism to wait for reset cause sysfs files ready.

How I did it
/run/hw-management/config/reset_attr_ready is the file to indicate all reset cause files are ready. In chassis.get_reboot_cause function, it waits /run/hw-management/config/reset_attr_ready for up to 45 seconds.

How to verify it
Manual test on master/202211/202205
2023-10-04 14:34:30 +08:00
Vivek
11e9f7c0de [Nvidia] Remove the dependency on python_sdk_api for sfp api (#16545)
Sfp api can now be called from the host which doesn't have the python_sdk_api installed. Also, sfp api has been migrated to use sysfs instead of sdk handle.

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
2023-09-27 18:33:34 +08:00
Dror Prital
81b5361b2a [Mellanox] Update SDK/FW to 4.6.1062/2012.1062 Update SDK/FW/SAI to 4.6.1062/2012.1062/SAIBuild2211.25.1.4 (#16478)
- Why I did it
SAI bug Fixes
1. When creating an ACL rule with SAI_ACL_ENTRY_ATTR_FIELD_SRC_IP/SAI_ACL_ENTRY_ATTR_FIELD_DST_IP enabled, and then disabling the field by setting enable=false, a match on L3_type=IPv4 will remain programmed for the rule Issue resolved after the fix
2. Allow the max scale of virtual routers to be configure for SPC-1, SPC-2, SPC-3 which is 255 when fastboot enable and 511 when fastboot disable
3. Remove default hash key of SRC_MAC, DST_MAC and ETH_TYPE

SAI features
1. Port init profile
2. Dual ToR Active-Standby | Additional MAC support

SDK/FW bug fixes
1. When preforming fast boot from an old SDK version (currently installed) to a newer one (target version), and the system was initially loaded with a new SDK version (past version), and the system has not been wiped, under specific conditions, the fast boot would use the past version's data and may fail.

- How I did it
Update SAI version to SAIBuild2211.25.1.4
Update SDK/FW version to 4.6.1062/2012.1062
2023-09-22 14:34:08 +08:00
mssonicbld
a713299614
[Mellanox] Remove mlxtrace support for SPC4 (#16373) (#16625)
- Why I did it
Because the Spectrum4 devices don't support mlxtrace utility.

- How I did it
Edit sai.profile and remove mlxtrace_spectrum4_itrace_*.cfg.ext files

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
Co-authored-by: Vadym Hlushko <62022266+vadymhlushko-mlnx@users.noreply.github.com>
2023-09-21 20:38:22 +08:00
Kebo Liu
27f15d40e1 [Mellanox] Update HW-MGMT package to new version V.7.0030.1011 (#16239)
- Why I did it
1. Update Mellanox HW-MGMT package to newer version V.7.0030.1011
2. Replace the SONiC PMON Thermal control algorithm with the one inside the HW-MGMT package on all Nvidia platforms
3. Support Spectrum-4 systems

- How I did it
1. Update the HW-MGMT package version number and submodule pointer
2. Remove the thermal control algorithm implementation from Mellanox platform API
3. Revise the patch to HW-MGMT package which will disable HW-MGMT from running on SIMX
4. Update the downstream kernel patch list

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-09-21 18:34:07 +08:00
Kebo Liu
fe7eeed051
[202305][Mellanox] Update SDK/FW/SAI to 4.6.1020/2012.1020/SAIBuild2305.25.0.3(#16096) (#16298)
* [Mellanox] Update SDK/FW/SAI to 4.6.1020/2012.1020/SAIBuild2305.25.0.3 (#16096)

SONiC changes:
1. Support Spectrum4 ASIC FW binary building.
2. Support new SDK sx-obj-desc lib building since new SAI need it.
3. Remove SX_SCEW debian package from Mellanox SDK build since we are no longer using it (we use libxml2 instead).
4. Update SAI, SDK, FW to version 4.6.1020/2012.1020/SAIBuild2305.25.0.3

SDK/FW bug fixes
1. In SPC-1 platforms: Fastboot mode is not operational for Split port with Force mode in 50G speed
SFP modules are kept in disabled state after set LPM (low power mode) on/off for at least 3 minutes.
2. When preforming fast boot from an old SDK version (currently installed) to a newer one (target version), and the system was initially loaded with a new SDK version (past version), and the system has not been wiped, under specific conditions, the fast boot would use the past version's data and may fail.

SDK/FW Features
1. On SN2700 all ports can support y cable by credo

SAI bug Fixes
1. When creating an ACL rule with SAI_ACL_ENTRY_ATTR_FIELD_SRC_IP/SAI_ACL_ENTRY_ATTR_FIELD_DST_IP enabled, and then disabling the field by setting enable=false, a match on L3_type=IPv4 will remain programmed for the rule Issue resolved after the fix
2. Allow the max scale of virtual routers to be configure for SPC-1, SPC-2, SPC-3 when fastboot enable
3. Remove default hash key of SRC_MAC, DST_MAC and ETH_TYPE

SAI features
1. Port init profile

- How I did it
Update SDK/FW/SAI make files

- How to verify it
Run full sonic-mgmt regression on Mellanox platform

Signed-off-by: Kebo Liu <kebol@nvidia.com>
Conflicts:
	platform/mellanox/mlnx-sai.mk

* Fix issue: unprintable character is rendered when handling comments in j2

Use "{#-" and "-#}" to mark comments in jinja template

Signed-off-by: Stephen Sun <stephens@nvidia.com>

---------

Signed-off-by: Stephen Sun <stephens@nvidia.com>
Co-authored-by: Stephen Sun <stephens@nvidia.com>
2023-09-10 22:28:46 +08:00
mssonicbld
63801d5bf7
[Mellanox][SFP] Remove unused function parameter (#16318) (#16424) 2023-09-03 22:15:30 +08:00
Junchao-Mellanox
d13341fd9b [Mellanox] Fix issue: watchdogutil command does not work (#16091)
- Why I did it
watchdogutil uses platform API watchdog instance to control/query watchdog status. In Nvidia watchdog status, it caches "armed" status in a object member "WatchdogImplBase.armed". This is not working for CLI infrastructure because each CLI will create a new watchdog instance, the status cached in previous instance will totally lose. Consider following commands:

admin@sonic:~$ sudo watchdogutil arm -s 100      =====> watchdog instance1, armed=True
Watchdog armed for 100 seconds
admin@sonic:~$ sudo watchdogutil status             ======> watchdog instance2, armed=False
Status: Unarmed
admin@sonic:~$ sudo watchdogutil disarm            =======> watchdog instance3, armed=False
Failed to disarm Watchdog

- How I did it
Use sysfs to query watchdog status

- How to verify it
Manual test
Unit test
2023-09-03 20:44:36 +08:00
Kebo Liu
9d4d3af5e6 [Mellanox] Update MFT to newer version 4.25.0-62 (#16149)
- Why I did it
Update Mellanox MFT tool to version 4.25.0-62

- How I did it
Update the MFT tool make file

- How to verify it
Run full sonic-mgmt regression.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-09-03 20:44:29 +08:00
Kebo Liu
8253fd5c07 [Mellanox] Update SAI build procedure (#15728)
= Why I did it
To optimize Mellanox platform SAI build

- How I did it
SAI debs are now downloaded as Spectrum-SDK-Drivers-SONiC-Bins release.

- How to verify it
Configure/build for Mellanox platform, check the image and ensure that correct SAI debs are included.
2023-09-03 18:32:47 +08:00
Vadym Hlushko
1f1ae60961 [Mellanox] Change SDK API sx_mgmt_phy_module_info_get() to sysfs (#15963)
- Why I did it
Change Mellanox platform API implementation to use ASIC driver sysfs for the module operational state and status error fields.

- How I did it
Modify the platform/mellanox/mlnx-platform-api/sonic_platform/sfp.py file by change the call of sx_mgmt_phy_module_info_get() SDK API to sysfs

- How to verify it
Simulate the unplug cable event
Check the CLI output
sfputil show presence
sfputil show error-status -hw
Simulate the plug cable event
Repeat 2 step

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
2023-09-03 16:33:10 +08:00
mssonicbld
875b81e407
[Mellanox] Add mlxtrace to techsupport (#15961) (#16215) 2023-08-20 23:51:38 +08:00
mssonicbld
75b7ec361c
[Mellanox] Add more unit test coverage for platform API (#15842) (#16137)
- Why I did it
Increase UT coverage for Nvidia platform API code

Work item tracking
Microsoft ADO (number only):

- How I did it
Focus on low coverage file:
1. component.py
2. watchdog.py
3. pcie.py

- How to verify it
Run the unit test, the coverage has been changed from 70% to 90%

Co-authored-by: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com>
2023-08-14 22:40:38 +08:00
mssonicbld
33a10b479a
[nvidia] make sure shared storage with syncd is cleared on restarts (#14547) (#16046)
Why I did it
Sharing the storage of syncd with other proprietary application extensions allows them to communicate with syncd in differnt ways.
If one container wants to pass some information to syncd then shared storage can be used. However, today the shared storage isn't cleaned on restarts making it possible for syncd to read out-of-date information generated in the past.

NOTE: No plans to use it for standard SONIC dockers and we are working on removing the SDK dependency from PMON docker

How I did it
Implemented new service to clean the shared storage.

How to verify it
Do reboot/fast-reboot/warm-reboot/config-reload/systemctl restart swss and verify /tmp/ is cleaned after each restart in syncd container.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Co-authored-by: Stepan Blyshchak <38952541+stepanblyschak@users.noreply.github.com>
2023-08-07 09:27:43 +08:00
mssonicbld
157b9ea3b7
[Mellanox] Remove unnecessary file manipulation in the SAI Make file (#15993) (#16043) 2023-08-06 17:18:21 +08:00
mssonicbld
89fdba9e92
[Mellanox] Remove reset_from_comex from reboot cause mapping (#15793) (#16040) 2023-08-06 17:04:26 +08:00
mssonicbld
298e7ebe34
[Mellanox] Add support for BIOS update on Spectrum-4 (#15795) (#15942) 2023-07-24 02:08:20 +08:00
Stepan Blyshchak
e2e5b77f16
[mlnx-ffb.sh] Update issu-version location (#14925)
#### Why I did it

ISSU version check fails due to inability to mount squashfs from 202211 on 201911

#### How I did it

Put ISSU version file under platform directory

#### How to verify it

Warm-upgrade matrix:
- 201911 (with https://github.com/sonic-net/sonic-buildimage/pull/14928) to master
- 201911 (with https://github.com/sonic-net/sonic-buildimage/pull/14928) to 202211
- 202012 (with https://github.com/sonic-net/sonic-buildimage/pull/14927) to master
- 202205 (with this change cherry-picked) to master
2023-06-15 15:14:52 -07:00
Lior Avramov
c05d017091
[Mellanox] Remove iproute2 SDK patches from SONiC tree and consume them from SDK github (#15062)
- Why I did it
SDK patches for iproute2 were added to SONiC tree as a temporary solution.
Now that SDK with the patches is available, I have removed the patches from SONiC tree and we consume them from SDK github during compilation.

- How I did it
During build we download SDK iproute2 patches from SDK github (or from the URL provided by user if compiling SDK from sources) and apply them before compilation.

- How to verify it
Compile and load on switch, verify interfaces network devices created successfully.
Verify LLDP shows connections to neighbors.
Verify ping between 2 hosts over 2 router ports is successful.
2023-06-13 15:17:52 +03:00
Stephen Sun
238e6ffcc1
[Mellanox] Adjust warning threshold implementation according to the latest algorithm update (#15092)
- Why I did it
Adjust the warning threshold implementation according to the latest algorithm update

- How I did it
Modify power warning and critical thresholds methods

- How to verify it
Unit test updated to cover the change

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2023-06-13 15:14:10 +03:00
Vivek
9d8ab1b8e4
[Mellanox] Added patchwork link to commit message (#15301)
- Why I did it
Add the patchwork link to the commit description for non-upstream patches if present

- How I did it
Parse the patchwork/<patch_name>.txt file from hw-mgmt
2023-06-08 18:51:58 +03:00
Kebo Liu
5bb3326d2b
[Mellanox] Update hw-mgmt to 7.0020.4301 (#15260)
- Why I did it
Bug fix:

- * I2C bus is stuck - Unable to probe I2C bus 2-0048, which causes /var/run/hw-management/config/sfp_counter, module_counter to be zero and pmon docker unable to start.

- How I did it
Update HW-MGMT package version in the make file
Update HW-MGMT submodule pointer

-How to verify it
Run full sonic-mgmt regression

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-05-31 10:33:08 +03:00
Vivek
6852fcdc24
[Mellanox] Facilitate automatic integration of sdk kernel patches (#14652)
#### Why I did it

Facilitate Automatic integration of sdk kernel patches into SONiC. 

**Inputs to the Script:**
1) `MLNX_SDK_VERSION` Eg: `4.5.4206`
2) `MLNX_SDK_ISSU_VERSION` Eg: `101` 
 **Note: If nothing is provided the one already present in the sdk.mk file is used**
3) `MLNX_SDK_SOURCE_BASE_URL:` 
 **Note: If nothing is provided the upstream sdk drivers url is used**
4) `CREATE_BRANCH: (y|n)` Creates a branch instead of a commit (optional, default: n) 
5) `BRANCH_SONIC`:  Only relevant when CREATE_BRANCH is y. `Default: master`. 

Note: These should be provided through `SONIC_OVERRIDE_BUILD_VARS ` parameter

**Output:**
1) Script creates a commit in sonic-linux-kernel with any updates to sdk-kernel patches in sonic in accordance with the version provided by  `MLNX_SDK_VERSION`

**Note: Script Doesn't commit anything to linux-kernel when there aren't any changes required..**  

#### How I did it

1) Added a new make target which can be invoked by calling `make integrate-mlnx-sdk`

```
user@server:/sonic-buildimage/src/sonic-linux-kernel$ git rev-parse --abbrev-ref HEAD
master_6f38dca_integrate_4.5.4206

user@server:/sonic-buildimage/src/sonic-linux-kernel$ git log --oneline -n 1
d64d1e7 (HEAD -> master_6f38dca_integrate_4.5.4206) Intgerate MLNX SDK 4.5.4206 Kernel Patches
```

Changes made will be summarized under `sonic-buildimage/integrate-mlnx-sdk_user.out` file. Debugging and troubleshooting output is written to `sonic-buildimage/integrate-mlnx-sdk.log` files

[log_files.zip](https://github.com/sonic-net/sonic-buildimage/files/11226441/log_files.zip)


#### Limitations:
1) Assumes that the sdk kernel patches are always upstreamed

#### How to verify it

Build the Kernel and test
2023-05-29 22:24:06 -07:00
Oleksandr Ivantsiv
f3ce9ebda8
[Mellanox] Update SAI to v2305.24.0.1 (#15208)
Why I did it
Align with SAI headers v1.12.0

Work item tracking
Microsoft ADO (number only):
How I did it
Update Mellanox SAI submodule

How to verify it
Compile SONiC image
2023-05-26 17:53:17 +08:00
Vivek
d3f2d06117
[Mellanox] Add Copyright Headers for missing files (#15136)
Added NVIDIA copyright to missing files under platform/mellanox & device/mellanox
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
2023-05-25 07:55:44 +03:00
Junchao-Mellanox
18cf719d6a
[Mellanox] Use sysfs for sfp reset/LPM/presence (#14130)
- Why I did it
The current implementation of SFP reset, LPM, present relies on SDK API. This PR moves the implementation to SDK sysfs. By this PR, it gains following benefit:
1. SDK sysfs provides better performance.
2. Host side and container side share the same code.
3. Code is much cleaner.

- How I did it
Use SDK sysfs to implement SFP reset, LPM, present.

- How to verify it
1. Manual test.
2. Unit test.
2023-05-24 17:24:34 +03:00
Kebo Liu
3e9437b63e
[Mellanox] Update SAI to 2211.24.0.21 and SDK/FW to 4.5.5142/2010_5144 (#15072)
SDK/FW Fixed Issues:
• When a system has more than 256 ACL entries, on rare occasion, removing/adding entries may cause some ACL entries not to work.
• When using mirror session policer on spectrum-2, spectrum-3, the actual CIR was 1.28 times more than the configured CIR value
• After warm boot process, when enabling ECN marking and the port is in split mode, traffic sent to the port under congestion (for example, when connecting two ports with a total speed of 50GbE to a single 25GbE port) is not marked.
• Warm boot might fail if the key value SAI_KEY_ACCUMULATED_FLOW_COUNTER_UNITS_IN_KB is set
• If counters are bound to an next hop group, there is a probability the next API calls that modify the next-hop group members will fail.
• In Spectrum platforms Fastboot mode is not operational for Split port with Force mode in 50G speed
• When fine grain next hop group has a size of 2K or 4K members, and group is removed, FW will remove only (size % 2048) members, resulting in leakage of KVD resources
• When reading some port statistics, or bulk reading some Queue or PG statistics, and in parallel reading or writing other counters, FW may, in rare cases, get stuck
• SN2201 Module 1 is considered to be present/linked while no cable/module is plugged
• On Spectrum-3 when port configure to 400G FW might stuck after running mlxlink while 400G interface connected and swap between upper and lower 4 lanes

SAI New features:
• ACL: Added support for an ACL match on the AETH field (SAI_ACL_TABLE_ATTR_FIELD_AETH_SYNDROME, SAI_ACL_ENTRY_ATTR_FIELD_AETH_SYNDROME) to count RoCE NAK and CNP packets.
• PLL Status: Added a new logging entry that alerts the user upon a PLL lock loss event.
• Dual ToR - Additional MAC Address: Added support for setting a MAC address for the router interface which is not part of the 10 bit MAC address available for RIFs on Spectrum-1, as part of the Dual ToR scenario.
• Dual ToR: DSCP Remapping Added support for tunnel QoS maps as part of the Dual TOR scenario.

SAI Fixed issues:
• When setting a WRED profile attribute for a color that was not enabled during the profile create time, an error would be returned. After the fix, a default profile is create on such scenario and the set attribute is applied on top of it
• When calling the flush FDB by using the SAI_FDB_FLUSH_ATTR_BRIDGE_PORT_ID attribute, the bridge bv_id value was filled on the notification callback where it should have been left empty.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-05-24 17:20:33 +03:00
daxia16
1175143af1
[Mellanox] Support UID LED in platform API (#11592)
- Why I did it
As a LED indicator to help user to find switch location in the lab, UID LED is a useful LED in Mellanox switch.

- How I did it
I add a new member _led_uid in Mellanox/Chassis.py, and extend Mellanox/led.py to support blue color.
Relevant platform-common PR sonic-net/sonic-platform-common#369

- How to verify it
Add unit test cases in test.py, and do manual test including turn-on/off/show uid led.

Signed-off-by: David Xia <daxia@nvidia.com>
2023-05-16 08:24:39 +03:00
Junchao-Mellanox
7962a5c0fa
[Mellanox] add PSU fan direction support (#14508)
- Why I did it
Add PSU fan direction support

- How I did it
Implement fan.get_direction for PSU fan

- How to verify it
Manual test
Unit test
2023-05-15 21:34:54 +03:00
Vivek
bc58c12ed8
[Mellanox] Add patch commit-id mapping to description (#15052)
- Why I did it
Add the commit-id patch map in the commit message.

- How I did it
By parsing the patch DB from hw-mgmt

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
2023-05-15 13:47:24 +03:00
Junchao-Mellanox
9deca05f9d
[Mellanox] get LED capability from capability file (#14584)
- Why I did it
Currently, LED sysfs path is hardcoded. We will need change LED code if new LED color is supported for new platforms. This PR is aimed to improve this. By this PR, LED sysfs path is deduced from LED capability file.

- How I did it
Improve LED management on Nvidia platform:
get LED capability from capability file and deduce sysfs name according to the capability

- How to verify it
Unit test
Manual test
2023-05-10 20:53:50 +03:00
Yakiv Huryk
fa02411750
[Mellanox][asan] disable fast_unwind_on_malloc for mlnx syncd (#14858)
- Why I did it
To improve ASAN backtrace output when the call stack contains a code that is not compiled with -fno-omit-frame-pointer.

- How I did it
Added fast_unwind_on_malloc=0 to the ASAN_OPTIONS

- How to verify it
Build and test docker-syncd-mlnx.gz with ENABLE_ASAN=y

Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
2023-05-10 20:50:42 +03:00
Lior Avramov
97cdb6af5c
[Mellanox] Add copyright header to ECMP calculator files (#14825)
- Why I did it
Add NVIDIA Copyright header to NVIDIA files added lately

- How I did it
Add NVIDIA Copyright header for the relevant files

- How to verify it
N/A (only commented text was added).
2023-05-02 10:35:16 +03:00
Lior Avramov
2922f26b6c
[Mellanox] Replace iproute2 supplied by SDK to iproute2 downloaded from Debian repository (#14726)
- Why I did it
Mellanox syncd container will be based on Debian iproute2 plus patches instead of Nvidia internal version of iproute2

- How I did it
Download iproute2 from Debian repository, apply patches and compile to create a new target.
The target is then deployed in syncd container of Mellanox switches only.
The new target is called IPROUTE2_MLNX.

- How to verify it
Compile and load on switch, verify interfaces network devices created successfully.
Verify LLDP shows connections to neighbors.
Verify ping between 2 hosts over 2 router ports is successful.
2023-04-30 12:30:09 +03:00
Vivek
1b63543e7f
[Mellanox] Fix the hw-mgmt intg tool case sensitivity for KConfig (#14709)
Fix the script to consider case sensitivity while writing the kconfig

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
2023-04-25 09:17:02 +03:00
Vivek
397908aa59
[Mellanox] Facilitate automatic integration of new hw-mgmt (#14594)
- Why I did it
Facilitate Automatic integration of new hw-mgmt version into SONiC.

Inputs to the Script:

MLNX_HW_MANAGEMENT_VERSION Eg: 7.0040.5202
CREATE_BRANCH: (y|n) Creates a branch instead of a commit (optional, default: n)
BRANCH_SONIC: Only relevant when CREATE_BRANCH is y. Default: master.
Note: These should be provided through SONIC_OVERRIDE_BUILD_VARS  parameter

Output:

Script creates a commit (in each of sonic-buildimage, sonic-linux-kernel) with all the changes required for upgrading the hw-management version to a version provided by MLNX_HW_MANAGEMENT_VERSION
Brief Summary of the changes made:

MLNX_HW_MANAGEMENT_VERSION flag in the hw-management.mk file
hw-mgmt submodule is updated to the corresponding version
Updates are made to non-upstream-patches/patches and series.patch file
series, kconfig-inclusion and kconfig-exclusion files can be updated in the sonic-linux-kernel repo
sonic-linux-kernel/patches folder is updated with the corresponding upstream patches
Based on the inputs, there could be a branch seen in the local for each of the repo's. Branch is named as <branch>_<parent_commit>_integrate_<hw_mgmt_version>

- How I did it
Added a new make target which can be invoked by calling make integrate-mlnx-hw-mgmt
user@server:/sonic-buildimage$ git rev-parse --abbrev-ref HEAD
master_23193446a_integrate_7.0020.5052
user@server:/sonic-buildimage$ git log --oneline -n 2
f66e01867 (HEAD -> master_23193446a_integrate_V.7.0020.5052, show) Intgerate HW-MGMT V.7.0020.5052 Changes
23193446a (master_intg_hw_mgmt) Update logic

user@server:/sonic-buildimage/src/sonic-linux-kernel$ git rev-parse --abbrev-ref HEAD
master_6847319_integrate_7.0020.4104
user@server:/sonic-buildimage/src/sonic-linux-kernel$ git log --oneline -n 2
6094f71 (HEAD -> master_6847319_integrate_V.7.0020.5052) Intgerate HW-MGMT V.7.0020.5052 Changes
6847319 (origin/master, origin/HEAD) Read ID register for optoe1 to find pageable bit in optoe driver  (#308)
Changes made will be summarized under sonic-buildimage/integrate-mlnx-hw-mgmt_user.out file. Debugging and troubleshooting output is written to sonic-buildimage/integrate-mlnx-hw-mgmt.log files

User output file & stdout file:

log_files.tar.gz

Limitations:
Assumes the changes would only work for amd64
Assumes the non-upstream patches in mellanox only belong to hw-mgmt

- How to verify it
Build the Kernel

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
2023-04-13 14:18:09 +03:00
Vivek
0df155b014
Made non-upstream patch design order aware (#14434)
- Why I did it

Currently, non upstream patches are applied only after upstream patches.

Depends on sonic-net/sonic-linux-kernel#313. Can be merged in any order, preferably together

- What I did it

Non upstream Patches that reside in the sonic repo will not be saved in a tar file bur rather in a folder pointed out by EXTERNAL_KERNEL_PATCH_LOC. This is to make changes to the non upstream patches easily traceable.
The build variable name is also updated to INCLUDE_EXTERNAL_PATCHES
Files/folders expected under EXTERNAL_KERNEL_PATCH_LOC
EXTERNAL_KERNEL_PATCH_LOC/
       ├──── patches/
             ├── 0001-xxxxx.patch
             ├── 0001-yyyyyyyy.patch
             ├── .............
       ├──── series.patch
series.patch should contain a diff that is applied on the sonic-linux-kernel/patch/series file. The diff should include all the non-upstream patches.
How to verify it

Build the Kernel and verified if all the patches are applied properly

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
2023-04-10 19:48:27 +03:00
dbarashinvd
06d6dafcf3
[Mellanox] fix for watchdog device not found, adding dependency on hw-management (#14182)
- Why I did it
Sometimes Nvidia watchdog device isn't ready when watchdog-control service is up after first installation from ONIE
need to delay watchdog control service to go up after hw-mgmt which gets devices up and ready

- How I did it
Delay Nvidia watchdog-control service before hw-mgmt has started on Mellanox platform in order to avoid missing or not ready watchdog device.

- How to verify it
verification test of ONIE installation of image in a loop
making sure watchdog service is always up (not failed) after first installation from ONIE
2023-03-15 18:36:20 +02:00
Dror Prital
35f8101b50
Update SDK/FW to version 4.5.4206/4.5.4204 (#14164)
- Why I did it
To include latest fixes:

Fix traffic loss on all routed traffic when moving from 4.4.3372/XX_2008_3388 to 4.5.4118-012/XX_2010_4120-010. Issue occurred after ISSU process in Spectrum 1 only, When upgrading from older version to a new one. Neighbor entries are overwritten.
Fix When using mirror session policer on SPC2/3, the actual CIR was 1.28 times more than the configured CIR value.
Fix Creation of router interface of type bridge may occasionally fail if create is performed immediately after delete.
Fix False errors during SDK deinitialization may be seen in the syslog

- How I did it
Updated SDK submodule and relevant makefiles with the required versions.

- How to verify it
Build an image and run tests from "sonic-mgmt".
2023-03-14 16:27:38 +02:00
Sudharsan Dhamal Gopalarathnam
8d82a86134
[Mellanox]Fix lpmode set when logical port is larger than 64 (#14138)
- Why I did it
In sfplpm API, the number of logical ports is hardcoded as 64. When a system contains more port than this, the SDK APIs would fail with a syslog as below

Mar 7 03:53:58.105980 r-leopard-58 ERR syncd#SDK: [MGMT_LIB.ERR] Slot [0] Module [0] has logport [0x00010069] in enabled state
Mar 7 03:53:58.105980 r-leopard-58 ERR syncd#SDK: [SDK_MGMT_LIB.ERR] Failed in __sdk_mgmt_phy_module_pwr_attr_set, error: Internal Error
Mar 7 03:53:58.106118 r-leopard-58 ERR pmon#-c: Error occurred when setting power mode for SFP module 0, slot 0, error code 1

- How I did it
Remove the hardcoded value of 64. Obtained the number of logical ports from SDK

- How to verify it
Manual testing
2023-03-09 00:02:55 +02:00
Volodymyr Samotiy
15bee3a2c0
[Mellanox] Update MFT to 4.22.1-15 (#14133)
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2023-03-08 10:21:58 +02:00
Stepan Blyshchak
f908dfe919
[Mellanox] Place FW binaries under platform directory instead of squashfs (#13837)
Fixes #13568

Upgrade from old image always requires squashfs mount to get the next image FW binary. This can be avoided if we put FW binary under platform directory which is easily accessible after installation:

admin@r-spider-05:~$ ls /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa
/host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa
admin@r-spider-05:~$ ls -al /tmp/image-fw-new-loc.0-dirty-20230208.193534-fs/etc/mlnx/fw-SPC.mfa
lrwxrwxrwx 1 root root 66 Feb  8 17:57 /tmp/image-fw-new-loc.0-dirty-20230208.193534-fs/etc/mlnx/fw-SPC.mfa -> /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa

- Why I did it
202211 and above uses different squashfs compression type that 201911 kernel can not handle. Therefore, we avoid mounting squashfs altogether with this change.

- How I did it
Place FW binary under /host/image-/platform/mlnx/, soft links in /etc/mlnx are created to avoid breaking existing scripts/automation.
/etc/mlnx/fw-SPCX.mfa is a soft link always pointing to the FW that should be used in current image
mlnx-fw-upgrade.sh is updated to prefer /host/image-/platform/mlnx location and fallback to /etc/mlnx in squashfs in case new location does not exist. This is necessary to do image downgrade.

- How to verify it
Upgrade from 201911 to master
master to 201911 downgrade
master -> master reboot
ONIE -> master boot (First FW burn)
Which release branch to backport (provide reason below if selected)
2023-03-06 13:36:43 +02:00
Yakiv Huryk
fffeec6ecc
[Mellanox] update sdk/fw build procedure (#14025)
- Why I did it
To optimize Mellanox platform build

- How I did it
sdk debs are now downloaded as Spectrum-SDK-Drivers-SONiC-Bins release
sx kernel is downloaded as zip from Spectrum-SDK-Drivers

- How to verify it
configure/build for Mellanox platform

Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
2023-03-02 09:48:47 +02:00
DavidZagury
ee1b6b3751
Remove support to Mellanox SPC4 ASIC (#13932)
- Why I did it
FW for Spectrum-4 ASIC not yet available

- How I did it
Remove in Mellanox fw make files to Spectrum-4 ASIC firmware binaries.
Remove from firmware upgrade scripts to be able Spectrum-4 ASIC.

- How to verify it
Run regression test
2023-02-23 08:25:34 +02:00
Junchao-Mellanox
f6d3615bb9
[Mellanox] Check system eeprom existence in a retry manner (#13884)
- Why I did it
On Mellanox platform, system EEPROM is a soft link provided by hw-management. There is chance that config-setup service accessing the EEPROM before hw-management creating it. It causes errors. The PR is aim to fix it.

- How I did it
Waiting EEPROM creation in platform API up to 10 seconds.

- How to verify it
Manual test
2023-02-21 19:40:16 +02:00
Lior Avramov
fd122efb40
[Mellanox] [ECMP calculator] Add support for 4600/4600C/2201 platforms with different interface naming method (#13814)
- Why I did it
Add support for systems 4600/4600C/2201 that are using sonic interface names aligned to 4 instead of 8 (which is the max number of lanes per port).
Improve DB access calls, now we use Python library functions.

- How I did it
Use addition information taken from Config DB in order to create map from SDK logical index to sonic interface name.

- How to verify it
Run ECMP calculator on 4600, 4600C and 2201 platforms.
2023-02-21 08:52:51 +02:00
Junchao-Mellanox
331b97e2aa
[Mellanox] Fix issue: cannot find label port for logical port when logical port number is larger than 64 (#13710)
- Why I did it
sfp_event.py gets a PMPE message when a cable event is available. In PMPE message, there is no label port available. Current sfp_event.py is using sx_api_port_device_get to get 64 logical ports attributes, and find the label port from those 64 attributes. However, if there are more than 64 ports, sfp_event.py might not be able to find the label port and drop the PMPE message.

- How I did it
Don't use hardcoded 64, get logical port number instead.

- How to verify it
Manual test
2023-02-21 08:14:29 +02:00
Junchao-Mellanox
cac95c2b4a
Add PYTHON3_SWSSCOMMON as build time dependency to Mellanox platform API (#13847)
- Why I did it
Add PYTHON3_SWSSCOMMON as build time dependency to Mellanox platform API to avoid issue like:

19:34:11  ImportError while loading conftest '/sonic/platform/mellanox/mlnx-platform-api/tests/conftest.py'.
19:34:11  tests/conftest.py:28: in <module>
19:34:11      from sonic_platform import utils
19:34:11  sonic_platform/__init__.py:18: in <module>
19:34:11      from sonic_platform import *
19:34:11  sonic_platform/platform.py:28: in <module>
19:34:11      raise ImportError(str(e) + "- required module not found")
19:34:11  E   ImportError: No module named 'swsscommon'- required module not found- required module not found
19:34:11  [  FAIL LOG END  ] [ target/python-wheels/bullseye/mlnx_platform_api-1.0-py3-none-any.whl ]
The issue only happens when calling below command:

make target/python-wheels/bullseye/mlnx_platform_api-1.0-py3-none-any.whl 

- How I did it
Add PYTHON3_SWSSCOMMON as build time dependency to Mellanox platform API

- How to verify it
Run build
2023-02-20 15:24:26 +02:00