Commit Graph

59 Commits

Author SHA1 Message Date
Kebo Liu
5bb3326d2b
[Mellanox] Update hw-mgmt to 7.0020.4301 (#15260)
- Why I did it
Bug fix:

- * I2C bus is stuck - Unable to probe I2C bus 2-0048, which causes /var/run/hw-management/config/sfp_counter, module_counter to be zero and pmon docker unable to start.

- How I did it
Update HW-MGMT package version in the make file
Update HW-MGMT submodule pointer

-How to verify it
Run full sonic-mgmt regression

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-05-31 10:33:08 +03:00
Vivek
397908aa59
[Mellanox] Facilitate automatic integration of new hw-mgmt (#14594)
- Why I did it
Facilitate Automatic integration of new hw-mgmt version into SONiC.

Inputs to the Script:

MLNX_HW_MANAGEMENT_VERSION Eg: 7.0040.5202
CREATE_BRANCH: (y|n) Creates a branch instead of a commit (optional, default: n)
BRANCH_SONIC: Only relevant when CREATE_BRANCH is y. Default: master.
Note: These should be provided through SONIC_OVERRIDE_BUILD_VARS  parameter

Output:

Script creates a commit (in each of sonic-buildimage, sonic-linux-kernel) with all the changes required for upgrading the hw-management version to a version provided by MLNX_HW_MANAGEMENT_VERSION
Brief Summary of the changes made:

MLNX_HW_MANAGEMENT_VERSION flag in the hw-management.mk file
hw-mgmt submodule is updated to the corresponding version
Updates are made to non-upstream-patches/patches and series.patch file
series, kconfig-inclusion and kconfig-exclusion files can be updated in the sonic-linux-kernel repo
sonic-linux-kernel/patches folder is updated with the corresponding upstream patches
Based on the inputs, there could be a branch seen in the local for each of the repo's. Branch is named as <branch>_<parent_commit>_integrate_<hw_mgmt_version>

- How I did it
Added a new make target which can be invoked by calling make integrate-mlnx-hw-mgmt
user@server:/sonic-buildimage$ git rev-parse --abbrev-ref HEAD
master_23193446a_integrate_7.0020.5052
user@server:/sonic-buildimage$ git log --oneline -n 2
f66e01867 (HEAD -> master_23193446a_integrate_V.7.0020.5052, show) Intgerate HW-MGMT V.7.0020.5052 Changes
23193446a (master_intg_hw_mgmt) Update logic

user@server:/sonic-buildimage/src/sonic-linux-kernel$ git rev-parse --abbrev-ref HEAD
master_6847319_integrate_7.0020.4104
user@server:/sonic-buildimage/src/sonic-linux-kernel$ git log --oneline -n 2
6094f71 (HEAD -> master_6847319_integrate_V.7.0020.5052) Intgerate HW-MGMT V.7.0020.5052 Changes
6847319 (origin/master, origin/HEAD) Read ID register for optoe1 to find pageable bit in optoe driver  (#308)
Changes made will be summarized under sonic-buildimage/integrate-mlnx-hw-mgmt_user.out file. Debugging and troubleshooting output is written to sonic-buildimage/integrate-mlnx-hw-mgmt.log files

User output file & stdout file:

log_files.tar.gz

Limitations:
Assumes the changes would only work for amd64
Assumes the non-upstream patches in mellanox only belong to hw-mgmt

- How to verify it
Build the Kernel

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
2023-04-13 14:18:09 +03:00
dbarashinvd
06d6dafcf3
[Mellanox] fix for watchdog device not found, adding dependency on hw-management (#14182)
- Why I did it
Sometimes Nvidia watchdog device isn't ready when watchdog-control service is up after first installation from ONIE
need to delay watchdog control service to go up after hw-mgmt which gets devices up and ready

- How I did it
Delay Nvidia watchdog-control service before hw-mgmt has started on Mellanox platform in order to avoid missing or not ready watchdog device.

- How to verify it
verification test of ONIE installation of image in a loop
making sure watchdog service is always up (not failed) after first installation from ONIE
2023-03-15 18:36:20 +02:00
Stephen Sun
3112997b5a
[Mellanox] Advance hw-mgmt to v.7.0020.4104 (#13372)
- Why I did it
Advance hw-mgmt service to V.7.0020.4100
Add missing thermal sensors that are supported by hw-mgmt package
Delay system health service before hw-mgmt has started on Mellanox platform in order to avoid reading some sensors before ready.
Depends on sonic-net/sonic-linux-kernel#305

- How I did it
1. Update hw mgmt version
2. Add missing sensors
3. Delay service 

- How to verify it
Regression test.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2023-02-12 11:23:47 +02:00
Kebo Liu
0031cc6c64
[Mellanox] Update hw-mgmt package to V.7.0020.3006 (#11538)
- Why I did it
Update HW-MGMT to V.7.0020.3006
1. Support new system SN2201
2. Add COMEX BRDWL respin support

- How I did it
Update the version number of the makefile
Advance the hw-mgmt submodule pointer

- How to verify it
Run full regression on Nvidia platforms

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2022-08-11 08:59:26 +03:00
Alexander Allen
bec35df04a
[Mellanox] Add arm64 architecture support to mellanox platform (#11342)
- Why I did it
Add support for mellanox platform building for target architecture arm64.

- How I did it
Contains the following changes:
1. Change instances of hard-coded amd64 to $(CONFIGURED_ARCH)
2. Add logic to download correct binary for MFT package
3. Add TARGET_BOOTLOADER=grub definition to rules.mk to override default arm64 bootloader

- How to verify it
Build mellanox platform with TARGET_ARCH set as arm64
2022-07-13 16:21:33 +03:00
Kebo Liu
85539e7e08
[Mellanox] Update hw-mgmt package to version V.7.0020.2004 (#10401)
- Why I did it
Take new hw-mgmt release to SONiC, including:

New features:
1. hw-mgmt: add to PSU FW upgrade tool command to show current FW version
2. hw-mgmt: add to PSU FW upgrade tool support for single-PSU-in-the-system FW upgrade
3. hw-mgmt: add attribute “/firmware” to show FW version of restricted upgradable PSUs only
4. hw-mgmt: Add NVME temperature reports attributes (_alarm/_crit/_min/_max)

Bug fix:
1. psu: redundant i2c_addr attributes being created for psu 3 & 4 in system having only 2 psus.
2. hw-mgmt: in SPC1/2 i2c driver removal is too slow vs. ASIC reset causing non-functional log errors
3. PSU thresholds sysfs changed in 5.10 to “read only” preventing modification (modification required due PSU HW bug)
4. CPLD3 sysfs attribute missing after chip down/up flow
5. sysfs attributes missing when hw-mgmt is restarted (stop/start) within systemd

Release notes can be found from link https://github.com/Mellanox/hw-mgmt/blob/V.7.0020.2004/debian/Release.txt

- How I did it
Update hw-mgmt make file with new version number
Update hw-mgmt submodule pointer

- How to verify it
Run platform regression on all Mellanox platform

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2022-03-31 08:21:09 +03:00
Alexander Allen
7c4fbf0455
[Mellanox] Add patch to hw-mgmt to prevent loading of non-existent kernel modules (#10073)
- Why I did it
The latest upgrade of Mellanox hw-mgmt V7.0020.1300 introduced a couple new kernel modules for new Mellanox platforms that have yet to be upstreamed to the linux kernel.

As these new platforms do not have SONiC support we elected not to upstream these new drivers to sonic-linux-kernel but hw-mgmt expects them to exist which is causing a non-functional error on switch boot.

Feb 15 00:09:55.374130 r-leopard-simx-74 ERR systemd-modules-load[269]: Failed to find module 'emc2305'
Feb 15 00:09:55.374141 r-leopard-simx-74 ERR systemd-modules-load[269]: Failed to find module 'ads1015'
To resolve this we can patch hw-mgmt to no longer attempt to load these modules by default.

- How I did it
Added a SONiC patch to Mellanox hw-mgmt in order to remove the unused kernel modules which were not upstreamed to sonic-linux-kernel

- How to verify it
Boot switch and verify there are no error logs regarding kernel modules failing to load.
2022-02-28 08:08:19 +02:00
Alexander Allen
0ae2906c06
[Mellanox] Update mellanox hw-mgmt submodule and versions to V.7.0020.1300 (#9860)
- Why I did it
New version of mellanox platform management code available adding support for new platforms and fixing bugs.

- How I did it
1. Updated the submodule
2. Updated makefile version references
3. Regenerated SONiC patches
2022-02-06 16:42:53 +02:00
Stepan Blyshchak
f3df6e2f1b
[Mellanox] fix hw-mgmt patches (#9539)
- Why I did it
To fix an issue that hw-mgmt patches were not applied. One patch was already in upstream hw-mgmt package thus applying it again caused an error and no other patches were applied. Also, I did it to improve the Makefile, so that the make will fail in case patches fail to apply.

- How I did it
Removed obsolete patch, made applying patches a hard failure in the build.

- How to verify it
Run the make and verify patches are applied.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2021-12-16 16:02:03 +02:00
Alexander Allen
467ae5beca
[mellanox] update hw-mgmt pointer (#9358)
#### Why I did it

Updated hw-mgmt pointer to updated branch and to include new bugfixes. The hw-mgmt submodule was previously pointing to an orphaned commit which could not be fetched from github, this has now been resolved. 

#### How I did it

Updated submodule pointer.

#### How to verify it

Clone down repository and update all submodules.
2021-12-01 14:02:38 -08:00
Alexander Allen
cc5a2f3d54 Update pointer (#12)
Updated the hw-mgmt pointer to include some bugfixes related to power supply voltages.
2021-11-10 15:27:22 -08:00
Alexander Allen
2847265bfd Mellanox bullseye merge (#1)
Allow mellanox platform to build and successfully switch packets in
Debian 11

Upgraded

* Mellanox SDK
* Mellanox Hardware Management
* Mellanox Firmware
* Mellanox Kernel Patches

Adjusted build system to support host system running bullseye and
dockers running buster.
2021-11-10 15:27:22 -08:00
Nazarii Hnydyn
63ba489c6b
[Mellanox] Advance hw-mgmt to V.7.0010.2346. (#8667)
Commits on Sep 01, 2021
hw-mgmt: attributes: Add PSU power sensor attributes d8fce39

Commits on Sep 02, 2021
Remove MFT package flint tool from hw-management dump generation. 53d06b2
hw-mgmt: debug: Add timeout to generate-dump.sh b661fa3 

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2021-09-08 09:59:50 -07:00
Kebo Liu
59c13cb406
[Mellanox] Upgrade hw-mgmt to 7.0100.2344 (#8463)
To pick up new PSU fan support from new hw-mgmt release

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2021-08-13 03:25:53 -07:00
Kebo Liu
bf21dbce87
[Mellanox] Add support for MSN4600 A1 system (#7732)
Add new sensor conf for MSN4600 A1 system
Add a Mellanox hw-management patch to support MSN4600 A1 system
2021-05-27 09:52:45 -07:00
Kebo Liu
237849a330
update hw-mgmt version to 2304 (#7725)
- Why I did it
Pick up fix from new hw-management package:
Fix gearbox thermal zone name, which was lack suffix thermal zone number

- How I did it
Update the hw-management version number in the make file
Update hw-management submodule pointer

- How to verify it
Run platform related test cases on Mellanox platform
2021-05-27 14:17:23 +03:00
Junchao-Mellanox
ccc7bd1315
[Mellanox] Upgrade hw-mgmt to 7.0100.2303 (#7419)
- Why I did it
Upgrade hw-mgmt to 7.0100.2303

Bug fixes

1. Fan direction feature fix for fixed FAN system (using shell instead of binutils/strings)
2. Remove cpld 4th link on systems with only 3 CPLD's
3. hw-mgmt: thermal: Add hardcoded critical trip point. Follow-up after patch "Removing critical thermal zones to prevent unexpected software system shutdown".
4. Fix sensor attribute mapping to be label based instead of index based to allow common handling of voltage regulator names independently of hardware changes.
5. Update 'lm-sensors' custom configuration file. Relevant only for users utilizing sensors.conf files coming along with hw-management package.
6. For full feature list please follow https://github.com/Mellanox/hw-mgmt/blob/V.7.0010.2300_BR/debian/Release.txt

- How I did it
Update hw-mgmt pointer
Remove unused patches
Fix existing patch to make sure it apply successfully

- How to verify it
Full platform regression on all mellanox platforms
2021-04-28 16:21:55 +03:00
Stephen Sun
ecaf97d8a3
[mellanox]: Integrate hw-mgmt package V.7.0010.2002 (#7148)
Integrate hw-management package V.7.0010.2002

Bug fixes:
Removing critical thermal zones to prevent unexpected software system shutdown:
*Kernel 4.9 -0071-mlxsw-core-Remove-critical-trip-point-from-thermal-z.patch
*Kernel 4.19 -076-mlxsw-core-Remove-critical-trip-point-from-thermal-z.patch
Removing redundant link for cpld3 for fixed systems (SN2100, SN2010).
Fix an issue with missed attribute for cpld3 (port CPLD) for SN2700, SN2410.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2021-03-30 18:30:15 -07:00
Kebo Liu
0e71d82f72
[Mellanox] Update hw-management package to version 7.0010.2000 (#6692)
- Why I did it
   Bug fixes
   - In rare cases when thermal algorithm is reactivated after FAN/PSU insertion, FAN remains at high rpm
   - When stop hw-management code received error in the log instead of exit code '0'.
   - In SPC1 i2c sometimes collide with chip reset coming from SDK
   - Remove raw eeprom data link, when working with PSU which don't have eeprom for "msn274x", "msn24xx" and "msn27xx" systems
   - Fix memory leak on mlxsw_core_bus_device module removal

- How I did it
Update the hw-mgmt version number in the make file
Update the hw-mgmt repo pointer

- How to verify it
run platform related test cases on all Mellanox platform

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2021-03-01 10:01:50 +02:00
shlomibitton
3de6a67353
[Mellanox] Add hw-mgmt patch for SimX platform adaptation (#6782)
- Why I did it
System is stuck on 'starting' state on SimX platform because of infinite loop on 'hw-management-ready.sh' script .
The loop is polling to check if the hw-mgmt sysfs created before proceeding with the flow, for SimX platform the sysfs will never create so the system is not starting properly.

- How I did it
Add a condition to poll on hw-mgmt sysfs only if the switch is real HW and not SimX platform.

- How to verify it
Check "systemctl status hw-management.service" output on a SimX switch with this patch, the state will be "active".

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
2021-02-25 12:41:29 +02:00
Kebo Liu
9ff56445c9
Add hw-mgmt patch to support SDK OFFLINE event for handling flow within service firmware upgrade (#6550)
During ISSU, "mlxsw_minimal" driver still trying to access firmware, in some cases FW could return some wrong critical threshold value which will cause switch shutdown.

**- How I did it**
In order to prevent "mlxsw_minimal" driver from accessing ASIC during ISSU, SDK will raise "OFFLINE" 'udev' event
at the early beginning of such flow. When this event is received, hw-management will remove "mlxsw_minimal" driver.
There is no need to implement the opposite "ONLINE" event since this flow is ended up with "kexec".

**- How to verify it**
repeatedly perform warm reboot, make sure there is no switch shutdown occurred.
2021-01-27 15:39:54 +02:00
lguohan
755c73797c
[mellanox]: fix mellanox hw-management build (#6471)
use dpkg-buildpackage build with fakeroot

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-18 13:10:27 -08:00
Kebo Liu
4cf9316ec3
[Mellanox] Make determine-reboot-cause service start after hw-management service (#6465)
**- Why I did it**

On the Mellanox platform, reboot cause is fetched from some certain sysfs which is created by the hw-management service. So determine-reboot-cause service shall start after hw-management, otherwise it could fail due to the related sysfs is not available yet.

**- How I did it**

Add a patch to the hw-management service to make sure determine-reboot-cause service should start after it.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2021-01-15 11:38:31 -08:00
Kebo Liu
16774580f8
[Mellanox] update hw-mgmt package to V.7.0010.1300 (#5902)
pick up new functions and bug fixes:

- New Features
    -   Add dynamic minimum tables for MSN3700X, MSN3800, MSN3420, MSN4600, MSN4700 systems
    -   Split hw-management to one-shot init hw-management service and thermal control services.
    
- Bug fixes
    HW Mgmt core:
    -   Move PSU EEPROM configuration from kernel to user space for Spectrum 2 / Spectrum 3 system
2020-11-16 01:57:19 -08:00
Qi Luo
f494ff1890
Revert "[Mellanox] update hw-mgmt package to V.7.0010.1300 (#5363)" (#5371)
This reverts commit 940de61ffc.
2020-09-16 10:38:21 -07:00
Kebo Liu
940de61ffc
[Mellanox] update hw-mgmt package to V.7.0010.1300 (#5363) 2020-09-14 21:24:29 +03:00
Kebo Liu
91a1f131a1
[Mellanox] Update hw-mgmt package to V.7.0010.1000 for master (#4687)
* [Mellanox] Update hw-mgmt package to V.7.0010.1000

* update sonic-linux-kernel pointer to pick up new patch
2020-06-16 21:01:41 +03:00
Junchao-Mellanox
1cdcb2c62d
[Mellanox] Add patch to disable hw-management thermal control shell script (#4550)
* [Mellanox] Add patch to disable hw-management thermal control shell script

* Remove SimX patch since https://github.com/Azure/sonic-buildimage/pull/4364/files has already handle it
2020-05-07 12:35:48 -07:00
shlomibitton
30bbbbf24f
hw-mgmt_V.7.0000.3034 integration (#4519)
Signed-off-by: Shlomi Bitton <shlomibi@mellanox.com>
2020-05-02 20:37:14 +03:00
Kebo Liu
89fb1059fa
[Mellanox] Update hw-mgmt package to V.7.0000.3020 (#4362)
* update hw-mgmt package to V.7.0000.3020
* update sonic-linux-kernel repo to pick up new patches
2020-04-15 03:04:11 -07:00
Junchao-Mellanox
be549db395
Add thermal control support for SONiC (#3949) 2020-03-09 10:41:10 -07:00
lguohan
b08bedbfe8
[Mellanox]Integrate hw-mgmt 7.0000.3012 and advance the linux kernel (#4193)
* [Mellanox]Integrate hw-mgmt 7.0000.3012

* [sonic-linux-kernel]Advance the submodule head

Advance the sonic-linux-kernel

[sFlow]: Patch to fix skb_over_panic in psample driver (#120)
Added support in the kernel for fullcone 3-tuple unique nat. (#100)
Adding support to compile ARM architecture (#102)
[ixgbe] Support bcm54616s external phy in ixgbe (#122)
Fix i2c ISMT DMA buffer alignment issue (#123)
[mellanox]: Add SN4700 patches. (#126)
2020-03-04 10:02:55 -08:00
Mykola F
70657cb182
[Mellanox] update hw-mgmt patch for SimX (#4180)
Signed-off-by: Mykola Faryma <mykolaf@mellanox.com>
2020-02-25 10:25:31 +02:00
Stephen Sun
04b9113410 [Mellanox]Update the hw-mgmt patch for simx on V.7.0000.2308 (#3957)
* [Mellanox/hw-mgmt] Update the hw-mgmt patch for simx on V.7.0000.2308

* removing the extra "[PATCH]"
2020-01-07 10:42:00 +02:00
Stephen Sun
ef26ba024d [Mellanox]Update hw-mgmt to V7.0000.2308 (#3858)
* [Mellanox]Update hw-mgmt to V7.0000.2308
sonic-linux-kernel should be updated accordingly with necessary patches uploaded.

* [sub-module]Advance submodule head for sonic-linux-kernel
2019-12-12 11:09:28 -08:00
Nazarii Hnydyn
7c5fb775d9 [mellanox] Upgrade HW-MGMT to V.7.0000.2303 (#3707)
Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
2019-11-06 00:27:18 +02:00
Nazarii Hnydyn
0a39ee4171 [mellanox] Update HW-MGMT: V.7.0000.2300. (#3617)
Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
2019-10-16 13:44:18 -07:00
Andriy Moroz
7ca4d32daf [Mellanox] Update SDK (v3.4.1886) and hw-magmt (v2.0.0191) (#3359)
This commit also includes sonic-linux-kernel submodule update
Commits included:
f8b30b4 [Mellanox] Add hw-management driver patches (#97)
feb786b Add psample and act_sample drivers (#94)
15f8651 Update optoe driver to add CMIS (QSFP-DD, OSFP, ...) support (#96)
2019-08-19 14:09:17 +03:00
Kebo Liu
89d98640f5 [Mellanox]Update hw-mgmt package to v183 (#3138)
* Update hw-mgmt package to v183

* update sonic-linux-kernel repo to pick up new patches
2019-07-12 13:09:36 +03:00
Kebo Liu
60bd7417ea [Mellanox] Update hw-mgmt package to v175 (#2948)
* update hw-mgmt package to v175

* update sonic linux kernel to pick up kernel patch
2019-05-28 08:30:57 -07:00
Kebo Liu
e797bf8516 [mellanox]: Update hw-mgmt pakcage to V.2.0.0.0172 (#2798) 2019-04-18 02:25:10 -07:00
Mykola F
1aa258d3cb [fw-upgrade] fix issue with fw-upgrade (#2785)
Signed-off-by: Mykola Faryma <mykolaf@mellanox.com>
2019-04-16 10:06:10 -07:00
Mykola F
d993d6f3ac [Mellanox] build one image for Mellanox & Mellanox SimX (#2664)
Signed-off-by: Mykola Faryma <mykolaf@mellanox.com>
2019-04-10 21:55:14 -07:00
Nazarii Hnydyn
b22fe37670 [mellanox]: Upgraded hw-management V.2.0.0160. (#2643)
Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
2019-03-06 18:51:46 -08:00
Mykola F
8300408b47 [submodule] update mellanox hw-mgmgt pointer (V.2.0.0061) (#2592)
Signed-off-by: Mykola Faryma <mykolaf@mellanox.com>
2019-02-22 16:54:55 +02:00
Mykola F
da3c0814b6 [submodule] Update the Mellanox hw-mgmt pointer (back to 344e819) (#2588)
Signed-off-by: Mykola Faryma <mykolaf@mellanox.com>
2019-02-19 18:03:26 +02:00
Nazarii Hnydyn
d53df059d4 [devices]: Added new SN3700/SN3700C Mellanox platforms (#2548)
* [mlnx-msn3700]: Added MSN3700 platform.

Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>

* [mlnx-msn3700]: Upgrade FW burn: use ASIC auto detect.

Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>

* [mlnx-msn3700]: Updated HW-MGMT/FW/MFT/SAI/SDK.

Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>

* [mlnx-msn3700]: Added MSN3700C platform.

Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
2019-02-13 23:08:04 -08:00
Kevin(Shengkai) Wang
ea4b4bd650 [mellanox]: Update recipe for hw-mgmt according to latest changes (#2128)
Update the hw-mgmt to latest release V.2.0.0060.
Update the related files according to the latest hw-mgmt.

Signed-off-by: Kevin Wang <kevinw@mellanox.com>
2018-10-08 18:33:44 -07:00
Volodymyr Samotiy
4aa3f7af68 [mellanox]: Fix system EEPROM for MSN2740 platform (#1950)
Signed-off-by: Volodymyr Samotiy <volodymyrs@mellanox.com>
2018-08-20 09:06:20 -07:00