- Why I did it
Advance hw-mgmt service to V.7.0020.4100
Add missing thermal sensors that are supported by hw-mgmt package
Delay system health service before hw-mgmt has started on Mellanox platform in order to avoid reading some sensors before ready.
Depends on sonic-net/sonic-linux-kernel#305
- How I did it
1. Update hw mgmt version
2. Add missing sensors
3. Delay service
- How to verify it
Regression test.
Signed-off-by: Stephen Sun <stephens@nvidia.com>
- Why I did it
Take new hw-mgmt release to SONiC, including:
New features:
1. hw-mgmt: add to PSU FW upgrade tool command to show current FW version
2. hw-mgmt: add to PSU FW upgrade tool support for single-PSU-in-the-system FW upgrade
3. hw-mgmt: add attribute “/firmware” to show FW version of restricted upgradable PSUs only
4. hw-mgmt: Add NVME temperature reports attributes (_alarm/_crit/_min/_max)
Bug fix:
1. psu: redundant i2c_addr attributes being created for psu 3 & 4 in system having only 2 psus.
2. hw-mgmt: in SPC1/2 i2c driver removal is too slow vs. ASIC reset causing non-functional log errors
3. PSU thresholds sysfs changed in 5.10 to “read only” preventing modification (modification required due PSU HW bug)
4. CPLD3 sysfs attribute missing after chip down/up flow
5. sysfs attributes missing when hw-mgmt is restarted (stop/start) within systemd
Release notes can be found from link https://github.com/Mellanox/hw-mgmt/blob/V.7.0020.2004/debian/Release.txt
- How I did it
Update hw-mgmt make file with new version number
Update hw-mgmt submodule pointer
- How to verify it
Run platform regression on all Mellanox platform
Signed-off-by: Kebo Liu <kebol@nvidia.com>
- Why I did it
The latest upgrade of Mellanox hw-mgmt V7.0020.1300 introduced a couple new kernel modules for new Mellanox platforms that have yet to be upstreamed to the linux kernel.
As these new platforms do not have SONiC support we elected not to upstream these new drivers to sonic-linux-kernel but hw-mgmt expects them to exist which is causing a non-functional error on switch boot.
Feb 15 00:09:55.374130 r-leopard-simx-74 ERR systemd-modules-load[269]: Failed to find module 'emc2305'
Feb 15 00:09:55.374141 r-leopard-simx-74 ERR systemd-modules-load[269]: Failed to find module 'ads1015'
To resolve this we can patch hw-mgmt to no longer attempt to load these modules by default.
- How I did it
Added a SONiC patch to Mellanox hw-mgmt in order to remove the unused kernel modules which were not upstreamed to sonic-linux-kernel
- How to verify it
Boot switch and verify there are no error logs regarding kernel modules failing to load.