sonic-buildimage/platform/mellanox
Junchao-Mellanox 7543993af3
[202012] [Mellanox] Fix issue: SFP eeprom corrupted after replacing cable with different sfp type (#13543)
- Why I did it
There are 3 tasks in xcvrd:

1. Main task: runs a loop that recovers missing SFP static information to the DB every minute.
2. SFP state task: a process that listens for cable plug-in/plug-out events and inserts SFP static information into the DB when a cable is inserted.
3. SFP DOM update task: a thread that updates cable DOM information every minute.
Assume a user replaces a QSFP module with a QSFP-DD module. There are two issues:

1. Only the SFP state task listens for cable plug-in/plug-out events. The main task and the SFP DOM update task do not know that the SFP type has changed and still treat the module as QSFP. As a result, they parse the QSFP-DD EEPROM using the QSFP standard, which produces corrupted data.
2. There is a race condition between the main task and the SFP state task, because both insert SFP static information into the DB. Depending on timing, the main task may overwrite the SFP static information using the wrong SFP type.
This PR fixes both issues.
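
The first issue can be illustrated with a minimal sketch. Everything here is hypothetical (`FakeModule`, `TaskView`, `parse_with`) and is not xcvrd or platform API code. The only outside fact it relies on is that byte 0 of the EEPROM carries the SFF-8024 identifier (for example 0x0D for QSFP+ and 0x18 for QSFP-DD). Each task keeps its own cached type, and only the task that sees the plug event rebuilds its view.

```python
# Hypothetical sketch: not xcvrd code. Names are invented for illustration.
# Byte 0 of the module EEPROM is the SFF-8024 identifier.
IDENTIFIER_TO_TYPE = {0x0D: "QSFP", 0x11: "QSFP", 0x18: "QSFP-DD"}


class FakeModule:
    """Stands in for the cable currently plugged into the cage."""

    def __init__(self, identifier):
        self.eeprom = bytes([identifier])


class TaskView:
    """One task's private view of a port: the type is read once and cached."""

    def __init__(self, module):
        self.cached_type = IDENTIFIER_TO_TYPE[module.eeprom[0]]

    def parse_with(self, module):
        # A real parser would decode the memory map; here we only report
        # whether the cached parser type matches the module in the cage.
        actual = IDENTIFIER_TO_TYPE[module.eeprom[0]]
        return {
            "parser": self.cached_type,
            "module": actual,
            "corrupted": self.cached_type != actual,
        }


cage = FakeModule(0x0D)        # a QSFP cable is present at startup
state_task = TaskView(cage)    # each task builds its own view
dom_task = TaskView(cage)

cage = FakeModule(0x18)        # the user swaps in a QSFP-DD cable
state_task = TaskView(cage)    # only the state task sees the plug event

stale = dom_task.parse_with(cage)    # still uses the QSFP parser
fresh = state_task.parse_with(cage)  # uses the QSFP-DD parser
```

In this sketch `stale["corrupted"]` is true because the DOM task never learned about the swap, which mirrors the behavior described above.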

The issue does not exist on 202205 and later because xcvrd was refactored:

1. The SFP state task was changed from a process to a thread, so all 3 tasks share the same memory space and always see the correct SFP type.
2. The logic that recovers missing SFP information was moved from the main task to the SFP state task, so the race condition no longer exists.
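
Why moving the state task into a thread helps can be sketched as follows. This is an illustration under assumed names (`port_types`, `state_task`, `dom_task`), not the refactored xcvrd itself: threads in one process read and write the same objects, so an update made by one task is visible to the others.

```python
import threading

# Hypothetical shared state; in one process, every thread sees this dict.
port_types = {"Ethernet0": "QSFP"}
lock = threading.Lock()
seen_by_dom = []


def state_task():
    # Reacts to a (simulated) plug event by recording the new type.
    with lock:
        port_types["Ethernet0"] = "QSFP-DD"


def dom_task():
    # Reads the type that the state task wrote, with no extra signalling.
    with lock:
        seen_by_dom.append(port_types["Ethernet0"])


# Run the tasks one after another so the outcome is deterministic.
writer = threading.Thread(target=state_task)
writer.start()
writer.join()

reader = threading.Thread(target=dom_task)
reader.start()
reader.join()
```

With separate processes, each one would hold its own copy of `port_types`, and the update would have to be communicated explicitly.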

- How I did it
Backporting the latest xcvrd is difficult because xcvrd has gone through many refactors and gained many new features since the 202012 release, so it would take a huge effort. We therefore decided to fix the issue on the Nvidia platform API side: the SFP type is refreshed before any SFP API that accesses the SFP EEPROM. The refresh has a small performance cost. In my test on the 202012 branch, accessing transceiver INFO and DOM INFO for 32 ports took 1.7 seconds before the change and 2.4 seconds after it (about 0.7 seconds more in total, or roughly 22 ms per port). I consider this cost acceptable.
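
The "refresh before access" idea can be sketched as below. This is not the code from the PR: `SketchSfp`, `refresh_type_first`, and `refresh_sfp_type` are hypothetical names, the two API method names are borrowed from the SONiC platform API only as labels, and their bodies are placeholders. The sketch assumes the type can be re-derived from the SFF-8024 identifier in byte 0 of the EEPROM.

```python
import functools

# SFF-8024 identifier values for a few module types.
IDENTIFIER_TO_TYPE = {0x03: "SFP", 0x0D: "QSFP", 0x11: "QSFP", 0x18: "QSFP-DD"}


def refresh_type_first(method):
    """Hypothetical decorator: re-detect the SFP type before the call runs."""

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        self.refresh_sfp_type()
        return method(self, *args, **kwargs)

    return wrapper


class SketchSfp:
    def __init__(self, read_eeprom):
        # read_eeprom is an assumed callable: (offset, length) -> bytes
        self._read_eeprom = read_eeprom
        self.sfp_type = None
        self.refresh_count = 0

    def refresh_sfp_type(self):
        # One extra one-byte read per API call; this is the source of the
        # added latency discussed above.
        identifier = self._read_eeprom(0, 1)[0]
        self.sfp_type = IDENTIFIER_TO_TYPE.get(identifier, "UNKNOWN")
        self.refresh_count += 1

    @refresh_type_first
    def get_transceiver_info(self):
        # Placeholder body: a real implementation would parse the EEPROM
        # according to self.sfp_type.
        return {"type": self.sfp_type}

    @refresh_type_first
    def get_transceiver_bulk_status(self):
        # Placeholder body for the DOM path.
        return {"type": self.sfp_type}


# Usage: simulate a cable swap between two API calls.
eeprom = bytearray([0x0D])  # QSFP plugged in
sfp = SketchSfp(lambda offset, length: bytes(eeprom[offset:offset + length]))
first = sfp.get_transceiver_info()
eeprom[0] = 0x18            # replaced with QSFP-DD
second = sfp.get_transceiver_info()
```

Because the type is re-read on every call, the caller no longer needs to learn about plug events, at the price of one extra EEPROM access per call.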

- How to verify it
Manual test
Regression
2023-02-19 09:47:32 +02:00
bios [202012][mellanox]: Add BIOS upgrade infra (#13571) 2023-02-02 10:07:03 +02:00
docker-saiserver-mlnx [supervisord]: use abspath as supervisord entrypoint (#5995) 2020-11-22 21:18:44 -08:00
docker-syncd-mlnx [202012][Monit] Deprecate the feature of monitoring the critical processes by Monit (#7823) 2021-06-09 09:04:22 -07:00
docker-syncd-mlnx-rpc [syncd-rpc] Install Libboost Atomic 1.71, Libqtcore And Libqtnetwork (#6689) 2021-02-16 15:29:21 -08:00
hw-management [202012] [Mellanox] Update hw-mgmt package to V.7.0010.2349 (#11421) 2022-07-20 09:00:17 +03:00
issu-version [mellanox|ffb] ISSU version check (#2437) 2019-01-17 14:41:32 -08:00
mft [Mellanox] Add MFT DKMS build support. (#5088) 2020-08-03 13:52:40 +03:00
mlnx-platform-api [202012] [Mellanox] Fix issue: SFP eeprom corrupted after replacing cable with different sfp type (#13543) 2023-02-19 09:47:32 +02:00
mlnx-sai [202012][Mellanox] Update SAI version to 1.22.0.0 and SDK/FW to version 4.5.2318/2010_2318 (#11534) 2022-07-26 21:01:36 +03:00
sdk-src [202012][Mellanox] Update SDK/FW to version 4.5.3196/2010_3196 (#12989) 2022-12-08 12:16:54 +02:00
.gitignore [202012][mellanox]: Add BIOS upgrade infra (#13571) 2023-02-02 10:07:03 +02:00
asic_table.j2 Bug fix: Support dynamic buffer calculation on ACS-MSN3420 and ACS-MSN4410 (#7113) 2021-04-08 18:36:27 +00:00
bios.mk [202012][mellanox]: Add BIOS upgrade infra (#13571) 2023-02-02 10:07:03 +02:00
docker-saiserver-mlnx.dep [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
docker-saiserver-mlnx.mk [build]: add docker-saiserver-* as stretch docker targets 2020-05-06 10:23:38 +00:00
docker-syncd-mlnx-rpc.dep [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
docker-syncd-mlnx-rpc.mk [dockers] update mellanox syncd and pmon to buster (#4818) 2020-07-18 03:46:15 -07:00
docker-syncd-mlnx.dep [build]: Fix syncd dpkg cache dependency issue (#6680) 2021-02-05 15:47:28 -08:00
docker-syncd-mlnx.mk [Mellanox] Install MFT packages on Syncd container (#7844) 2021-06-17 07:09:50 +00:00
fw.dep [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
fw.mk [202012][Mellanox] Update SDK/FW to version 4.5.3196/2010_3196 (#12989) 2022-12-08 12:16:54 +02:00
hw-management.dep [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
hw-management.mk [202012] [Mellanox] Update hw-mgmt package to V.7.0010.2349 (#11421) 2022-07-20 09:00:17 +03:00
issu-version.dep [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
issu-version.mk [mellanox]: Add SSD FW update tool (#4351) 2020-04-13 18:13:19 +03:00
libsaithrift-dev.dep [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
libsaithrift-dev.mk [sai and sairedis] advance sairedis sub-module and upgrade to matching Broadcom SAI build (#2488) 2019-02-16 10:14:18 -08:00
mft.dep [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
mft.mk [202012][Mellanox] Change MFT version to 4.18.0-106 (#10305) 2022-03-21 19:37:34 +02:00
mlnx-ffb.dep [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
mlnx-ffb.mk [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
mlnx-ffb.sh Update all references to new 'sonic-installer' name (#5119) 2020-08-07 08:49:39 -07:00
mlnx-fw-upgrade.j2 [Mellanox] Add hw-mgmt patch for SimX platform adaptation (#6782) 2021-03-04 21:23:05 +00:00
mlnx-onie-fw-update.dep [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
mlnx-onie-fw-update.mk [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
mlnx-onie-fw-update.sh [202012][Mellanox]: Update ONiE FW tool: manual reboot control. (#13359) 2023-01-16 15:27:48 +02:00
mlnx-platform-api.dep [Mellanox] Add python3 support for Mellanox platform API (#6175) 2020-12-11 10:51:31 -08:00
mlnx-platform-api.mk [Mellanox] Add python3 support for Mellanox platform API (#6175) 2020-12-11 10:51:31 -08:00
mlnx-sai.dep [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
mlnx-sai.mk [202012][Mellanox] Update SAI version to 1.22.0.0 and SDK/FW to version 4.5.2318/2010_2318 (#11534) 2022-07-26 21:01:36 +03:00
mlnx-ssd-fw-update.dep [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
mlnx-ssd-fw-update.mk [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
mlnx-ssd-fw-update.sh [Mellanox] Add ONIE and SSD platform components. (#4758) 2020-06-15 14:25:49 +03:00
one-image.dep [mellanox]: Add DPKG local caching support. (#4441) 2020-04-20 19:02:46 -07:00
one-image.mk [mellanox]: Add SSD FW update tool (#4351) 2020-04-13 18:13:19 +03:00
peripheral_table.j2 [Dynamic buffer calc] Support dynamic buffer calculation (#6194) 2020-12-13 11:35:39 -08:00
platform.conf one image implementation (#215) 2017-01-29 11:33:33 -08:00
rules.dep [docker-ptf]: build docker ptf 2021-01-28 09:23:12 -08:00
rules.mk [202012][mellanox]: Add BIOS upgrade infra (#13571) 2023-02-02 10:07:03 +02:00
sdk.dep [nvidia/mellanox] add MLNX_SDK_DEB_VERSION to SDK packages flags list. (#7747) 2021-06-09 08:28:13 +00:00
sdk.mk [202012][Mellanox] Update SDK/FW to version 4.5.3196/2010_3196 (#12989) 2022-12-08 12:16:54 +02:00
zero_profiles.j2 [Reclaim buffer][202012] Reclaim unused buffers by applying zero buffer profiles (#9063) 2021-12-09 17:34:56 +02:00