sonic-buildimage

Author	SHA1	Message	Date
Stepan Blyshchak	bc58e2d841	[202012][mlnx-ffb.sh] Update issu-version location (#14927 ) BACKPORT OF https://github.com/sonic-net/sonic-buildimage/pull/14925 #### Why I did it ISSU version check fails due to inability to mount squashfs from 202211 on 201911 #### How I did it Put ISSU version file under platform directory #### How to verify it 202012 (with [202012][mlnx-ffb.sh] Update issu-version location #14927) to master	2023-07-01 23:43:51 -07:00
Yakiv Huryk	ab5115846d	[202012][Mellanox] update sdk/fw build procedure (#14025 ) (#14220 ) - Why I did it To optimize Mellanox platform build - How I did it sdk debs are now downloaded as Spectrum-SDK-Drivers-SONiC-Bins release sx kernel is downloaded as zip from Spectrum-SDK-Drivers	2023-03-16 12:42:19 +02:00
Sudharsan Dhamal Gopalarathnam	79548e472d	[Mellanox]Fix lpmode set when logical port is larger than 64 (#14138 ) (#14202 ) Manual cherry-pick of https://github.com/sonic-net/sonic-buildimage/pull/14138 - Why I did it In sfplpm API, the number of logical ports is hardcoded as 64. When a system contains more port than this, the SDK APIs would fail with a syslog as below Mar 7 03:53:58.105980 r-leopard-58 ERR syncd#SDK: [MGMT_LIB.ERR] Slot [0] Module [0] has logport [0x00010069] in enabled state Mar 7 03:53:58.105980 r-leopard-58 ERR syncd#SDK: [SDK_MGMT_LIB.ERR] Failed in __sdk_mgmt_phy_module_pwr_attr_set, error: Internal Error Mar 7 03:53:58.106118 r-leopard-58 ERR pmon#-c: Error occurred when setting power mode for SFP module 0, slot 0, error code 1 - How I did it Remove the hardcoded value of 64. Obtained the number of logical ports from SDK - How to verify it Manual testing	2023-03-14 10:19:02 -07:00
Sudharsan Dhamal Gopalarathnam	ca17198f04	[202012][Mellanox] Change MFT version to 4.21.0-100 (#13956 ) - Why I did it Update MFT version to 4.21.0-100 to include a fix for an issue reported using mlxlink on qsfp-dd - How I did it Update mft.mk - How to verify it Run regression on Mellanox platforms	2023-02-26 09:42:52 +02:00
Junchao-Mellanox	0f47c5be59	[202012] [Mellanox] Fix issue: cannot label port if logical port number is larger than 64 (#13709 ) - Why I did it sfp_event.py gets a PMPE message when a cable event is available. The PMPE message carries no label port. The current sfp_event.py uses sx_api_port_device_get to get the attributes of 64 logical ports and finds the label port among those 64 attributes. However, if there are more than 64 ports, sfp_event.py might not find the label port and would drop the PMPE message. - How I did it Don't use the hardcoded 64; get the number of logical ports instead. - How to verify it Manual test	2023-02-23 08:27:21 +02:00
Stepan Blyshchak	73c7ced753	[202012][Mellanox] Place FW binaries under platform directory instead of squashfs (#13890 ) Upgrade from old image always requires squashfs mount to get the next image FW binary. This can be avoided if we put FW binary under platform directory which is easily accessible after installation: admin@r-spider-05:~$ ls /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa admin@r-spider-05:~$ ls -al /tmp/image-fw-new-loc.0-dirty-20230208.193534-fs/etc/mlnx/fw-SPC.mfa lrwxrwxrwx 1 root root 66 Feb 8 17:57 /tmp/image-fw-new-loc.0-dirty-20230208.193534-fs/etc/mlnx/fw-SPC.mfa -> /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa - Why I did it 202211 and above uses different squashfs compression type that 201911 kernel can not handle. Therefore, we avoid mounting squashfs altogether with this change. - How I did it Place FW binary under /host/image-/platform/mlnx/, soft links in /etc/mlnx are created to avoid breaking existing scripts/automation. /etc/mlnx/fw-SPCX.mfa is a soft link always pointing to the FW that should be used in current image mlnx-fw-upgrade.sh is updated to prefer /host/image-/platform/mlnx location and fallback to /etc/mlnx in squashfs in case new location does not exist. This is necessary to do image downgrade. - How to verify it Upgrade from 201911 to 202012 202012 to 201911 downgrade 202012 -> 202012 reboot ONIE -> 202012 boot (First FW burn) Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2023-02-22 17:38:54 +02:00
Junchao-Mellanox	7543993af3	[202012] [Mellanox] Fix issue: SFP eeprom corrupted after replacing cable with different sfp type (#13543 ) - Why I did it There are 3 tasks in xcvrd: 1. main task, which runs a loop to recover missing SFP static information to DB every 1 minute 2. SFP state task, a process which listens for cable plug in/out events and inserts SFP static information to DB when a cable is inserted 3. SFP DOM update task, a thread which handles cable DOM information update every 1 minute. Let's assume the user replaces a QSFP with a QSFP-DD. There are two issues: 1. Only the SFP state task listens for cable plug in/out events; the main task and the SFP DOM update task do not know the SFP type has changed and still “think” the SFP type is QSFP. So the main task and the SFP DOM update task use the QSFP standard to parse the QSFP-DD EEPROM, which causes corrupted data. 2. There is a race condition between the main task and the SFP state task. They both insert SFP static information to DB. Depending on timing, the main task may use the wrong SFP type and override the SFP static information. This PR fixes these two issues. There is no such issue on 202205 and above because xcvrd was refactored: 1. The SFP state task was changed from a process to a thread, so all 3 tasks share the same memory space and always have the correct SFP type. 2. The logic that recovers missing SFP information was moved from the main task to the SFP state task, so there is no race condition anymore. - How I did it It is difficult to backport the latest xcvrd because there are many refactors and new features in xcvrd after the 202012 release; doing so would be a huge effort. Based on that, we decided to fix the issue on the Nvidia platform API side. The fix is to refresh the SFP type before any SFP API that accesses the SFP EEPROM. Refreshing the SFP type before any SFP API causes a small performance degradation: according to my test on the 202012 branch, accessing transceiver INFO and DOM INFO for 32 ports takes 1.7 seconds before the change and 2.4 seconds after the change.
I suppose the performance degradation is acceptable. - How to verify it Manual test Regression	2023-02-19 09:47:32 +02:00
Nazarii Hnydyn	83b6518ae2	[202012][mellanox]: Add BIOS upgrade infra (#13571 ) - Why I did it Added BIOS upgrade infra - How I did it Added new make target - How to verify it Copy msn3800_bios.tar.gz to platform/mellanox/bios make configure PLATFORM=mellanox make target/files/buster/msn3800_bios.tar.gz Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>	2023-02-02 10:07:03 +02:00
Junchao-Mellanox	46a774e294	[202012] [Mellanox] Fix select timeout in sfp event (#13347 ) - Why I did it Backport #9795 Python select.select accepts an optional timeout value in seconds; however, the value passed to it is in milliseconds. - How I did it Convert the value from milliseconds to seconds. - How to verify it Manual test	2023-01-19 17:29:41 +02:00
Kebo Liu	a569bfc9eb	skip hw reboot cause if warm/fast reboot found from the proc cmdline (#13378 ) #### Why I did it Backport https://github.com/sonic-net/sonic-buildimage/pull/13246 to 202012 branch. In case of warm/fast reboot, the hardware reboot cause will NOT be cleared because CPLD will not be touched in this flow. To not confuse the reboot cause determine logic, the leftover hardware reboot cause shall be skipped by the platform API, platform API will return the 'REBOOT_CAUSE_NON_HARDWARE' instead of the "hardware" reboot cause. #### How I did it Check the proc cmdline to see whether the last reboot is a warm or fast reboot, if yes skip checking the leftover hardware reboot cause. #### How to verify it a. Manual test: > 1. Perform a power loss > 2. Perform a warm/fast reboot > 3. check the reboot cause should be "warm-reboot" or "fast-reboot" instead of "power loss" b. Run reboot cause related regression test.	2023-01-17 13:21:31 -08:00
Nazarii Hnydyn	5193a96895	[202012][Mellanox]: Update ONiE FW tool: manual reboot control. (#13359 ) Partial cherry-pick of: [Mellanox] Modified Platform API to support all firmware updates in single boot #9608 - Why I did it To allow user manual reboot control over ONiE FW upgrade - How I did it Added a dedicated script argument handling - How to verify it mlnx-onie-fw-update.sh update --no-reboot Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>	2023-01-16 15:27:48 +02:00
Dror Prital	e8c7a7c61e	[202012][Mellanox] Update SDK/FW to version 4.5.3196/2010_3196 (#12989 ) - Why I did it Update SDK/FW version - 4.5.3196/2010_3196 in order to have the following fixes: 1. On SPC2/3, in some cases many ACL region resizes corrupt the internal DB, which in turn fails future ACL configuration 2. Lag Port as Analyzer Port \| when removing port from distributor list SDK does not reselect another port for mirroring 3. Due to critical race at initial configuration, SDK RDQ test may test RDQ configured for WJH and fail the test Add support for new HW SKU of SN4700 - How I did it Update pointer for the SDK/FW - How to verify it Run regression tests	2022-12-08 12:16:54 +02:00
Kebo Liu	db03698ba5	fix DOM support capability issues on QSFP and CMIS cables (#12500 ) Signed-off-by: Kebo Liu <kebol@nvidia.com>	2022-10-30 23:20:57 -07:00
Kebo Liu	78043b828c	[202012] [Mellanox] Read transceiver EEPROM via sdk sysfs (#12399 ) - Why I did it ethtool is not able to read certain pages(eg. page 11h) of CMIS cables. SDK provides a set of sysfs to expose the transceiver EEPROM, now we migrate from using ethtool to read these sysfs for transceiver EEPROM reading. - How I did it replace ethtool with accessing the SDK sysfs for cable EEPROM reading. Adjust the offset according to the SDK sysfs memory map. - How to verify it run sonic-mgmt sfp-related regression test case. Signed-off-by: Kebo Liu <kebol@nvidia.com>	2022-10-30 09:34:39 +02:00
Dror Prital	5de7ae449a	Update SDK/FW to version 4.5.3186/2010_3186 (#12531 ) - Why I did it Update SDK/FW version - 4.5.3186/2010_3186 in order to have the following changes: New functionality: 1. Added support for 6.5W (Class 8) in ports 49-50, 53-54, 57-58, and 61-62 on SN4600 system Fix the following issues: 1. On very rare occasion (~1/100K), during I2C transaction with MMS1V50-WM and MMS1V90-WR modules on SN4700 system, the module may send unexpected stop which violate the I2C specification, possibly affecting the link up flow 2. When running 1GbE speeds on SN4600 system, the port remained active while peer side was closed 3. While toggling the cable with ‘sfputil lpmode on/off’, error msg like “ERR pmon#xcvrd: Receive PMPE error event on module 1: status {X} error type {y}” could be received 4. When toggling many ports of the Spectrum devices while raising 10GbE link up and link maintenance is enabled, the switch may get stuck and may need to be rebooted 5. When trying to reconfigure the Flex Parser header and Flex transition parameters after ISSU, the switch will returned an error even if the configuration was identical to that done before performing the ISSU 6. While moving from lossless to lossy mode while shared headroom was used, reduction of the shared headroom can only be done prior to pool type change and when shared headroom is not utilized 7. SLL configuration is missing in SDK dump 8. If TTL_CMD_COPY is used in Encap direction for a packet with no TTL, then the value passed in the ttl data structure will be used if non-zero (default 255 if zero) 9. PCI calibration changes from a static to a dynamic mechanism 10. Layer 4 port information is not initialized for BFD packet event. To address the issue, remote peer UDP port information was added in BFD packet event 11. 
SDK returned error when FEC mode is set on twisted pair, when FEC was set to None - How I did it Update pointer for the SDK/FW - How to verify it Run regression tests Signed-off-by: dprital <drorp@nvidia.com>	2022-10-30 09:29:45 +02:00
Dror Prital	edc4485d30	[202012][Mellanox] Update SDK/FW to version 4.5.2320/2010_2320 (#11975 ) Update SDK/FW version - 4.5.2320/2010_2320 in order to have the following fixes: • Spectrum-3 \| PCI calibration changes from a static to a dynamic mechanism. • [VxLAN] TTL was set to 0 for non IP traffic (such as ARP)	2022-09-07 08:33:18 +03:00
Dror Prital	db37325f76	[202012][Mellanox] Update SAI version to 1.22.0.0 and SDK/FW to version 4.5.2318/2010_2318 (#11534 ) - Why I did it Update SAI version - 1.22.0.0 Update SDK/FW version - 4.5.2318/2010_2318 SAI Changes: 1. Port FEC fix for multiple speeds 2. Next hop group optimized bulk API 3. Support BFD remote-disc exchange in negotiation stage 4. Reduce verbosity of shared database already exists print SDK/FW Fixes: 1. Cr space timeout on Hold and Release GW - at warmboot 2. SPC-1 Port in stuck PHY_UP after peer side rebooted 3. memory leak in sx_api_router_ecmp_update_set - How I did it Update pointer for the new SAI and SDK/FW - How to verify it Run regression tests	2022-07-26 21:01:36 +03:00
Kebo Liu	c60bf90590	[202012] [Mellanox] Update hw-mgmt package to V.7.0010.2349 (#11421 ) - Why I did it New changes in this new HW-MGMT package: 1. hw-mgmt: chassis events: Fix voltmon address conflict on connecting 2. hw-mgmt: topology: Add COMEX BRDWL respin support a. Removed A2D sensor from all COMEX BRDWL boards b. Add COMEX BRDWL boards with register defined (config3) - How I did it Advance the hw-mgmt repo pointer and update the hw-mgmt version number - How to verify it Run platform-related regression test cases on the new testbed. Signed-off-by: Kebo Liu <kebol@nvidia.com>	2022-07-20 09:00:17 +03:00
Alexander Allen	851bd9bff8	[Mellanox] Add arch folder to SDK binary location (#11278 ) - Why I did it This is for the eventual support of multiple architectures for the mellanox platform. - How I did it Change the location of the binaries in Switch-SDK-drivers so that the path specifies the target architecture in addition to the target distribution that the debians are built for. This is the most straightforward way to separate binaries built against different architectures and selectively target them for installation in the mellanox SONiC image. - How to verify it Build SONiC for mellanox and verify it compiles successfully.	2022-07-05 20:58:01 +00:00
Nazarii Hnydyn	05ff95fdfc	[Mellanox]: Advance SAI submodule. (#11164 ) Fix #3074227 - don't disable used tunnel underlay interfaces fix bfd - notify Sonic for admin-down event	2022-06-16 18:09:59 -07:00
Volodymyr Samotiy	6b029a613b	[202012] [Mellanox] Update SAI to 1.21.1.2 and SDK/FW to 4.5.2262/xx.2010.2262 (#10880 ) - Why I did it To include latest fixes: 1. Warmboot \| When trying to reconfigure the Flex Parser header and Flex transition parameters after ISSU, the switch will returned an error even if the configuration was identical to that done before performing the ISSU. 2. Link Up \| When toggling many ports of the Spectrum devices while raising 10GbE link up and link maintenance is enabled, the switch may get stuck and may need to be rebooted. 3. Shared buffer \| While moving from lossless to lossy while shared headroom was used, reduction of the shared headroom can only be done prior to pool type change and when shared headroom is not utilized. - How I did it Updated SAI & SDK submodules along with the relevant Makefiles - How to verify it Build an image and run tests from "sonic-mgmt". Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>	2022-05-22 09:48:37 +03:00
Sudharsan Dhamal Gopalarathnam	2a232730b0	[202012][Mellanox] Update SDK/FW to 4.5.1500/2010.1500 and SAI version to 1.21.1.2 (#10464 ) * [Mellanox] Update SDK/FW to 4.5.1500/2010.1500 and SAI version to 1.21.0.1 Signed-off-by: Sudharsan Dhamal Gopalarathnam <sudharsand@nvidia.com> * Updating Switch-SDK-drivers submodule pointer * Updating SAI version	2022-05-04 06:07:10 +03:00
Kebo Liu	68b38b325b	[202012][Mellanox] Change MFT version to 4.18.0-106 (#10305 ) - Why I did it With the previous MFT 4.18.1-16 there is a bug in mstdump tool accessing wrong address. it is confirmed this issue does not exist in official 4.18.0-106. - How I did it Update the MFT version to 4.18.0-106 - How to verify it Run regression on Mellanox platforms	2022-03-21 19:37:34 +02:00
Junchao-Mellanox	0c859fb036	[Mellanox] [202012] Fix issue: 4600C is using wrong thermal profile (#10258 ) - Why I did it 4600C is using wrong thermal profile and it displays 2 CPU core thermal in show platform temperature output, there should be 4 CPU core thermal. - How I did it Change 4600C to use thermal profile 10. - How to verify it Manual test	2022-03-20 10:31:59 +02:00
Dror Prital	6293a091a8	[Mellanox] Upgrade ASIC FW tool to 4.18.1-16 (#9981 ) - Why I did it Update MFT to version 4.18.1-16 for bugs fixes and new SN2201 support - How I did it Advance to MFT tool version to 4.18.1-16 - How to verify it Manually tested on all Mellanox platforms (ASIC FW Upgrade, link debug tools, CPLD upgrade, etc.)	2022-02-15 23:56:58 +00:00
Volodymyr Samotiy	e6b22b1942	[Mellanox][202012] Update SAI to 1.20.2.6 and SDK/FW to 4.5.1208/2010.1218 (#9818 ) - Why I did it To include latest fixes. 1. On CMIS modules, after low power configuration, the firmware waited for the module state to be ModuleReady instead of ModuleLowPower causing delays. 2. When connecting SN4600C, 100GbE port with CWDM4 module (Gen 3.0), link up time is 30 seconds. 3. Add T1 ECMP Overlay support - How I did it Updated SDK/SAI submodule and relevant makefiles with the required versions. - How to verify it Build an image and run tests from "sonic-mgmt". Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>	2022-01-26 10:58:19 +02:00
Junchao-Mellanox	8e924b9a70	[Mellanox] Optimize thermal policies (#9665 ) - Why I did it Optimize thermal control policies to simplify the logic and add more protection code in policies to make sure it works even if kernel algorithm does not work. - How I did it Reduce unused thermal policies Add timely ASIC temperature check in thermal policy to make sure ASIC temperature and fan speed is coordinated Minimum allowed fan speed now is calculated by max of the expected fan speed among all policies Move some logic from fan.py to thermal.py to make it more readable - How to verify it 1. Manual test 2. Regression	2022-01-19 11:42:55 +02:00
Stepan Blyshchak	31065ccb93	[Mellanox] [202012] fail the build when hw-mgmt patches do not apply (#9566 ) Taken from https://github.com/Azure/sonic-buildimage/pull/9539 #### Why I did it To fix an issue that hw-mgmt patches were not applied. One patch was already in upstream hw-mgmt package thus applying it again caused an error and no other patches were applied. Also, I did it to improve the Makefile, so that the make will fail in case patches fail to apply. #### How I did it Removed obsolete patch, made applying patches a hard failure in the build. #### How to verify it Run the make and verify patches are applied.	2022-01-13 15:08:27 -08:00
DavidZagury	57abd5914e	[Mellanox] Upgrade Mellanox firmware tools to 4.17.2-12 (#8978 ) - Why I did it Bug fix: bad_param request due to missing parser rest command while running mlxlink - How I did it Advance to MFT tool version to 4.17.2-12. - How to verify it Manually tested on all mellanox platforms.	2022-01-12 22:36:11 +00:00
Kebo Liu	16a3929159	[202012][Mellanox] Update hw-mgmt package to V.7.0010.2347 (#9594 ) - Why I did it Update hw-mgmt to a new version to pick up support for the SN4600C A1 system. - How I did it Update the pointer of the hw-mgmt submodule Update the hw-mgmt version number Remove the stale code patch to hw-mgmt userspace code. - How to verify it Run platform regression on Mellanox platforms. Signed-off-by: Kebo Liu <kebol@nvidia.com>	2021-12-28 09:40:58 +02:00
Stepan Blyshchak	bdf31a6556	[Mellanox][SDK] Build SDK with PRM sniffer support (#9500 ) - Why I did it To have an ability to use PRM sniffer. - How I did it Enabled the option in configure flags. - How to verify it Built and ran on switch. Enabled the feature in runtime and checked the sniffer recording. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2021-12-20 19:25:52 +00:00
Junchao-Mellanox	0197855d5d	[Mellanox] [202012] Allow user to set LED to orange (#9514 ) Backport https://github.com/Azure/sonic-buildimage/pull/9259 to 202012 #### Why I did it Nvidia platform API does not support set LED to orange. #### How I did it Allow user to set LED to orange #### How to verify it Manual test	2021-12-13 16:04:06 -08:00
Stephen Sun	acac848858	[Reclaim buffer][202012] Reclaim unused buffers by applying zero buffer profiles (#9063 ) - Why I did it Support zero buffer profiles 1. Add buffer profiles and pool definition for zero buffer profiles 2. Support applying zero profiles on INACTIVE PORTS 3. Enable dynamic buffer manager to load zero pools and profiles from a JSON file - How I did it Add buffer profiles and pool definition for zero buffer profiles If the buffer model is static: * Apply normal buffer profiles to admin-up ports * Apply zero buffer profiles to admin-down ports If the buffer model is dynamic: * Apply normal buffer profiles to all ports * buffer manager will take care when a port is shut down Update buffers_config.j2 to support INACTIVE PORTS by extending the existing macros to generate the various buffer objects, including PGs, queues, ingress/egress profile lists Originally, all the macros to generate the above buffer objects took active ports only as an argument. Now that buffer items need to be generated on inactive ports as well, an extra argument representing the inactive ports needs to be added. To be backward compatible, a new series of macros is introduced to take both active and inactive ports as arguments The original version (with active ports only) will be checked first. If it is not defined, then the extended version will be called. Only vendors who support zero profiles need to change their buffer templates Enable buffer manager to load zero pools and profiles from a JSON file: The JSON file is provided on a per-platform basis It is copied from platform/<vendor> folder to /usr/share/sonic/templates folder at build time and rendered when the swss container is being created.
To make code clean and reduce redundant code, extract common macros from buffer_defaults_t{0,1}.j2 of all SKUs to two common files: One in Mellanox-SN2700-D48C8 for single ingress pool mode The other in ACS-MSN2700 for double ingress pool mode The corresponding files of all other SKUs will be symbolic links to the above files Update sonic-cfggen test accordingly: * Adjust example output file of JSON template for unit test * Add unit test for Mellanox's new buffer templates. - How to verify it Regression test. Unit test in sonic-cfggen Run regression test and manually test. Signed-off-by: stephens <stephens@nvidia.com>	2021-12-09 17:34:56 +02:00
Volodymyr Samotiy	0831635b1c	[Mellanox] Update SDK to v4.4.3360 and FW to v2008.3358 (#9403 ) - Why I did it To include latest fixes. 1. On CMIS modules, after low power configuration, the firmware waited for the module state to be ModuleReady instead of ModuleLowPower causing delays. 2. When connecting Spectrum devices with optical transceivers that support RXLOS, remote side port down might cause the switch firmware to get stuck and cause unexpected switch behavior. 3. On rare occasions, when working with port rates of 1GbE or 10GbE and congestion occurs, packets may get stuck in the chip and may cause switch to hang. 4. When ECMP has high amount of next-hops based on VLAN interfaces, in some rare cases, packets will get a wrong VLAN tag and will be dropped. 5. Using SN4600C with copper or optics loopback cables in NRZ speeds, link may raise in long link up times (up to 70 seconds). 6. When connecting SN4600C to SN4600C after Fastboot in 50GbE No_FEC mode with a copper cable, the link up time may take ~20 seconds. - How I did it Updated SDK submodule and relevant makefiles with the required versions. - How to verify it Build an image and run tests from "sonic-mgmt". Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>	2021-12-06 11:01:43 +02:00
Junchao-Mellanox	227f2f8aec	[Mellanox] Fan speed should not be 100% when PSU is powered off (#9258 ) - Why I did it When PSU is powered off, the PSU is still on the switch and the air flow is still the same. In this case, it is not necessary to set FAN speed to 100%. - How I did it When PSU is powered off, don't treat it as absent. - How to verify it Adjust existing unit test case Add new case in sonic-mgmt	2021-12-01 02:28:37 +00:00
Junchao-Mellanox	d69564a1e7	[Mellanox] Change thermal recover threshold from temp_trip_norm to temp_trip_high (#8792 ) - Why I did it Change thermal recover threshold from temp_trip_norm to temp_trip_high, so that thermal algorithm would set fan speed to minimum allowed earlier and save power. - How I did it Change thermal recover threshold from temp_trip_norm to temp_trip_high - How to verify it Manual test	2021-10-05 22:17:30 +00:00
Nazarii Hnydyn	70b9ea5409	[Mellanox] Advance hw-mgmt to V.7.0010.2346. (#8667 ) Commits on Sep 01, 2021 hw-mgmt: attributes: Add PSU power sensor attributes d8fce39 Commits on Sep 02, 2021 Remove MFT package flint tool from hw-management dump generation. 53d06b2 hw-mgmt: debug: Add timeout to generate-dump.sh b661fa3 Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>	2021-09-09 12:03:44 +00:00
shlomibitton	c0f9bb9720	[202012] [Mellanox] Update SDK\FW to version 4.4.3326\2008.3326 (#8602 ) - Why I did it Update SDK\FW version to 4.4.3326\2008.3326. This version contains: New Features: 1. Add support for Fast Boot for SN3800 Bug Fixing: 1. In some cases, when the total number of allocations exceeds the resource limit, an error can occur due to incorrect resource release procedure. This issue is most likely to affect the following resources: flow counters, ACL actions, PBS, WJH filter, Tunnels, ECMP containers, MC (L2 &L3) 2. On Spectrum systems, when using Async Router API with IPV6, an error message in the log regarding failing to remove ECMP container may show up. This error is not functional and can be safely ignored. 3. On Spectrum-2 systems and above, when using warm boot, setting max_bridge_num to a value greater than 1968 will cause an error and potential crash. 4. Some Molex cables do not support speed after reboot - How I did it Update submodule and .mk files - How to verify it Verified by running regression tests that includes complete sonic-mgmt tests supported Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>	2021-09-03 10:59:18 +03:00
Junchao-Mellanox	49f4ef6438	[Mellanox] Read PSU fan max/min speed per PSU (#8563 ) #### Why I did it A new PSU may have a different type of fan installed, so fan max/min speed should be read per PSU #### How I did it The existing implementation reads PSU max/min fan speed from a common file; change it to read from a per-PSU file #### How to verify it Manual test	2021-08-27 02:27:00 +00:00
Alexander Allen	196fcffb6f	[Mellanox] Upgrade Mellanox firmware tools to 4.17.0 (#8299 ) - Why I did it New release of MFT has the following changelog / RN Fixed an issue that resulted in getting MVPD read errors from the mlxfwmanager during fast reboot. Fixed mlxuptime sometimes generating a time less than the previous one due to the wrong frequency calculation - How I did it Update makefile pointer to new version. - How to verify it Manually tested on all Mellanox platforms.	2021-08-23 03:05:20 +00:00
Junchao-Mellanox	8285cf2329	[Mellanox] [202012] Upgrade hw-mgmt to 7.0100.2344 (#8408 ) To support new PSU fan on Mellanox platforms	2021-08-11 02:04:55 -07:00
DavidZagury	0551fed754	[Mellanox][Pcie] Fix issue on pcied where an id that contains only decimal digits was treated as a decimal number (#8309 ) A device id that contains only decimal digits was mistakenly parsed as a decimal integer, resulting in failure to find it in the id to bus map.	2021-08-05 15:22:48 +00:00
DavidZagury	45e100b61b	[Mellanox][pcied] Ignore bus on pcie.yaml for Mellanox switches (#8063 ) Why I did it In rare cases a BIOS upgrade cannot guarantee that the bus value remains the same across BIOS releases. This field is ignored so that pcied does not fail, while the device id is still verified in a different way. The solution is future proof and will not require code changes when a new BIOS version is available How I did it Since bus is not a fixed value (it is determined by the BIOS version), we ignore this field and instead check whether there is a device that matches on all other fields and in addition has a matching device id. How to verify it Verify no errors or failures in pcied on different BIOS versions with the same code base.	2021-07-27 10:46:31 +00:00
Dror Prital	be6cd44ddf	Update SDK\FW to version 4.4.3222\2008.3224 (#8247 ) *Update SDK\FW Version to 4.4.3222\2008.3224. Signed-off-by: Dror Prital <drorp@nvidia.com>	2021-07-26 11:05:29 -07:00
tomer-israel	13a62666d9	[WARM-REBOOT] fix issue of watchdog on simx when executing warm-reboot command (#8132 ) - Why I did it To prevent a Python exception when executing the warm-reboot command on the Mellanox simulator platform - How I did it Return None from the watchdog Python script when the watchdog file does not exist - How to verify it warm-reboot runs without the Python error. An error message still appears in the log in these cases; to avoid it, the watchdog can be simulated on the Mellanox simulator platform	2021-07-20 10:18:17 +00:00
Vivek Reddy	1b6634765c	SAI fix (#8142 ) [0e4f0b] Fix saisdkdump #### Why I did it Fix the saisdkdump failure when the vxlan src port flag is enabled in the sai.profile	2021-07-11 02:35:17 -07:00
Dror Prital	526dd3c4fb	[Mellanox] Update FW version to 2008.3218 (#8079 ) Update FW version to 2008.3218, fixing the following issues: - 50G/100G links that are operationally down before warm-reboot are not coming up after warm-reboot - 50G/100G links with admin shut / no shut commands are not coming up after warm-reboot Signed-off-by: Dror Prital <drorp@nvidia.com>	2021-07-07 09:41:35 +00:00
Dror Prital	fb89c28c95	[202012] [Mellanox] Update SDK\FW ver. 4.4.3216\2008.3216 (#8056 ) - Changes and new features: 1. Added support in SN4600C systems for new module Finisar ET7402-CWDM4 (100G CWDM4 QSFP28 1310nm SM 2KM). 2. Added support for new module MMS1W50-HM (2km transceiver FR4) for 200GbE 3. Improved performance of "per-port-buffer" counters 4. Added support for Kernel 5.10 - Bug fix: On rare occasions (0.5%), in SN4600C systems, when using 100GbE NRZ mode and Fastboot flow, the link up time may take up to 10 seconds Signed-off-by: Dror Prital <drorp@nvidia.com>	2021-07-06 07:31:34 +03:00
shlomibitton	b9d21a5779	Update SAI submodule (#7926 ) - Why I did it Split and bulk counter bug fixes: Init port auto neg to default on static (SAI XML) port split for 2nd+ port - How I did it Update submodule hash pointer. - How to verify it Verify the above is handled properly and reported issues are assumed to be fixed. Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>	2021-06-23 20:44:33 +03:00
Junchao-Mellanox	ccb663c39b	[Mellanox] [202012] Backport 'Read EEPROM data from DB if possible'(7808) to 202012 (#7928 ) - Why I did it Remove EEPROM cache file and use DB instead - How I did it Read EEPROM data from DB if possible If data is not ready in DB, read from hardware using a visitor pattern - How to verify it Manual test and regression	2021-06-23 18:09:53 +03:00
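
Note on 46a774e294 (select timeout): the message says select.select expects its timeout in seconds while the caller held a value in milliseconds. A minimal sketch of that unit conversion follows. The helper name wait_readable and the pipe used to exercise it are hypothetical, not taken from sfp_event.py, and the sketch assumes a Linux host where select works on pipe descriptors.

```python
import os
import select


def wait_readable(fd, timeout_ms):
    """Return True if fd becomes readable within timeout_ms milliseconds.

    Hypothetical helper: select.select() takes its timeout in seconds,
    so the millisecond value is converted before the call.
    """
    timeout_s = timeout_ms / 1000.0
    readable, _, _ = select.select([fd], [], [], timeout_s)
    return bool(readable)


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    print(wait_readable(read_end, 50))   # nothing written yet
    os.write(write_end, b"x")
    print(wait_readable(read_end, 50))   # data is pending
```

Passing the raw millisecond value straight to select.select would make a 50 ms wait last 50 seconds, which is the kind of mismatch the commit describes.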

435 Commits
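
Note on 0551fed754 (pcied device id): the message describes an id made only of decimal digits being read as a decimal integer and then missing from the id to bus map. The sketch below shows one way such a value could be normalized, under the assumption that the id arrives either as an int (for example from an unquoted YAML scalar) or as a hex string. The function name device_id_key and the sample map are hypothetical and are not the code from the commit.

```python
def device_id_key(raw_id):
    """Normalize a PCI device id to a 4-digit lowercase hex string.

    Hypothetical helper: str() is applied first so that the int 1003 and
    the string "1003" are both read as hexadecimal digits, not as the
    decimal number one thousand and three.
    """
    return "{:04x}".format(int(str(raw_id), 16))


# Hypothetical id-to-bus map keyed by hex strings.
id_to_bus = {"cf70": "07", "1003": "01"}

if __name__ == "__main__":
    print(id_to_bus[device_id_key(1003)])    # int input
    print(id_to_bus[device_id_key("CF70")])  # string input
```

Formatting the int 1003 directly as hex would give 03eb, which is absent from the map; that is the failure mode the commit message names.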