Commit Graph

2111 Commits

Author SHA1 Message Date
dbarashinvd
371f6a6835
add unit tests for CMIS host management feature (#18211)
* add unit tests for CMIS host management feature
2024-03-20 13:15:50 -07:00
Saikrishna Arcot
53d8b1a382
Fix debug package variables for syncd (#18319)
* Fix debug package variables for syncd

PR #16072 renamed the debug package variables from `*_DBG` to
`*_DBGSYM`, since the package names had changed. However, the references
weren't updated. Since all the other debug packages (including ones that
are named `*-dbgsym`) use `*_DBG`, just use that here as well.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

* Update sairedis.mk as well

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

---------

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2024-03-19 10:24:57 -07:00
snider-nokia
9fdbdeee85
[Nokia][sonic-platform] Update Nokia sonic-platform submodule - ungraceful reboot hooks to induce NIF port shutdown (#18014)
These changes provide for the automatic shutdown of NIF ports on LC when an ungraceful reboot scenario occurs. Reboot and panic notifier hooks are now registered so that callback occurs from the kernel and NIF ports are subsequently shut down.

Why I did it
To facilitate the timely movement of traffic away from a crashed LC when its peers recognize that the associated links have gone down.

How I did it
Linux kernel reboot and panic notifier hooks are used to register a callback routine that, when invoked, stuffs all present transceiver modules into reset.

How to verify it
Cause an ungraceful reboot (whether via /usr/sbin/reboot or by causing a kernel panic) and verify that all LC native NIF links are brought down at reboot/panic time (on the way down). It may be necessary to monitor the LC link peer(s) in order to verify in real-time.
2024-03-15 12:18:45 -07:00
Pavan-Nokia
f10220d428
[Nokia-7215-A1][arm64]Update platform init files (#18266)
Why I did it
Update Nokia-7215-A1 platform to address UT and OC test failures.
Update platform init and build files

Microsoft ADO: 27111894

How I did it
Identify failed test cases from OC run on arm64-nokia_ixs7215_52xb-r0 (Nokia-7215-A1) platform and fix bugs

How to verify it
Build a Marvell-arm64 target for Nokia-7215-A1

Run this image on arm64-nokia_ixs7215_52xb-r0 and verify all dockers are up and test basic commands like:

show version
show platform summary
show platform syseeprom
show platform fan
show platform psustatus
show platform firmware status
show platform temperature
show platform ssdhealth
Verify ports are up using "show interface status" command

Run unit tests and OC test cases.
2024-03-08 08:53:46 -08:00
James An
d47fa10a5b
Cho 202311.main.0.1 (master) (#18281)
* Update cisco-8000.ini
2024-03-06 16:03:16 -08:00
Xincun Li
f886328897
[sn2700]: Add CPLD update. (#17376)
Why I did it
Port #12173 to master so that all versions later than 201911 include the CPLD update files.

Microsoft ADO 25846069:

How I did it
Added Mellanox CPLD burn/refresh vme bundle for SN2700 platforms

How to verify it
Use the update_firmware script with the UPDATE_MLNX_CPLD_FW parameter to install a private image that contains the CPLD VME files.

Before update, the CPLD version was 15
admin@str2-msn2700-spy-1:~$ sudo fwutil show status
Chassis    Module    Component    Version                Description
---------  --------  -----------  ---------------------  ----------------------------------------
MSN2700    N/A       ONIE         2016.11-5.1.0012-9600  ONIE - Open Network Install Environment
                     SSD          0115-000               SSD - Solid-State Drive
                     BIOS         0ABZS017_01.01.213     BIOS - Basic Input/Output System
                     CPLD1        CPLD000085_REV1501     CPLD - Complex Programmable Logic Device
                     CPLD2        CPLD000043_REV0400     CPLD - Complex Programmable Logic Device
                     CPLD3        CPLD000000_REV0100     CPLD - Complex Programmable Logic Device
Do Update
admin@str2-msn2700-spy-1:/tmp$ sudo ./update_firmware sonic-mellanox-xincun-cpld.bin UPDATE_MLNX_CPLD_FW=1
Available space: 8101 MB
Warning: 'sonic_installer' command is deprecated and will be removed in the future
Please use 'sonic-installer' instead
Current FW version: SONiC-OS-20201231.110
Target FW version number: add-cpld-2.83464431-a0237f7aef
Target FW version: SONiC-OS-add-cpld-2.83464431-a0237f7aef
expr: non-integer argument
NOTICE: Reset Drop caches to index 1
Warning: 'sonic_installer' command is deprecated and will be removed in the future
Please use 'sonic-installer' instead
Image SONiC-OS-add-cpld-2.83464431-a0237f7aef is already installed. Setting it as default...
Command: grub-set-default --boot-directory=/host 0

Command: sync;sync;sync

Command: sleep 3

Done
NOTICE: sonic_installer install successfully
Mellanox platform is detected: x86_64-mlnx_msn2700-r0
Mellanox ASIC maintenance...
Mellanox ASIC firmware is up to date
Mellanox CPLD maintenance...
NOTICE: Copy Mellanox firmware upgrade utility
'/tmp/image-add-cpld-2.83464431-a0237f7aef-fs//usr/bin/mlnx-fw-upgrade.sh' -> '/usr/bin/mlnx-fw-upgrade.sh'
NOTICE: Copy Mellanox cpldupdate utility
'/tmp/image-add-cpld-2.83464431-a0237f7aef-fs//usr/bin/cpldupdate' -> '/usr/bin/cpldupdate'
Mellanox CPLD firmware upgrade is required. Installing compatible version...
Current CPLD firmware version: 15
Target CPLD firmware version: 20
NOTICE: Upgrade MLNX CPLD FW from 15 to 20
CPLD burn firmware file: /tmp/tmp.42DXmW1pQS/FUI000193_Burn_Panther_CPLD000085_REV2000_CPLD000128_REV0600_CPLD000130_REV0300.vme
CPLD refresh firmware file: /tmp/tmp.42DXmW1pQS/FUI000193_Refresh_Panther_CPLD000085_REV2000_CPLD000128_REV0600_CPLD000130_REV0300.vme
[/] CPLD update...                 Lattice Semiconductor Corp.

             ispVME(tm) V12.2 Copyright 1998-2012.

               Customized for Mellanox products.

Processing virtual machine file (/tmp/tmp.42DXmW1pQS/FUI000193_Burn_Panther_CPLD000085_REV2000_CPLD000128_REV0600_CPLD000130_REV0300.vme)......

Diamond Deployment Tool 3.12
CREATION DATE: Tue Sep 20 09:41:49 2022


[|] CPLD update...+=======+
| PASS! |
+=======+
Power cycle the device, then check CPLD version, it has changed to 20.
admin@str2-msn2700-spy-1:~$ sudo fwutil show status
Chassis    Module    Component    Version                Description
---------  --------  -----------  ---------------------  ----------------------------------------
MSN2700    N/A       ONIE         2016.11-5.1.0012-9600  ONIE - Open Network Install Environment
                     SSD          0115-000               SSD - Solid-State Drive
                     BIOS         0ABZS017_01.01.213     BIOS - Basic Input/Output System
                     CPLD1        CPLD000085_REV2000     CPLD - Complex Programmable Logic Device
                     CPLD2        CPLD000128_REV0600     CPLD - Complex Programmable Logic Device
                     CPLD3        CPLD000000_REV0000     CPLD - Complex Programmable Logic Device
2024-03-06 07:39:00 -08:00
Pavan-Nokia
d4ca86bf9d
[Nokia-7215-A1]Update Nokia-7215-A1 Platform (#18147)
1) Update Nokia-7215-A1 platform to address UT and OC test failures
2) Enable watchdog service
3) EZB files for SAI upgrade
2024-03-04 10:53:00 -08:00
Sudharsan Dhamal Gopalarathnam
0c6b143e00
[Mellanox]Adding dependency of libnl-route-3 for Mellanox SAI library (#18197)
- Why I did it
Adding explicit dependency of libnl-route-3 for Mellanox SAI library. This is required for the latest SAI library.

- How I did it
Modifying Make files

- How to verify it
Building with the changes.
2024-03-04 19:58:31 +02:00
Sasha
26bc08850b
Check if PSU files exists when getting psu_voltage properties (#17042)
- Why I did it
Error messages occurred when trying to read PSU files on init:
ERR pmon#psud: Failed to read from file /var/run/hw-management/power/psu1_volt_out2_capability - FileNotFoundError(2, 'No such file or directory')

This can happen when the power cord is disconnected from the PSU, so some PSU files may be absent, e.g.:
/var/run/hw-management/power/psu2_volt_out2
/var/run/hw-management/power/psu2_volt_out2_capability

- How I did it
Check whether a file exists for a specific PSU parameter. If not, return None so that we do not try to read the PSU file any further.
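
A minimal sketch of the check described above. The helper name `read_psu_value` and the integer parsing are assumptions for illustration, not the code from the PR:

```python
import os


def read_psu_value(path):
    """Return the integer stored in a PSU attribute file, or None if it is absent."""
    # Hypothetical helper: a missing file is treated as "no value", not as an error.
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return int(f.read().strip())
```

A caller can then skip the dependent properties whenever the result is None instead of logging a read failure.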

- How to verify it
Disconnect the power cord from the PSU and the power supply from the system
Wait a few minutes and then connect the power supply to the system without the power cord
Check logs for errors

Signed-off-by: Oleksandra Bella <oleksandrab@nvidia.com>
2024-03-04 19:35:06 +02:00
noaOrMlnx
c2aa77c438
[Mellanox] Fix timing issue in lpmode change (#18223)
- Why I did it
The time an LPMODE change takes differs between cables.
We want to add functionality to make sure that LPMODE has actually changed.
For that, the wait_until utility is used: every 1 second (until timeout), it checks with the lower layers what the current LPMODE is.
Once it is the expected mode, the set_lpmode() function returns True.
If LPMODE is still not in the expected mode when the timeout expires, the set_lpmode() function returns False.

- How I did it
Add use of wait_until function to make sure lpmode was changed.
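
The polling pattern can be sketched as follows. `poll_until` is a stand-in for the wait_until utility (its real signature is not shown in this commit), and `write_mode` / `read_mode` are hypothetical callables for the lower layers:

```python
import time


def poll_until(timeout, interval, condition):
    """Call condition() every `interval` seconds until it is true or `timeout` expires."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def set_lpmode(write_mode, read_mode, enable, timeout=2.0, interval=1.0):
    """Request an LPMODE change, then confirm that the module reports the new mode."""
    # write_mode/read_mode are assumptions standing in for the platform's lower layers.
    write_mode(enable)
    return poll_until(timeout, interval, lambda: read_mode() == enable)
```

Returning the polling result directly gives callers the True/False contract described above without a separate verification step.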

- How to verify it
sfputil lpmode on
sfputil lpmode off
2024-03-04 16:38:51 +02:00
Kebo Liu
881ceb7034
[Mellanox] Extend the time to wait for EEPROM VPD file creation (#18146)
- Why I did it
The creation of the system EEPROM VPD file "/var/run/hw-management/eeprom/vpd_info" is triggered by a udev event during system boot. If the CPU is busy during boot, the udev event handling can be delayed, so more time is needed to wait for the file to be created.

- How I did it
Extend the waiting time from 10s to 20s to cover such extreme cases.
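
A minimal sketch of waiting for a file with a deadline, assuming a simple existence poll. The function name `wait_for_file` and the 0.5 s poll interval are illustrative choices, not taken from the PR:

```python
import os
import time


def wait_for_file(path, timeout=20.0, interval=0.5):
    """Wait up to `timeout` seconds for `path` to appear; return True if it did."""
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```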

- How to verify it
Run the reboot case continuously and verify that the error message "ERR decode-syseeprom: Nowhere to read syseeprom from! No symlink found" no longer appears

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2024-02-29 15:46:49 +02:00
Pavan Naregundi
a4e7d065da
[Marvell-arm64] Update sonic-platform submodule (#17717)
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
2024-02-28 11:43:48 -08:00
noaOrMlnx
b25dfa91c1
[Mellanox] Update Nvidia sai.profile SKU files to have common file (#18074)
* Update Nvidia sai.profile SKU files to have common file

* Remove SAI_DUMP_MFT_CFG_PATH from sai-common.profile as it is not in use
2024-02-28 11:05:20 -08:00
Pavan-Nokia
6511c3bc26
[Nokia-7215-T1] Disable sysrq-trigger from platform init (#18161)
2024-02-28 08:24:01 -08:00
Liu Shilong
5e23a6bc93
[build] Use public storage for public resources. (#18038)
2024-02-27 17:45:49 -08:00
zitingguo-ms
41aa3295b9
upgrade xgs SAI version to 10.1.7.0 (#18156)
Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
2024-02-22 08:56:03 -08:00
Lahav-Nvidia
8a7e38b3a3
[Mellanox] Add N/A as a valid fan direction for Nvidia platforms (#17930)
- Why I did it
On some Nvidia platforms, fan direction could not be determined. Therefore 'N/A' becomes a valid value for those cases.

- How I did it
Add 'N/A' to the valid fan direction mapping, to avoid an error in the log.

- How to verify it
Check fan direction on Nvidia platforms, and make sure there aren't errors in the log.
2024-02-22 11:35:10 +02:00
Pavan Naregundi
4b8f172b46
[marvell-armhf] Update MRVL_PRESTERA_DRIVER (#17780)
Changes in MRVL_PRESTERA_DRIVER_1.8:
 * Migrate dtb to kernel 6.1.
 * Fix i2c kernel error log,
	[ 51.331287] i2c i2c-0: mv64xxx: I2C bus locked, block: 1, time_left: 0.

Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
2024-02-16 08:51:49 -08:00
Pavan Naregundi
c6602c9585
[Marvell-arm64]: Fix SYNCD_RPC build (#17266)
Change-Id: I0bd4932d03141f3f7bc523b49a1bf3d1809817a8

Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
2024-02-12 15:11:19 -08:00
Pavan Naregundi
b31a3030fb
[Marvell-arm64] Fix boot issue on rd98DX35xx_cn9131 (#17277)
Change-Id: I411f12963fb8dc0eb3569faf4df68082b852e3a8

Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
2024-02-12 15:11:00 -08:00
dbarashinvd
7a34d4a275
[Mellanox] fix code for warm reboot to work with FW controlled ports (#18065)
- Why I did it
Fix the code so that it also works with FW controlled ports after warm reboot.
In warm reboot, the control state sysfs of each port does not change, unlike in reboot or fast boot.

- How I did it
1. Check the procfs cmdline to see whether a warm reboot was done. This is needed because pmon does not recognize a warm reboot while it is taking place, since pmon is loaded after the warm reboot has finished.
2. If a warm reboot was done, check in the static detection part whether each port is FW controlled. If so, leave it that way and stop the state machine flow (set it to the final state).
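
Step 1 can be sketched as below. The `SONIC_BOOT_TYPE=warm` token is an assumption about how the boot type is marked on the kernel command line, and both function names are hypothetical:

```python
def is_warm_reboot(cmdline):
    """Return True if the kernel command line marks this boot as a warm reboot."""
    # Assumption: a warm reboot is flagged by a SONIC_BOOT_TYPE=warm token.
    return "SONIC_BOOT_TYPE=warm" in cmdline.split()


def read_kernel_cmdline(path="/proc/cmdline"):
    """Read the kernel command line of the running system."""
    with open(path) as f:
        return f.read()
```

Keeping the parsing separate from the file read makes the check easy to exercise in unit tests without touching procfs.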

- How to verify it
1. Boot a switch with CMIS host management with at least one FW controlled port (non-active cables or non-CMIS cables), then run warm reboot.
2. Verify that no sysfs reading errors appear for the control sysfs
2024-02-08 14:49:56 +02:00
zitingguo-ms
74494010e1
[Broadcom] Upgrade xgs SAI to 10.1.6.0 (#18044)
Why I did it
Upgrade the xgs SAI version to 10.1.6.0 to include the following fix:

10.1.6.0: [CS00012332630][SAI_BRANCH rel_ocp_sai_10_1] SAI - OTHER - [SAI BUG] sflow use psample to send packet, but the psample in linux version is not right.
10.1.4.0: [CS00012329827]ECMP LB traffic polarization, configure hash_offset along with hash_seed attr
10.1.3.0: Double commit test code fixes in EM for 10.1.
10.1.2.0: fix ODP packaging in rel_ocp_sai_10_1
10.1.1.0: Use knet-cb procfs path for DNX port speed sampling rate (does not use new genl)
Work item tracking
Microsoft ADO (number only): 26720003
How I did it
Upgrade xgs SAI version in sai.mk file.

How to verify it
Run full qual on s6100 T1: https://elastictest.org/scheduler/testplan/65c1c2e69e3e72f540cae34b
2024-02-07 09:29:40 +08:00
Yevhen Fastiuk
2f35079979
[Mellanox] Fix uninitialized variable on module plug event (#17011)
- Why I did it
To fix uninitialized variable

- How I did it
Add initial value

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>
2024-02-05 19:41:16 +02:00
dbarashinvd
0aacc1f28e
[Mellanox] fix sysfs reading that gets garbage end of line using strip (#17830)
- Why I did it
When reading a sysfs fd upon Python poller events, there is end-of-line garbage like "# 012" (without the space between the 2 parts) trailing the real value of 1 or 0

- How I did it
Use Python strip() to remove the end of line
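
As an illustration, the trailing characters are most likely just the newline that the sysfs read returns (syslog renders a newline as the octal escape for 012). A minimal sketch, with the hypothetical helper name `parse_sysfs_flag`:

```python
def parse_sysfs_flag(raw):
    """Convert raw sysfs text such as "1" followed by a newline into an int."""
    # strip() drops the trailing newline and any other surrounding whitespace.
    return int(raw.strip())
```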

- How to verify it
Run the CMIS host management feature on a switch
Wait a few minutes until the switch completes the boot up sequence, including the CMIS host manager
Then disconnect or reconnect a port to create a poller event
2024-02-05 19:39:55 +02:00
Stepan Blyshchak
e1a8d2a6e8
[nvidia][syncd] fix incorrect permission of /tmp in syncd container (#17777)
Fixes #16034
2024-02-05 00:00:29 -08:00
wenyiz2021
892f171b80
[Master] [DNX SAI] Update DNX SAI to 9.2.X and SDK on master branch (#17935)
SAI 9.2.x was sanitized and posted on 202305 branch: https://github.com/sonic-net/sonic-buildimage/pull/17432/files

Posting SAI 9.2.x to master branch also.

26607678
2024-02-01 17:44:48 -08:00
Dror Prital
4af43dc63b
[Mellanox] Update SIMX version to 23.10-1123 (#17958)
- Why I did it
Update NVIDIA SIMX Version to 23.10-1123

- How I did it
Changed fw.mk file
2024-01-31 19:41:23 +02:00
Sudharsan Dhamal Gopalarathnam
77384494b3
[Mellanox]Update SDK/FW to 4.6.2202/2012.2202 (#17947)
- Why I did it
Update SDK/FW version to 4.6.2202/2012.2202

Fixed issues:
1. On Spectrum-3 systems, ports' toggling while sending traffic on 400G speed ports, might result in stuck FW.
2. In Spectrum-1 switch systems, 50G SR2 speed mode is not supported when AutoNeg is enabled. In this case although the max interface speed is 50G for SR2 or SR4 or SR, the actual max interface speed negotiated between the loopback is 25G.
3. On Spectrum-2 and Spectrum-3, Switch create in fastboot might take more than 40 seconds in case there are no active links.
4. When performing warmboot from a version prior to 202205 to 202205 and above, no aging or MAC move takes place

- How I did it
Updating make files.

- How to verify it
Running regression
2024-01-31 08:35:16 +02:00
Lior Avramov
865042ed23
[Nvidia] Update syncd docker to use python version 3 (#17735)
* Remove python2 from compilation of python-sdk-api

* Upgrade Python version in syncd RPC docker image to Python3
2024-01-30 13:47:39 -08:00
xumia
bb5a420de5
[Build] Fix krb5 package not found issue (#17926)
Why I did it
Fix the build issue caused by the wrong version specified.

See the build error logs:

Try 4: /usr/bin/wget --retry-connrefused failed to get: -O
--2024-01-26 11:38:23--  https://sonicstorage.blob.core.windows.net/public/fips/bullseye/0.10/amd64/libk5crypto3_1.18.3-6+deb11u14+fips_amd64.deb
Resolving sonicstorage.blob.core.windows.net (sonicstorage.blob.core.windows.net)... 20.60.59.131
Connecting to sonicstorage.blob.core.windows.net (sonicstorage.blob.core.windows.net)|20.60.59.131|:443... connected.
HTTP request sent, awaiting response... 404 The specified blob does not exist.
2024-01-26 11:38:23 ERROR 404: The specified blob does not exist..

Try 5: /usr/bin/wget --retry-connrefused failed to get: -O
make[1]: *** [Makefile:12: /sonic/target/debs/bullseye/symcrypt-openssl_0.10_amd64.deb] Error 8
make[1]: Leaving directory '/sonic/src/sonic-fips'
Work item tracking
Microsoft ADO (number only): 26577929
The problem of the PR passing checks even though the package was not installed is tracked in a separate issue, #17927

How I did it
Add libkrb5-dev and the packages it depends on to fix the docker-sonic-vs build failure.
The package libzmq3-dev depends on libkrb5-dev.
2024-01-30 21:44:32 +08:00
dbarashinvd
927dde73f1
fix low polarity wrong value for hw_reset deassert and seek(0) before reading sysfs upon poll event (#17627)
* fix hw_reset low polarity (reverse values)

* seek to the beginning of the sysfs fd before reading, to resolve the
power_good sysfs returning empty when a cable is plugged out
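
The effect of the seek can be sketched with an ordinary file object. `read_sysfs_value` is a hypothetical helper; a regular file is used here only to show why a repeated read on the same descriptor comes back empty without it:

```python
def read_sysfs_value(f):
    """Re-read an already open attribute file from the start and return its text."""
    # Without seek(0), a second read() starts at end-of-file and returns "".
    f.seek(0)
    return f.read().strip()
```

Because the poller keeps the same descriptor open across events, every read after the first needs this rewind.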
2024-01-22 10:53:55 -08:00
Junchao-Mellanox
91d77fe7ae
Fix error log while creating PSU thermal object (#17789)
- Why I did it
If a PSU is not present, error logs may appear while restarting psud or thermalctld:

Jan  8 17:15:52.689616 sonic ERR pmon#psud: Thermal sysfs /run/hw-management/thermal/psu2_temp1_max does not exist

Jan  8 17:15:57.747723 sonic ERR pmon#thermalctld: Thermal sysfs /run/hw-management/thermal/psu2_temp1 does not exist

- How I did it
If a PSU is not present, do not check the PSU temperature sysfs.
2024-01-22 16:22:07 +02:00
Liu Shilong
e30782b0fe
[ci] Enable cache for marvell-arm64 build in PR checks. (#15449)
Why I did it
Enable build cache for marvell-arm64 build to decrease PR check duration.

Work item tracking
Microsoft ADO (number only): 26340500
How I did it
How to verify it
2024-01-09 20:28:31 +08:00
snider-nokia
98f24b639e
[Nokia][sonic-platform] Update Nokia sonic-platform submodule and device data (#17378)
These changes, in conjunction with NDK version >= 22.9.17, address the thermal logging issues discussed at Nokia-ION/ndk#27. While the changes in this PR do not require NDK version >= 22.9.17, the thermal logging enhancements will not be available without it. Thus, coupling with NDK >= 22.9.17 is preferred and recommended.

Why I did it
To address thermal logging deficiencies.

Work item tracking
Microsoft ADO (number only): 26365734
How I did it
The following changes are included:

Threshold configuration values are provided in the associated device data .json files. There is also a change included to better handle the condition where an SFP module read fails.

Modify the module.py reboot to support reboot linecard from Supervisor

 - Modify reboot to call _reboot_imm for single IMM card reboot
 - Add log to the ndk_cmd to log the operation of "reboot-linecard" and "shutdown/startup the sfm"
Add new nokia_cmd set command and modify show ndk-status output

 - Add a new function reboot_imm() to nokia_common.py to support reboot a single IMM slot from CPM
 - Added new command: nokia_cmd set reboot-linecard <slot> [force] for CPM
 - Append a new column "RebootStatus" at the end of output of "nokia_cmd show ndk-status"
 - Provide ability for IMM to disable all transceiver module TX at reboot time
 - Remove defunct xcvr-resync service
2024-01-08 11:38:46 -08:00
Junchao-Mellanox
ee49d0dfec
[Mellanox] Fix issues found for CMIS host management (#17637)
- Why I did it
1. Thermal updater should wait more time for module to be initialized
2. sfp should get temperature threshold from EEPROM because SDK sysfs is not yet supported
3. Rename sfp function to fix typo
4. sfp.get_presence should return False if module is under initialization

- How I did it
1. Thermal updater should wait more time for module to be initialized
2. sfp should get temperature threshold from EEPROM because SDK sysfs is not yet supported
3. Rename sfp function to fix typo
4. sfp.get_presence should return False if module is under initialization

- How to verify it
Manual test
Unit test
2024-01-04 09:42:33 +02:00
Junchao-Mellanox
7d388cd0e6
[Mellanox] wait until hw-management watchdog files ready (#17618)
- Why I did it
The watchdog-control service always disarms the watchdog during the system startup stage. The watchdog may not be fully initialized when the watchdog-control service accesses it. This PR adds a wait to make sure the watchdog has been fully initialized.

- How I did it
Add a wait to make sure the watchdog has been fully initialized.

- How to verify it
Manual test
sonic regression
2023-12-26 18:27:18 +02:00
Junchao-Mellanox
d8a1ffbace
[Mellanox] implement sfp.reset for CMIS management (#16862)
- Why I did it
For CMIS host management modules, we need a different implementation of sfp.reset. This PR implements it.

- How I did it
For SW control modules, do reset from hw_reset
For FW control modules, do reset in the original way

- How to verify it
Manual test
sonic-mgmt platform test
2023-12-17 08:02:47 +02:00
Junchao-Mellanox
c1cb292310
[Mellanox] implement platform wait in python code (#17398)
- Why I did it
New implementation of Nvidia platform_wait due to:
1. sysfs deprecated by hw-mgmt
2. new dependencies to SDK
3. For CMIS host management mode

- How I did it
wait hw-management ready
wait SDK sysfs nodes ready

- How to verify it
manual test
unit test
sonic-mgmt regression
2023-12-14 12:04:24 +02:00
Junchao-Mellanox
f373a16e95
[Mellanox] Fix race condition while creating SFP (#17441)
- Why I did it
Fix an issue where xcvrd crashes because it cannot import name 'initialize_sfp_thermal':

Nov 27 09:47:16.388639 sonic ERR pmon#xcvrd: Exception occured at CmisManagerTask thread due to ImportError("cannot import name 'initialize_sfp_thermal' from partially initialized module 'sonic_platform.thermal' (most likely due to a circular import) (/usr/local/lib/python3.9/dist-packages/sonic_platform/thermal.py)")

- How I did it
Add a lock for creating the SFP object
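
A minimal sketch of lock-protected lazy creation, assuming SFP objects are created on first access. The `Chassis` and `Sfp` classes here are simplified stand-ins, not the platform classes:

```python
import threading


class Sfp:
    """Stand-in for the platform SFP object; counts how many were created."""
    created = 0

    def __init__(self, index):
        Sfp.created += 1
        self.index = index


class Chassis:
    def __init__(self, port_count):
        self._sfp_list = [None] * port_count
        self._sfp_lock = threading.Lock()

    def get_sfp(self, index):
        """Return the SFP for a port, creating it at most once across threads."""
        # The lock serializes creation so two threads never initialize concurrently.
        with self._sfp_lock:
            if self._sfp_list[index] is None:
                self._sfp_list[index] = Sfp(index)
            return self._sfp_list[index]
```

Holding the lock for the whole check-and-create step is the simplest option; it trades a little contention for the guarantee that initialization never overlaps.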

- How to verify it
Unit test
Manual Test
2023-12-14 12:01:11 +02:00
zitingguo-ms
6a9ec987b5
change branch name (#17267)
Why I did it
Upgrade xgs SAI to 10.1 version.

Work item tracking
Microsoft ADO (number only): 25931321
How I did it
Upgrade xgs SAI version in sai.mk file.

How to verify it
Run full qualification on 7050cx3/7260cx3:

7050cx3:
https://dev.azure.com/mssonic/internal/_build/results?buildId=425450&view=results
https://dev.azure.com/mssonic/internal/_build/results?buildId=425449&view=results
7260cx3: https://elastictest.org/scheduler/testplan/656f2b2b617fb27e41557494?leftSideViewMode=detail&prop=status&order=ascending
2023-12-14 09:37:35 +08:00
Junchao-Mellanox
1b84f3daa5
[Mellanox] update asic and module temperature in a thread for CMIS management (#16955)
- Why I did it
When a module is totally under software control, the driver cannot get the module temperature or temperature threshold from firmware. In this case, SONiC needs to get them from EEPROM. This PR creates a thermal updater thread that updates the module temperature and temperature threshold while software control is enabled.

- How I did it
Query ASIC temperature from SDK sysfs and update hw-management-tc periodically
Query Module temperature from EEPROM and update hw-management-tc periodically
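
The periodic update can be sketched as a small worker thread. The class name matches the description only loosely; `read_temperature` and `publish` are hypothetical callables standing in for the EEPROM/SDK sysfs query and the hw-management-tc update:

```python
import threading


class ThermalUpdater:
    """Periodically read a temperature and hand it to a publisher callback."""

    def __init__(self, read_temperature, publish, interval=1.0):
        self._read_temperature = read_temperature
        self._publish = publish
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        # Setting the event wakes the loop immediately instead of waiting a full interval.
        self._stop.set()
        self._thread.join()

    def _run(self):
        while not self._stop.is_set():
            self._publish(self._read_temperature())
            self._stop.wait(self._interval)
```

Using `Event.wait` as the sleep keeps shutdown prompt, which matters when the interval is long.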

- How to verify it
Manual test
New Unit tests
2023-12-13 14:19:44 +02:00
Junchao-Mellanox
0d62cf0e92
[Mellanox] Remove EEPROM write limitation if it is software control (#17030)
- Why I did it
When a module is under software control (CMIS host management enabled), the EEPROM should be controlled by software and there should be no limitation on any write operation.

- How I did it
Remove EEPROM write limitation if a module is under software control

- How to verify it
Manual test
UT
2023-12-13 14:16:40 +02:00
Sudharsan Dhamal Gopalarathnam
dd39dd0e03
[Mellanox] Update SAI to 2311.26.0.28, SDK/FW to 4.6.2134/2012.2134 (#17481)
- Why I did it
Update SAI version to SAIBuild2311.26.0.28

Fixed issues
1. Traffic with unicast destination ip and multicast destination mac wasn't properly dropped
2. When working with SAI_DEFAULT_SWITCHING_MODE_STORE_FORWARD key/value enabled, trying to add a LAG member to a LAG which is created after warm boot initial configuration phase ended, will fail.
3. Optional feature of Port IP counters (SAI_PORT_STAT_IP*) , enabled by SAI XML per-port-ip-counter-enabled config node, wasn't initialized properly.
4. Creating BFD session for non default VRF fails (SAI_BFD_SESSION_ATTR_VIRTUAL_ROUTER != SAI_SWITCH_ATTR_DEFAULT_VIRTUAL_ROUTER_ID).
5. The default value for port FEC during switch init for Spectrum3 was initialized as 'auto' and not aligned to the SAI header default 'none'. Note: if a setup has an invalid configuration and previously relied on 'auto', the user may now need to provide an explicit valid value for SAI_PORT_ATTR_FEC_MODE

Update SDK/FW version to 4.6.2134/2012.2134
Fixed issues:
1. Updated SN3700C to enable limit to 100G speed.
2. Recovering from low power mode might end with the port down.

- How I did it
Updating the versions in makefile

- How to verify it
Confirm issues fixed and run sonic-mgmt tests
2023-12-13 12:48:49 +02:00
Junchao-Mellanox
b0bb3d40d3
[Mellanox] Implement low power mode for cmis host management (#17159)
- Why I did it
For CMIS host management mode, the previous sysfs cannot be used for the low power mode setting. This PR reuses the existing low power mode implementation in the sonic_xcvr package when CMIS host management mode is enabled

- How I did it
Use sonic_xcvr low power mode implementation when CMIS host management mode is enabled.

- How to verify it
Manual test for CMIS host management mode
Regression test for old mode and backward compatible test
2023-12-11 10:42:01 +02:00
Nazarii Hnydyn
278a958517
[Mellanox] Disable MFT bash autocompletion (#17442)
A W/A to overcome a delay of about 20 sec on login due to an MFT bash autocompletion bug.
It should be reverted once a formal solution is available in a future MFT release.

- Why I did it
To overcome SN2700 20 sec delay on login

- How I did it
Removed MFT bash autocompletion part

- How to verify it
1. Build a mellanox image
2. Verify no such links after system boot.

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2023-12-10 10:28:32 +02:00
Aravind-Subbaroyan
b222c7c240
Update cisco-8000.ini (#17428)
FCS/CRC Errors will only be reported as RX_ERR.
Fix to avoid the mac port related errors.
Fix for sharedResSize testcase failure in QoS-SAI
Fix the issue related to voltage in 'show platform psustatus'.
Support WRED drop for lossy queues.
Fixed an issue where lossy traffic was getting dropped.
Enhancement of SAI logging for errors and interrupts
2023-12-07 17:05:05 -08:00
Arun Saravanan Balachandran
80e743716c
[Dell] S6100 - Update EEPROM API serial_number_str to return service tag instead of serial number (#17440)
Modify the EEPROM API serial_number_str to return the service tag instead of the serial number on Dell S6100.
Ref PR: #1239

How I did it
Update EEPROM API serial_number_str to return service tag instead of serial number.

How to verify it
Verify decode-syseeprom -s returns service tag in Dell S6100.
2023-12-07 10:08:42 -08:00
centecqianj
8ec4b53451
[Bookworm] Upgrade centec-arm64 platform to Bookworm. (#17411)
Why I did it
1. Upgrade centec-arm64 platform to Bookworm.
2. Solve the problem of compiling the docker-syncd-centec-rpc.gz error on the centec platform.

How I did it
1. Modified platform driver to comply with bookworm kernel.
2. Upgrade SONiC package versions of the centec platform.

How to verify it
1. Compile the centec-arm64 platform to generate sonic-centec-arm64.bin.
2. Compile the centec platform to generate docker-syncd-centec-rpc.gz.

Signed-off-by: centecqianj <qianj@centec.com>
2023-12-07 08:42:13 -08:00
dbarashinvd
000a2ef818
[Mellanox] Enable CMIS host management (#16846)
- Why I did it
Enable CMIS host management for Mellanox devices which are expected to support the feature

- How I did it
Add a new thread in a new file, and change the platform code logic in chassis.py, which calls this thread from get_change_event().
The thread in the new file handles the state machine per port.
First, static detection takes place once the thread is up (during the switch bootup sequence), until the final decision is made on whether the module is under FW control or SW control.
After it ends, dynamic detection takes place, listening per port for changes in the sysfs fds,
so that it can detect plug-in or plug-out events of a cable.

- How to verify it
Enhanced unit tests
run sonic mgmt on Nvidia SN4700 with CMIS host management enabled
2023-12-07 14:54:56 +02:00
Nazarii Hnydyn
1ff27db42f
[frr]: Force disable next hop group support. (#17344)
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>

Closes #17345

This W/A was proposed by the Nvidia FRR team as an interim measure until the long-term solution is ready.

Why I did it
A W/A to fix default route installation during LAG member flap
Work item tracking
N/A
How I did it
Disabled FRR next hop group support
How to verify it
Do LAG member flap
2023-12-06 11:09:54 +08:00