Commit Graph

2072 Commits

Author SHA1 Message Date
zitingguo-ms
05ae1fa285
upgrade xgs SAI version to 10.1.6.0 (#18055)
Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
2024-02-06 23:11:58 -08:00
Sudharsan Dhamal Gopalarathnam
fcef7d1095
[202311][Mellanox]Update SAI to 2311.26.0.28, SDK/FW to 4.6.2202/2012.2202 (#17975) 2024-02-01 17:53:49 -08:00
Junchao-Mellanox
d53fba12cb Fix error log while creating PSU thermal object (#17789)
- Why I did it
If a PSU is not present, an error may be logged when psud or thermalctld restarts:

Jan  8 17:15:52.689616 sonic ERR pmon#psud: Thermal sysfs /run/hw-management/thermal/psu2_temp1_max does not exist

Jan  8 17:15:57.747723 sonic ERR pmon#thermalctld: Thermal sysfs /run/hw-management/thermal/psu2_temp1 does not exist

- How I did it
If a PSU is not present, skip checking the PSU temperature sysfs.
2024-02-01 09:34:05 +08:00
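The fix in #17789 boils down to a presence guard in front of the sysfs read. Below is a minimal Python sketch of that idea, not the actual sonic_platform code: the function name read_psu_temperature and the millidegree unit are assumptions, and only the /run/hw-management/thermal/psuN_temp1 path pattern comes from the log lines above.

```python
import os
from typing import Optional

# Directory taken from the log lines in #17789; the rest is illustrative.
THERMAL_DIR = "/run/hw-management/thermal"


def read_psu_temperature(psu_index: int, present: bool,
                         thermal_dir: str = THERMAL_DIR) -> Optional[float]:
    """Return PSU temperature in Celsius, or None if it cannot be read.

    Hypothetical helper: the real platform code uses its own thermal classes.
    """
    if not present:
        # Absent PSU: do not touch sysfs at all, so nothing is logged as an error.
        return None
    path = os.path.join(thermal_dir, f"psu{psu_index}_temp1")
    try:
        with open(path) as f:
            raw = f.read().strip()
    except OSError:
        return None
    # Assumption: the value is in millidegrees Celsius, as in Linux hwmon.
    return int(raw) / 1000.0
```

Checking presence first, rather than catching the missing-file error afterwards, is what keeps the ERR lines out of syslog.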
Lior Avramov
4599f7aeaf [Nvidia] Update syncd docker to use python version 3 (#17735)
* Remove python2 from compilation of python-sdk-api

* Upgrade Python version in syncd RPC docker image to Python3
2024-01-31 10:32:20 +08:00
Ying Xie
388c3f5f90
[202311][sonic-utilities] Revert bgp suppress fib pending (#17915)
Revert "Revert "[202311] Revert bgp suppress fib pending" (#17882)"

This reverts commit 1ccee478e2.

sonic-utilities:
* be294f39 2024-01-25 | [202311] Revert bgp suppress fib pending (#3003) (HEAD -> 202311, github/202311) [Stepan Blyshchak]

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2024-01-26 13:12:10 -08:00
Ying Xie
1ccee478e2
Revert "[202311] Revert bgp suppress fib pending" (#17882) 2024-01-23 08:43:43 -08:00
dbarashinvd
d7a77601e4 fix low polarity wrong value for hw_reset deassert and seek(0) before reading sysfs upon poll event (#17627)
* fix hw_reset low polarity (reverse values)

* Seek to the beginning of the sysfs fd before reading, to fix the power_good
sysfs returning empty when a cable is unplugged
2024-01-23 09:38:50 +08:00
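The seek(0) part of #17627 reflects a general sysfs rule: an attribute file opened once and polled for changes keeps its file offset, so after the first read it sits at EOF and later reads return empty. The Linux sysfs documentation tells pollers to wait for POLLPRI | POLLERR, then seek to 0 (or reopen) before reading again. A minimal Python sketch of that pattern follows; the helper names are hypothetical and this is not the Nvidia implementation.

```python
import os
import select
from typing import Optional


def read_sysfs_fd(fd: int) -> str:
    """Re-read an already-open sysfs attribute from the start."""
    # Without this rewind, a second os.read() at EOF returns b"".
    os.lseek(fd, 0, os.SEEK_SET)
    return os.read(fd, 4096).decode().strip()


def wait_and_read(fd: int, timeout_ms: int = 1000) -> Optional[str]:
    """Block until the kernel signals a change on the attribute, then read it."""
    poller = select.poll()
    # sysfs_notify() wakes pollers with POLLPRI | POLLERR.
    poller.register(fd, select.POLLPRI | select.POLLERR)
    if not poller.poll(timeout_ms):
        return None  # timed out without a change event
    return read_sysfs_fd(fd)
```

The same rewind applies to the initial read done right after open, so read_sysfs_fd can be used for both.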
Stepan Blyshchak
b527372642
[202311] Revert bgp suppress fib pending (#17660)
* [FRR] Bring back patches required for FPM plugin

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>

* [zebra] use fpm plugin instead of dplane_fpm_nl

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>

* Revert BGP suppress FIB pending

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>

---------

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2024-01-22 10:17:50 -08:00
Junchao-Mellanox
8d65e2c517 [Mellanox] Fix issues found for CMIS host management (#17637)
- Why I did it
1. Thermal updater should wait longer for the module to be initialized
2. sfp should get temperature threshold from EEPROM because SDK sysfs is not yet supported
3. Rename sfp function to fix typo
4. sfp.get_presence should return False if module is under initialization

- How I did it
1. Thermal updater should wait longer for the module to be initialized
2. sfp should get temperature threshold from EEPROM because SDK sysfs is not yet supported
3. Rename sfp function to fix typo
4. sfp.get_presence should return False if module is under initialization

- How to verify it
Manual test
Unit test
2024-01-20 06:32:58 +08:00
Junchao-Mellanox
0fbdc2b8ed [Mellanox] wait until hw-management watchdog files ready (#17618)
- Why I did it
The watchdog-control service always disarms the watchdog during system startup. The watchdog may not yet be fully initialized when the watchdog-control service accesses it. This PR adds a wait to make sure the watchdog has been fully initialized.

- How I did it
Add a wait until the hw-management watchdog files are ready, so the watchdog is fully initialized before it is accessed.

- How to verify it
Manual test
sonic regression
2024-01-19 04:32:53 +08:00
Nazarii Hnydyn
1687a442de [frr]: Force disable next hop group support. (#17344)
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>

Closes #17345

This workaround (W/A) was proposed by the Nvidia FRR team as an interim fix until the long-term solution is ready.

Why I did it
A W/A to fix default route installation during a LAG member flap
Work item tracking
N/A
How I did it
Disabled FRR next hop group support
How to verify it
Do LAG member flap
2024-01-18 14:36:23 +08:00
Junchao-Mellanox
56ba5b10b4 [Mellanox] implement sfp.reset for CMIS management (#16862)
- Why I did it
Modules under CMIS host management need a different implementation of sfp.reset. This PR implements it.

- How I did it
For SW-controlled modules, reset via hw_reset
For FW-controlled modules, reset the same way as before

- How to verify it
Manual test
sonic-mgmt platform test
2024-01-17 06:33:11 +08:00
Kebo Liu
cacf46ff86
[202311][Mellanox] Integrate HW-MGMT Version 7.0030.2008 (#17659)
* Integrate HW-MGMT 7.0030.2008 changes

 ## Patch List
* 0285-UBUNTU-SAUCE-mlxbf-gige-Fix-intermittent-no-ip-issue.patch :
* 0286-pinctrl-Introduce-struct-pinfunction-and-PINCTRL_PIN.patch :
* 0287-pinctrl-mlxbf3-Add-pinctrl-driver-support.patch :
* 0288-UBUNTU-SAUCE-gpio-mmio-handle-ngpios-properly-in-bgp.patch :
* 0289-UBUNTU-SAUCE-gpio-mlxbf3-Add-gpio-driver-support.patch :
* 0291-mlxsw-core_hwmon-Align-modules-label-name-assignment.patch :
* 0292-mlxsw-i2c-Limit-single-transaction-buffer-size.patch :
* 0293-mlxsw-reg-Limit-MTBR-register-records-buffer-by-one-.patch :
* 0296-UBUNTU-SAUCE-mmc-sdhci-of-dwcmshc-Add-runtime-PM-ope.patch :
* 0298-UBUNTU-SAUCE-mlxbf-ptm-use-0444-instead-of-S_IRUGO.patch :
* 0299-UBUNTU-SAUCE-mlxbf-ptm-add-atx-debugfs-nodes.patch :
* 0300-UBUNTU-SAUCE-mlxbf-ptm-update-module-version.patch :
* 0301-UBUNTU-SAUCE-mlxbf-gige-Fix-kernel-panic-at-shutdown.patch :
* 0302-UBUNTU-SAUCE-mlxbf-bootctl-support-SMC-call-for-sett.patch :
* 0303-UBUNTU-SAUCE-Add-BF3-related-ACPI-config-and-Ring-de.patch :
* 0306-dt-bindings-trivial-devices-Add-infineon-xdpe1a2g7.patch :
* 0307-leds-mlxreg-Add-support-for-new-flavour-of-capabilit.patch :
* 0308-leds-mlxreg-Remove-code-for-amber-LED-colour.patch :
* 0308-platform_data-mlxreg-Add-capability-bit-and-mask-fie.patch :
* 0309-hwmon-mlxreg-fan-Add-support-for-new-flavour-of-capa.patch :
* 0310-hwmon-mlxreg-fan-Extend-number-of-supporetd-fans.patch :
* 0317-platform-mellanox-Introduce-support-for-switches-equ.patch :
* 0318-mellanox-Relocate-mlx-platform-driver.patch :
* 0319-UBUNTU-SAUCE-mlxbf-tmfifo-fix-potential-race.patch :
* 0320-UBUNTU-SAUCE-mlxbf-tmfifo-Drop-the-Rx-packet-if-no-m.patch :
* 0321-UBUNTU-SAUCE-mlxbf-tmfifo-Drop-jumbo-frames.patch :
* 0322-UBUNTU-SAUCE-mlxbf-tmfifo.c-Amend-previous-tmfifo-pa.patch :
* 0323-mlxbf_gige-add-set_link_ksettings-ethtool-callback.patch :
* 0324-mlxbf_gige-fix-white-space-in-mlxbf_gige_eth_ioctl.patch :
* 0325-UBUNTU-SAUCE-mlxbf-bootctl-Fix-kernel-panic-due-to-b.patch :
* 0326-platform-mellanox-mlxreg-hotplug-Add-support-for-new.patch :
* 0327-platform-mellanox-mlx-platform-Change-register-name.patch :
* 0328-platform-mellanox-mlx-platform-Add-support-for-new-X.patch :

* [Mellanox] Don't populate arm64 Kconfig when integrating hw-mgmt

Signed-off-by: Vivek Reddy <vkarri@nvidia.com>

* [Mellanox] Remove thermal zone related code and replace with new one

* Revert "Revert "[Mellanox] Align PSU temperature sysfs node name with hw-management change (#16820)" (#16956)"

This reverts commit c2edc6f9d5.

* Update copyright header

Signed-off-by: Kebo Liu <kebol@nvidia.com>

---------

Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Co-authored-by: Vivek Reddy <vkarri@nvidia.com>
Co-authored-by: Junchao-Mellanox <junchao@nvidia.com>
Co-authored-by: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com>
2024-01-16 08:33:50 -08:00
Junchao-Mellanox
0b511986ae
[202311][Mellanox] implement platform wait in python code (#17398) (#17719)
- Why I did it
New implementation of Nvidia platform_wait due to:
1. sysfs deprecated by hw-mgmt
2. new dependencies to SDK
3. For CMIS host management mode

- How I did it
Wait for hw-management to be ready
Wait for SDK sysfs nodes to be ready

- How to verify it
manual test
unit test
sonic-mgmt regression
2024-01-16 08:31:33 -08:00
snider-nokia
bcdbaf1039 [Nokia][sonic-platform] Update Nokia sonic-platform submodule and device data (#17378)
These changes, in conjunction with NDK version >= 22.9.17, address the thermal logging issues discussed at Nokia-ION/ndk#27. The changes in this PR do not require NDK version >= 22.9.17, but the thermal logging enhancements are not available without it. Coupling with NDK >= 22.9.17 is therefore preferred and recommended.

Why I did it
To address thermal logging deficiencies.

Work item tracking
Microsoft ADO (number only): 26365734
How I did it
The following changes are included:

Threshold configuration values are provided in the associated device data .json files. There is also a change included to better handle the condition where an SFP module read fails.

Modify reboot in module.py to support rebooting a linecard from the Supervisor

 - Modify reboot to call _reboot_imm for a single IMM card reboot
 - Add logging to ndk_cmd for the "reboot-linecard" and "shutdown/startup the sfm" operations
Add a new nokia_cmd set command and modify the show ndk-status output

 - Add a new function reboot_imm() to nokia_common.py to support rebooting a single IMM slot from the CPM
 - Add a new command for the CPM: nokia_cmd set reboot-linecard <slot> [force]
 - Append a new column "RebootStatus" to the end of the "nokia_cmd show ndk-status" output
 - Provide the ability for the IMM to disable TX on all transceiver modules at reboot time
 - Remove defunct xcvr-resync service
2024-01-10 12:35:13 +08:00
Junchao-Mellanox
767944d7da [Mellanox] Fix race condition while creating SFP (#17441)
- Why I did it
Fix an xcvrd crash caused by "cannot import name 'initialize_sfp_thermal'":

Nov 27 09:47:16.388639 sonic ERR pmon#xcvrd: Exception occured at CmisManagerTask thread due to ImportError("cannot import name 'initialize_sfp_thermal' from partially initialized module 'sonic_platform.thermal' (most likely due to a circular import) (/usr/local/lib/python3.9/dist-packages/sonic_platform/thermal.py)")

- How I did it
Add a lock around SFP object creation

- How to verify it
Unit test
Manual Test
2024-01-09 14:34:47 +08:00
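#17441 serializes SFP object creation so that two threads (for example xcvrd's CmisManagerTask and another caller) cannot both start constructing the same object, and thus the same partially initialized module import, at once. A minimal sketch of a lock-guarded lazy factory in Python; the SfpFactory class and its create_fn parameter are hypothetical stand-ins for the platform's chassis code.

```python
import threading


class SfpFactory:
    """Create each SFP object at most once, even under concurrent access.

    Hypothetical sketch: create_fn stands in for the real SFP constructor.
    """

    def __init__(self, create_fn):
        self._create_fn = create_fn
        self._sfps = {}
        self._lock = threading.Lock()

    def get_sfp(self, index):
        sfp = self._sfps.get(index)
        if sfp is None:
            with self._lock:
                # Re-check under the lock: another thread may have created
                # the object while this one was waiting.
                sfp = self._sfps.get(index)
                if sfp is None:
                    sfp = self._create_fn(index)
                    self._sfps[index] = sfp
        return sfp
```

The unlocked first lookup keeps the common path (object already exists) cheap; only creation pays for the lock.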
Junchao-Mellanox
8de7cb5988
[202311] [Mellanox] update asic and module temperature in a thread for CMIS management (#16955) (#17699)
- Why I did it
When a module is entirely under software control, the driver cannot get the module temperature or temperature threshold from firmware. In this case, SONiC needs to read them from the EEPROM. This PR creates a thermal updater thread that updates the module temperature and temperature threshold while software control is enabled.

- How I did it
Query ASIC temperature from SDK sysfs and update hw-management-tc periodically
Query Module temperature from EEPROM and update hw-management-tc periodically

- How to verify it
Manual test
New Unit tests
2024-01-08 10:50:59 -08:00
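The thermal updater in #17699 is a background thread that periodically queries temperatures and pushes them to hw-management-tc. A minimal Python sketch of that shape follows, with stop support; the class name, the read_temps and publish callables, and the default interval are all assumptions, not the Nvidia code.

```python
import threading
from typing import Callable, Dict, Optional


class ThermalUpdater:
    """Periodically read temperatures and hand them to a publisher.

    Hypothetical: read_temps might query SDK sysfs or module EEPROM, and
    publish might write the value where hw-management-tc expects it.
    """

    def __init__(self, read_temps: Callable[[], Dict[str, float]],
                 publish: Callable[[str, float], None], interval: float = 60.0):
        self._read_temps = read_temps
        self._publish = publish
        self._interval = interval
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def update_once(self) -> None:
        for name, value in self._read_temps().items():
            self._publish(name, value)

    def _run(self) -> None:
        while not self._stop.is_set():
            self.update_once()
            # Event.wait acts as a sleep that stop() can interrupt early.
            self._stop.wait(self._interval)

    def start(self) -> None:
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```

Using an Event instead of time.sleep lets the daemon shut the thread down promptly rather than waiting out a full interval.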
mssonicbld
4060f5ce5b
[Mellanox] Remove EEPROM write limitation if it is software control (#17030) (#17694) 2024-01-07 13:16:25 +08:00
mssonicbld
fb7bad2d11
[Mellanox] Implement low power mode for cmis host management (#17159) (#17693) 2024-01-06 07:55:41 +08:00
Junchao-Mellanox
7368df7839
[Mellanox] Enable CMIS host management (#16846) (#17684)
- Why I did it
Enable CMIS host management for Mellanox devices which are expected to support the feature

- How I did it
Add a new thread in a new file, and change the platform logic in chassis.py so that get_change_event() invokes this thread.
The thread runs a state machine per port.
First, static detection runs once the thread is up (during the switch boot sequence), until it finally decides whether each module is FW-controlled or SW-controlled.
After that, dynamic detection takes over, listening for changes on the per-port sysfs fds,
so it can detect cable plug-in and plug-out events.

- How to verify it
Enhanced unit tests
run sonic mgmt on Nvidia SN4700 with CMIS host management enabled

Co-authored-by: dbarashinvd <105214075+dbarashinvd@users.noreply.github.com>
2024-01-05 12:07:30 -08:00
Junchao-Mellanox
6d43d2f636 [Mellanox] Provide default implementation for sfp error description when CMIS host management is enabled (#17294)
- Why I did it
Provide a dummy implementation for SFP error description when CMIS host management is enabled. A future feature shall be raised to implement SFP error description for such mode.

- How I did it
if SFP is under software control, provide "Not supported" as error description
if SFP is under initialization, provide "Initializing" as error description

- How to verify it
unit test
2024-01-04 10:38:38 +08:00
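The rule added in #17294 is a small dispatch on the module's control state. A Python sketch follows; the ControlMode enum and the fw_error_description callable are hypothetical names, while the two returned strings are the ones stated in the commit.

```python
from enum import Enum
from typing import Callable


class ControlMode(Enum):
    """Hypothetical module states; names are illustrative, not the real API."""
    FW_CONTROL = "fw_control"
    SW_CONTROL = "sw_control"
    INITIALIZING = "initializing"


def get_error_description(mode: ControlMode,
                          fw_error_description: Callable[[], str]) -> str:
    """Return the SFP error description for the given control mode."""
    if mode is ControlMode.INITIALIZING:
        return "Initializing"
    if mode is ControlMode.SW_CONTROL:
        # Placeholder until SW-control error reporting is implemented.
        return "Not supported"
    # FW-controlled modules keep using the existing implementation.
    return fw_error_description()
```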
Nazarii Hnydyn
49e96c3daa
[mellanox]: Disable MFT bash autocompletion. (#17543)
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2023-12-21 09:45:42 -08:00
Arun Saravanan Balachandran
9dbb016ad8 [Dell] S6100 - Update EEPROM API serial_number_str to return service tag instead of serial number (#17440)
Modify the EEPROM API serial_number_str to return the service tag instead of the serial number on Dell S6100.
Ref PR: #1239

How I did it
Update EEPROM API serial_number_str to return service tag instead of serial number.

How to verify it
Verify decode-syseeprom -s returns service tag in Dell S6100.
2023-12-15 09:37:01 +08:00
zitingguo-ms
bd15b77ba9 change branch name (#17267)
Why I did it
Upgrade xgs SAI to version 10.1.

Work item tracking
Microsoft ADO (number only): 25931321
How I did it
Upgrade xgs SAI version in sai.mk file.

How to verify it
Run full qualification on 7050cx3/7260cx3:

7050cx3:
https://dev.azure.com/mssonic/internal/_build/results?buildId=425450&view=results
https://dev.azure.com/mssonic/internal/_build/results?buildId=425449&view=results
7260cx3: https://elastictest.org/scheduler/testplan/656f2b2b617fb27e41557494?leftSideViewMode=detail&prop=status&order=ascending
2023-12-14 14:36:07 +08:00
Aravind-Subbaroyan
62429a2328
Update cisco-8000.ini (#17429)
FCS/CRC Errors will only be reported as RX_ERR.
Fix to avoid the mac port related errors.
Fix for sharedResSize testcase failure in QoS-SAI
Fix the issue related to voltage in 'show platform psustatus'.
Support WRED drop for lossy queues.
Fixed an issue where lossy traffic was getting dropped.
Enhancement of SAI logging for errors and interrupts
2023-12-07 17:04:45 -08:00
zitingguo-ms
897a023637 Upgrade xgs SAI version to 8.4.31.0 (#17059)
Why I did it
Upgrade the xgs SAI version to 8.4.31.0 to include the following changes:

8.4.22.0: [SDK upgrade][CSP CS00012314723][SAI_BRANCH rel_ocp_sai_8_4] SID:bcmtmPfcDdrScan thread takes 100% CPU utilization
8.4.23.0: [SDK upgrade][CSP CS00012290176][SAI_BRANCH rel_ocp_sai_8_4] SDK-323160: bcm_l3_ecmp_member_add returns Table Full error while ISSU
8.4.24.0:
[SDK upgrade]Merge "[CSP NA][SAI_BRANCH rel_ocp_sai_8_4] SID: Software LinkScan Not Catching Short Local/Remote Fault Events" into hsdk_6.5.27_SAI_8.4.0_GA
[SDK upgrade][CSP NA][SAI_BRANCH rel_ocp_sai_8_4] SID: Software LinkScan Not Catching Short Local/Remote Fault Events
8.4.25.0: [SAI_BRANCH rel_ocp_sai_8_4]CLONE - SAI - 8.4 - _brcm_sai_cosq_stat_get errors for CPU queue 41
8.4.26.0: [CSP CS00012307911] Fixed incorrect CPU related SAI port obj encoding/decoding in most subsystems
8.4.27.0: [CSP CS00012309154] [TD3] SAI_STATUS_INVALID_PARAMETER on setting SAI_BUFFER_POOL_ATTR_SIZE, OA crash
8.4.28.0: [CSP CS00012315552] Excessive logging from _brcm_sai_acl_tbl_grp_mbr_migration
8.4.29.0: [CSP CS00012321369] Fix TH2 regression with MMU/pool size
8.4.30.0: [SDK upgrade][CSP CS00012316299][SAI_BRANCH rel_ocp_sai_8_4] L3 entry delete failed when SER error is present
8.4.31.0: [CSP CS00012307911] Revert and limit scope of previous change due to WB issue.
Work item tracking
Microsoft ADO (number only): 26021230
How I did it
Upgrade the SAI version in sai.mk file.

How to verify it
Run advanced reboot on TH2 and TD3:

https://dev.azure.com/mssonic/internal/_build/results?buildId=422024&view=results
https://dev.azure.com/mssonic/internal/_build/results?buildId=423352&view=results
@saiarcot895 ran warm reboot from 202012 to the target image, and the tests passed:
TH2: https://dev.azure.com/mssonic/internal/_build/results?buildId=423112&view=logs&j=76acabad-01e9-5c52-6fe6-d396d63e85d2&t=0d14fb40-14d5-50ca-4a23-af1778140cbf
TH: https://dev.azure.com/mssonic/internal/_build/results?buildId=423119&view=logs&j=76acabad-01e9-5c52-6fe6-d396d63e85d2&t=0d14fb40-14d5-50ca-4a23-af1778140cbf
TD3: https://dev.azure.com/mssonic/internal/_build/results?buildId=423074&view=logs&j=76acabad-01e9-5c52-6fe6-d396d63e85d2&t=0d14fb40-14d5-50ca-4a23-af1778140cbf
2023-12-04 22:14:03 +00:00
Pavan-Nokia
451398f801 [Nokia-7215][armhf] Enable Watchdog service (#16612)
Enable CPUWDT service to enable watchdog
2023-12-04 22:14:03 +00:00
Kebo Liu
f96742fb98 [Mellanox] Revert LPM implementation to the old way (#17096)
- Why I did it
The current low power mode (LPM) implementation requires the user to set the port admin down before toggling LP mode, which is not backward compatible. This PR reverts to the old implementation so that the user can toggle LP mode regardless of the port admin status.

- How I did it
Revert the recent changes related to LPM in PR #14130 and #16545

- How to verify it
Run all sfputil and SFP platform API related tests on all the Mellanox platforms.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-12-04 22:14:02 +00:00
jfeng-arista
6dfaf5e293
[sonic-vs]: Add fabric port data for vs test, and start fabricmgrd in vs environment (#16791)
Add fabric port data for vs test, and start fabricmgrd in vs environment.

This PR depends on sonic-net/sonic-sairedis#1301

sonic-net/sonic-swss#2920 needs this PR to be merged first.
2023-11-20 16:21:03 -08:00
Pavan Naregundi
307e39bde4
[Marvell-arm64] Add platform support for rd98DX35xx (#16874)
* [Marvell-arm64] Add platform support for rd98DX35xx

This change adds the following two variants of the rd98DX35xx board to the arm64
build.

Board with CPU integrated into the 98DX35xx switching chip:

 Platform: arm64-marvell_rd98DX35xx-r0
 HwSKU: rd98DX35xx
 ASIC: marvell
 Port Config: 32x1G + 16x2.5G + 6x25G

Board with external CN9131 CPU connected over PCI to 98DX35xx
switching chip:

 Platform: arm64-marvell_rd98DX35xx_cn9131-r0
 HwSKU: rd98DX35xx_cn9131
 ASIC: marvell
 Port Config: 32x1G + 16x2.5G + 6x25G

Change-Id: I21dc9fe972417daaabb20a5bddf7779d72b7972e
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>

* Add HWSKU for rd98DX35xx and rd98DX35xx_cn9131

This patch adds new HWSKUs for Marvell arm64 platforms rd98DX35xx
and rd98DX35xx_cn9131.

Change-Id: Id7c14f49f0e304335cc4ca73dcae52362c49d231
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>

---------

Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
2023-11-20 09:43:02 -08:00
Stephen Sun
b93852d53d
[Mellanox] Support running hw-management service on MSN4700 emulation platform (#16584)
- Why I did it
Support running hw-management service on MSN4700 emulation platform.

- How I did it
Use physical EEPROM instead of the fake one
Do not skip PSUd, PCId, thermal control daemon
Adjust PCIe and thermal configuration files
Adjust platform.json for different chassis names and thermals
Remove a patch to hw-management in order to enable it

- How to verify it
Run Nvidia simulation on SN4700 (ASIC and Platform)

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2023-11-19 11:03:46 +02:00
Volodymyr Samotiy
672781e24a
[mlnx-fw-upgrade] Add FW reactivation in case 2 FW upgrades were done without reboot (#17092)
- Why I did it
A reboot is required to activate FW after it has been upgraded.
If no reboot is performed and the user then upgrades to another SONiC image, the upgrade fails.
This is because the SONiC upgrade needs to install new FW, which fails while the previously installed FW has not been activated.
To allow a second FW upgrade without a reboot in between, the FW image must be reactivated.
This change handles that flow.

Example scenario:

The user installs a SONiC image on the switch.
The FW is then upgraded by the user or a script, but no reboot is performed to activate it.
A subsequent upgrade to a new SONiC image fails, because the new image needs to install FW and that fails while the previous FW has not been activated.

- How I did it
In the "mlnx-fw-upgrade" script, check whether the FW upgrade failed because FW was already installed but no reboot was performed.
If so, reactivate the FW image and retry the FW upgrade.

- How to verify it
Install a SONiC image on the switch.
Then upgrade the FW, but do not reboot.
After that, upgrade to a new SONiC image and check that the upgrade was successful.

Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2023-11-19 11:01:31 +02:00
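The control flow added to mlnx-fw-upgrade in #17092 is "try, and on one specific failure, reactivate and retry once". The real script is shell and calls Mellanox FW tools; the Python sketch below only illustrates that retry logic, and FwNotActivatedError, burn, and reactivate are hypothetical stand-ins for the tool invocations and their error detection.

```python
from typing import Callable


class FwNotActivatedError(Exception):
    """Hypothetical: a previous FW was installed but never activated by a reboot."""


def upgrade_fw(image: str, burn: Callable[[str], None],
               reactivate: Callable[[], None]) -> None:
    """Install image; on the 'installed but not activated' error, reactivate and retry once."""
    try:
        burn(image)
    except FwNotActivatedError:
        # The pending, unactivated image blocks the new install:
        # reactivate it, then retry exactly once. A second failure propagates.
        reactivate()
        burn(image)
```

Retrying only once, and only for that specific error, avoids masking unrelated FW install failures.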
Samuel Angebault
c2899eb44c
[Arista] Update platform library submodules (#16701)
Why I did it

- Convert hw-dump into generate-dump plugins
- Enable DRAM scrubber on some products
- Fix xcvr driver active low register bit logic
- Improve cooling algorithm (now considers xcvrs and modules)
- Add linecard graceful shutdown (disabled by default)

The scrubber was enabled for the following products:

- DCS-7050QX-32S
- DCS-7050CX3-32S
- DCS-7060CX-32S
2023-11-17 17:15:39 -08:00
Junhua Zhai
4e3b2e5545
Upgrade libsaibroncos debian package to version 3.11 (#17127) 2023-11-09 10:15:02 -08:00
byu343
ed07dbad09
[knet]: Disable NETIF_F_HW_CSUM in KNET (#17080)
This is CSP CS00012280996.
The issue was that the checksum was incorrect for all TCP packets leaving the system, so BGP connections could not be established. The issue was found on BCM56993 and may affect all platforms using linux_ngknet.
2023-11-02 16:17:06 -07:00
zitingguo-ms
2c0f4e57d7
Upgrade XGS saibcm-modules to 8.4 (#16246)
Why I did it
XGS saibcm-modules 8.4 is needed. #14471

Work item tracking
Microsoft ADO (number only): 24917414
How I did it
Copy files from xgs SDK 8.4 repo and modify makefiles to build the image.
Upgrade version to 8.4.0.2 in saibcm-modules.mk.

How to verify it
Build a private image and run full qualification with it: https://elastictest.org/scheduler/testplan/650419cb71f60aa92c456a2b
2023-10-26 18:58:34 +08:00
Junhua Zhai
e66ae597f9
[gearbox] use credo sai v0.9.3 (#16860)
Update credo sai package to the latest v0.9.3, which fixes the issue aristanetworks/sonic#92.
2023-10-25 11:58:50 -07:00
Samuel Angebault
9d3d4a8a03
Add some config options to make gbsyncd optional (#16840)
Why I did it
To allow people to build a slim version of SONiC that fits on devices with small storage, some unneeded features need to be disabled.
The docker-gbsyncd containers are only applicable to devices with external gearboxes and may not be needed on devices that require a small image.
It is therefore desirable to have a knob to not include these gbsyncd containers.

Work item tracking
Microsoft ADO (number only):
How I did it
Add a new config INCLUDE_GBSYNCD which is enabled by default to retain the previous behavior.
Setting it to n excludes platform/components/docker-gbsyncd-*.mk from the build.

How to verify it
Set INCLUDE_GBSYNCD = n and verify that the docker-gbsyncd images are not present in the final image.
2023-10-25 15:39:03 +08:00
Junchao-Mellanox
c2edc6f9d5
Revert "[Mellanox] Align PSU temperature sysfs node name with hw-management change (#16820)" (#16956)
This reverts commit 0846322e9a.
2023-10-23 11:55:27 +03:00
Vivek
6410e66f35
[Mellanox] Enhance the processing of Kconfig in the hw-mgmt integration (#16752)
- Why I did it
Add the ability to add arm64 Mellanox-specific kconfig using the integration tool
Fix the existing duplicate kconfig problem by using the vanilla .config
Add the ability to patch the kconfig-inclusions file. Renamed series.patch to external-changes.patch to reflect the behavior
NOTE: Minimum hw-mgmt version to use with these changes: V.7.0030.2000, not yet upstream but required prior to it.
This option will be enabled once the new hw-mgmt is upstream.

Depends on sonic-net/sonic-linux-kernel#336

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
2023-10-18 19:32:59 +03:00
Rajkumar-Marvell
357ab54e08
[Marvell] Updated SAI 1.13.0 amd64 debian (#16811)
Why I did it
Added Marvell SAI-1.13.0 debian support for x86_64 platform.

Work item tracking
Microsoft ADO (number only):
How I did it
Compile Marvell libsai.so (with SAI headers from version 1.13.0) and package it as version 1.13.0-1

How to verify it
2023-10-18 16:47:53 +08:00
Pavan Naregundi
add98b221b [Marvell-arm64]: Add hugepage cmdline argument
The updated SDK and driver require hugepages to be reserved during kernel
boot. These kernel command line arguments are passed from installer.conf
in the device folder.

Change-Id: Id43f61af2b050500775da66d058c2de78cb5ad15
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
2023-10-12 02:07:36 -07:00
Pavan Naregundi
5c5e4c77f4 [Marvell-arm64] Support lazy install of sdk drivers
This patch adds support for lazy installation of the Marvell prestera SDK
drivers for platform-nokia. Lazy install is needed because the updated SDK
driver must classify the drivers required for each platform at compile
time. SDK drivers and platform files are now fetched from a submodule
(mrvl-prestera).

Additionally, the DTB required for sonic_fit creation at compile time
is sourced from sonic-linux-kernel.

Change-Id: Id5b011e6bd67accf7b1579d91cb7affad464e916
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
2023-10-12 02:07:36 -07:00
Ashwin Srinivasan
61683d9d64
Revert "Move /var/log to RAM for Mellanox SN2700, Nokia 7215 and Dell S6100 (#15077)" (#16775)
This reverts commit 05f326eed9.

Microsoft ADO: 25355843
2023-10-11 10:36:29 -07:00
zitingguo-ms
7f706329f8
upgrade xgs SAI version to 8.4.21.0 (#16805)
Upgrade the xgs SAI version to 8.4.21.0 to include the following changes:

8.4.21.0: [CSP CS00012316669][SAI_BRANCH rel_ocp_sai_8_4] FP destroy API behavior change to avoid traffic leaks
8.4.20.0: [CSP CS00012312900] Max path used as 0 in ordered ECMP replace.
8.4.19.0: [CSP CS00012301679] sai_query_attribute_capability SAI_OBJECT_TYPE_SWITCH, fix few attrs in previous checkin
8.4.18.0: [CSP CS00012310706] Add SAI_TUNNEL_SUPPORT to azure pipeline build files
8.4.16.0: [CSP CS00012301679] sai_query_attribute_capability for obj type SAI_OBJECT_TYPE_SWITCH
8.4.15.0: [SAI_BRANCH rel_ocp_sai_8_4] Port SONIC-75025 to SAI 8.4
8.4.14.0: [CSP CS00012306356] Change log level of sai_bulk_object_get_stats, unsupported object type to warning
8.4.13.0: [CSP CS00012302193] backport SONIC-72912 jira on SAI 8.4 branch
8.4.12.0: [CSP CS00012296541][SAI_BRANCH rel_ocp_sai_8_4] Performance improvement for ECMP from SDK-354625
8.4.11.0: [CSP CS00012293985] Port SONIC-74816 fix to 8.4.
8.4.10.0: [CSP NA/SID-26013][SAI_BRANCH rel_ocp_sai_8_4] SID - L3 multicast packet drop due to wrong VFI derivation - SDK-350470
8.4.9.0: [CSP NA/SID-25917][SAI_BRANCH rel_ocp_sai_8_4] SID-Crash in ALPM algorithm during entry split SDK-343694
8.4.8.0: [CSP CS00012275265][SAI_BRANCH rel_ocp_sai_8_4] SID Deadlock in linkscan callback during flexport operations
8.4.7.0: [CSP CS00012284142] Fixed MMU buffer config issue with multicast queues
8.4.6.0: [CSP CS00012275454] sai_object_type_get_availability failed with SAI_STATUS_INVALID_PARAMETER; [CSP CS00012284121] [SAI_BRANCH rel_ocp_sai_8_4] SID - L2_ENTRY Table Lookups May Miss
8.4.4.0: [CSP CS00012287462] Uplift tunnel fix from SONIC-73462
8.4.2.0: Fixing the issue with SAI_QUEUE_STAT_DROPPED_PACKETS retrieval; Enable/Disable bitmask for egress stats; SAI - OCP SAI 8.4 - SAI: Reduce Index data type union _brcm_sai_indexed_data_t size to be below 2k.; Cut Down Version - Port Tpid Compilation Issue Fix

Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
2023-10-10 09:59:15 -07:00
Junchao-Mellanox
0846322e9a
[Mellanox] Align PSU temperature sysfs node name with hw-management change (#16820)
- Why I did it
hw-management renamed PSU temperature related sysfs:

psu1_temp -> psu1_temp1
psu2_temp -> psu2_temp1
psu1_temp_max -> psu1_temp1_max
psu2_temp_max -> psu2_temp1_max
This PR aligns SONiC with this change.

- How I did it
Use new sysfs node for PSU temperature and PSU temperature threshold

- How to verify it
Manual test
sonic-mgmt Regression test
2023-10-10 19:21:27 +03:00
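The hw-management rename listed in #16820 follows one pattern: psuN_temp becomes psuN_temp1, and psuN_temp_max becomes psuN_temp1_max. A small Python sketch of that mapping; the helper name new_psu_temp_node is hypothetical, and the actual PR simply switched the platform code to the new node names.

```python
import re

# Old-style PSU temperature nodes: psuN_temp and psuN_temp_max.
_OLD_PSU_TEMP = re.compile(r"^(psu\d+)_temp(_max)?$")


def new_psu_temp_node(old_name: str) -> str:
    """Map an old PSU temperature sysfs name to the renamed node."""
    m = _OLD_PSU_TEMP.match(old_name)
    if not m:
        return old_name  # already new-style, or not a PSU temperature node
    return f"{m.group(1)}_temp1{m.group(2) or ''}"
```

New-style names such as psu1_temp1_max do not match the pattern, so applying the mapping twice is harmless.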
guxianghong
51570657eb
[centec] Upgrade SONiC centec-sai reference to v1.13.0-1 (#16767)
1. Upgrade Centec SAI debian package version to v1.13, in order to match syncd's requirement.
2. Fix syncd compile failure caused by the missing sai_query_api_version function in the vendor SAI

Signed-off-by: Xianghong Gu <xgu@centec.com>
2023-10-04 22:24:43 -07:00
Junchao-Mellanox
aedffd333b
[Mellanox] wait reset cause ready (#16722)
Why I did it
The SONiC service determine-reboot-cause might run before the driver creates the reset cause files. In that case, the reset cause is reported as "Unknown". This PR introduces a mechanism that waits for the reset cause sysfs files to be ready.

How I did it
/run/hw-management/config/reset_attr_ready indicates that all reset cause files are ready. The chassis.get_reboot_cause function waits up to 45 seconds for /run/hw-management/config/reset_attr_ready.

How to verify it
Manual test on master/202211/202205
2023-10-03 18:58:31 -07:00
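The wait in #16722 is a bounded poll on a readiness marker file. The Python sketch below uses the path and the 45-second limit from the commit; treating "ready" as "the file exists" and the 1-second poll interval are assumptions, and wait_for_file is a hypothetical helper rather than the chassis.get_reboot_cause code.

```python
import os
import time

RESET_ATTR_READY = "/run/hw-management/config/reset_attr_ready"


def wait_for_file(path: str = RESET_ATTR_READY, timeout: float = 45.0,
                  interval: float = 1.0) -> bool:
    """Poll until path exists; return False if the timeout expires first."""
    # time.monotonic() is immune to wall-clock jumps during early boot.
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))
```

On a False return the caller can still fall back to reporting "Unknown", which matches the behavior before the fix rather than blocking boot indefinitely.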
Zhijian Li
3d6389b481
Add info syslog for cpu_wdt.service (#16678)
Why I did it
Add info syslog for cpu_wdt.service when triggering the watchdog arm action.

How I did it
Add info syslog for cpu_wdt.service when triggering the watchdog arm action.
2023-09-25 20:59:44 -07:00
snider-nokia
5aea3a976c
[Nokia][sonic-platform] Update Nokia sonic-platform submodule - SFP support for CMIS CDB operations (#16572)
This fixes Nokia-ION/ndk#22
Note that this PR must be coupled with NDK version >= 22.9.13

Why I did it
To provide proper support for CMIS compliant transceiver module CDB operations (including FW related operations).

How I did it
Enhanced the transport subsystem so as to provide for up to 2k bytes of data to be passed to/from modules (as contrasted with the prior max of 128 bytes).

How to verify it
Ensure that new FW (firmware) can be programmed to CMIS compliant module(s) using the 'sfputil firmware ...' commands.
2023-09-23 14:09:02 -07:00