Commit Graph

1895 Commits

Author SHA1 Message Date
mssonicbld
3d99bf71b6
[mlnx-ffb.sh] Update issu-version location (#14925) (#15674) 2023-06-30 08:10:20 +08:00
mssonicbld
923da6fc84
[Mellanox] get LED capability from capability file (#14584) (#15664) 2023-06-30 00:26:23 +08:00
mssonicbld
f407a10c27
[Mellanox] Adjust warning threshold implementation according to the latest algorithm update (#15092) (#15665) 2023-06-30 00:25:50 +08:00
Samuel Angebault
19be1fa775
[202211][Arista] Update platform library submodules (#15407)
fix pcied leak on chassis
fix fan status led setting on fixed systems
misc fixes
2023-06-22 08:14:17 -07:00
Pavan-Nokia
776abb002a
[armhf][Nokia-7215]Add SFP refactor support for Nokia-7215 platform (#14789)
Why I did it
Add support for SFP refactor on Nokia-7215 Marvell armhf platform.

Platform: armhf-nokia_ixs7215_52x-r0
HwSKU: Nokia-7215
ASIC: marvell
Port Config: 48x1G + 4x10G (SFP+)

How I did it
Modify sfp.py to support SFP refactor optoe driver and platform.json to facilitate proper OC test completion.

How to verify it
Build armhf target for Nokia-7215 and verify proper Xcvrd and SFP refactor operation.
2023-06-22 08:12:37 -07:00
pavannaregundi
b8cd8d8e06 [Marvell] Update armhf driver version (#15138)
Changes in MRVL_PRESTERA_DRIVER_1.4:
- Memory leak fixed by releasing pci device after retrieval.
- Fixes for 5.10 kernel porting.

Change-Id: I1d7ee4ec02ec17a29ddb8473725ab68ca399748b

Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
2023-06-16 09:54:53 +08:00
Ikki Zhu
ea2e849607 [celestica/e1031]: enable emc2305 fan controller timeout feature (#14401)
Why I did it
There is rare condition, emc2305 hold SMBus and cause SMBus completion wait timed out.

How I did it
Enable EMC2305 SMBus timeout feature, 30ms period of inactivity will reset the interface.

How to verify it
Use 'i2cget -y -f 23 0x4d 0x20 b' to read EMC2305 configuration register and check DIS_TO bit not set.

Signed-off-by: Eric Zhu <erzhu@celestica.com>
2023-06-16 09:54:47 +08:00
Lior Avramov
d26850611f
[Mellanox] [202211] Remove iproute2 SDK patches from SONiC tree and consume them from SDK github (#15061)
Why I did it
SDK patches for iproute2 were added to SONiC tree as a temporary solution.
Now that SDK with the patches is available, I have removed the patches from SONiC tree and we consume them from SDK github during compilation.

How I did it
During build we download SDK iproute2 patches from SDK github (or from the URL provided by user if compiling SDK from sources) and apply them before compilation.

How to verify it
Compile and load on switch, verify interfaces network devices created successfully.
Verify LLDP shows connections to neighbors.
Verify ping between 2 hosts over 2 router ports is successful.
2023-06-14 17:13:10 +08:00
StormLiangMS
8aeb2ba715
Cherrypick to 202211 [Mellanox] Add patch commit-id mapping to description #15416
cherry pick #15052
2023-06-10 13:58:12 +08:00
Junchao-Mellanox
af7412d3a1 [Mellanox] add PSU fan direction support (#14508)
- Why I did it
Add PSU fan direction support

- How I did it
Implement fan.get_direction for PSU fan

- How to verify it
Manual test
Unit test
2023-06-10 12:32:26 +08:00
Sudharsan Dhamal Gopalarathnam
d93970bc2e
[Mellanox] Update hw-mgmt to 7.0020.4301 (#15260) (#15283)
Manual Cherrypick of #15260

Why I did it
Bug fix:

I2C bus is stuck - Unable to probe I2C bus 2-0048, which causes /var/run/hw-management/config/sfp_counter, module_counter to be zero and pmon docker unable to start.
Work item tracking
Microsoft ADO (number only):
How I did it
Update HW-MGMT package version in the make file
Update HW-MGMT submodule pointer

How to verify it
run full sonic-mgmt regression
2023-06-01 11:41:59 +08:00
mssonicbld
d8f2f7c034
[Mellanox] Use sysfs for sfp reset/LPM/presence (#14130) (#15215) 2023-05-26 02:25:21 +08:00
mssonicbld
2098634ab3
[Mellanox] Update SAI to 2211.24.0.21 and SDK/FW to 4.5.5142/2010_5144 (#15072) (#15214) 2023-05-26 02:20:30 +08:00
mssonicbld
17771efecf
[armhf][Nokia-7215] changes fstrim.timer to daily (#14723) (#15125) 2023-05-18 07:12:24 +08:00
Song Yuan
01f61d1d29 Install ptf afpacket module required by ptf_nn_agent. (#14503)
Why I did it
ptf_nn_agent failed to start in dnx rpc syncd because module afpacket was not installed.
Please see issue sonic-net/sonic-mgmt#7822

How I did it
Add downloading ptf afpacket module in docker file.

How to verify it
Verified that ptf_nn_agent was started successfully in dnx rpc syncd with the change.
2023-05-18 06:32:29 +08:00
Samuel Angebault
59c7d39ef5
[202211][Arista] Update platform library submodules (#14828)
Fix watchdog reboot cause for wolverine linecard
Fix PSU fan speed of 0% by adding max RPM to most psu descriptions
Add product DCS-7060DX5-64
Add product DCS-7060DX5-32
2023-05-08 14:09:37 +08:00
Vivek
f49ae28948 [Mellanox] Fix the hw-mgmt intg tool case sensitivity for KConfig (#14709)
Fix the script to consider case sensitivity while writing the kconfig

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
2023-05-06 12:32:19 +08:00
mssonicbld
10e635be93
[Mellanox] Facilitate automatic integration of new hw-mgmt (#14594) (#14966) 2023-05-06 09:08:54 +08:00
Lior Avramov
d7d8d7754d
[Mellanox] [202211] Replace iproute2 supplied by SDK to iproute2 downloaded from Debian repository (#14726) (#14724)
- Why I did it
Mellanox syncd container will be based on Debian iproute2 plus patches instead of Nvidia internal version of iproute2

- How I did it
Download iproute2 from Debian repository, apply patches and compile to create a new target.
The target is then deployed in syncd container of Mellanox switches only.
The new target is called IPROUTE2_MLNX.

- How to verify it
Compile and load on switch, verify interfaces network devices created successfully.
Verify LLDP shows connections to neighbors.
Verify ping between 2 hosts over 2 router ports is successful.
2023-05-02 10:29:02 +03:00
mssonicbld
e786108bbd
[Build] Fix the installation candidate not found issue when building docker-sonic-vs (#14439) (#14769) 2023-04-23 21:05:59 +08:00
Marty Y. Lok
1ce2b50143 [marvell-armhf][uboot-setting] Fix the print menu for marvell-armhf print menu on Nokia-7215 (#13933)
Why I did it
After sonic-install install a new image, print_menu is set echo without any data. No image info between Hit any key to stop autoboot:  0 and  Start USB

Board configuration detected:
Net:   
|  port  | Interface | PHY address  |
|--------|-----------|--------------|
No ethernet found.
Hit any key to stop autoboot:  0 

(Re)start USB...
USB0:   Port (usbActive) : 0    Interface (usbType = 2) : USB EHCI 1.00
scanning bus 0 for devices... 3 USB Device(s) found
       scanning usb for storage devices... 0 Storage Device(s) found
How I did it
The fw_setenv print_menu is missing the double quotes. That causes the value is truncated. Using double quotes to in the environment setting.

How to verify it
Install new image with this fix. And reboot the system. The following section should be shown:

Signed-off-by: mlok <marty.lok@nokia.com>
2023-04-21 02:33:03 +08:00
Hua Liu
51b60613f7 [S6100] Improve S6100 serial-getty monitor, wait and re-check when getty not running to avoid false alert. (#14402)
[S6100] Improve S6100 serial-getty monitor, wait and re-check when getty not running to avoid false alert. 

#### Why I did it
On S6100, the serial-getty service some time can't auto-restart by systemd. So there is a monit unit to check serial-getty service status and restart it.

However, this monit will report false alert, because in most case when serial-getty not running, systemd can restart it successfully.

To avoid the false alert, improve the monitor to wait and re-check.

Steps to reproduce this issue:
1. User login to device via console, and keep the connection.
2. User login to device via SSH, check the serial-getty@ttyS1.service service, it's running.
3. Run 'monit reload' from SSH connection.
4. Check syslog 1 minutes later, there will be false alert: ' 'serial-getty' process is not running'

#### How I did it
Add check-getty.sh script to recheck again later when getty service not running.
And update monit unit to check serial-getty service status with this script to avoid false alert.

#### How to verify it
Pass all UT.
Manually check fixed code work correctly:


```
admin@***:~$ sudo systemctl stop  serial-getty@ttyS1.service
admin@***:~$ sudo /usr/local/bin/check-getty.sh 
admin@***:~$ echo $?
1
admin@***:~$ sudo systemctl status serial-getty@ttyS1.serviceserial-getty@ttyS1.service - Serial Getty on ttyS1
     Loaded: loaded (/lib/systemd/system/serial-getty@.service; enabled-runtime; vendor preset: enabled)
     Active: inactive (dead) since Tue 2023-03-28 07:15:21 UTC; 1min 13s ago

admin@***:~$ sudo /usr/local/bin/check-getty.sh 
admin@***:~$ echo $?
0
admin@***:~$ sudo systemctl status serial-getty@ttyS1.serviceserial-getty@ttyS1.service - Serial Getty on ttyS1
     Loaded: loaded (/lib/systemd/system/serial-getty@.service; enabled-runtime; vendor preset: enabled)
```

syslog:
```
Mar 28 07:10:37.597458 *** INFO systemd[1]: serial-getty@ttyS1.service: Succeeded.
Mar 28 07:12:43.010550 *** ERR monit[593]: 'serial-getty' status failed (1) -- no output
Mar 28 07:12:43.010744 *** INFO monit[593]: 'serial-getty' trying to restart
Mar 28 07:12:43.010846 *** INFO monit[593]: 'serial-getty' stop: '/bin/systemctl stop serial-getty@ttyS1.service'
Mar 28 07:12:43.132172 *** INFO monit[593]: 'serial-getty' start: '/bin/systemctl start serial-getty@ttyS1.service'
Mar 28 07:13:43.286276 *** INFO monit[593]: 'serial-getty' status succeeded (0) -- no output
```

#### Description for the changelog
[S6100] Improve S6100 serial-getty monitor.

#### Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.
2023-04-21 02:32:56 +08:00
mssonicbld
6781c4a4fb
Made non-upstream patch design order aware (#14434) (#14650) 2023-04-14 03:29:35 +08:00
xumia
5dbf512cda
Support to add SONiC OS Version in device info (#14601) (#14623)
Why I did it
Cherry-pick #14601, for code conflict.
Support to add SONiC OS Version in device info.
It will be used to display the version info in the SONiC command "show version". The version is used to do the FIPS certification. We do not do the FIPS certification on a specific release, but on the SONiC OS Version.

SONiC Software Version: SONiC.master-13812.218661-7d94c0c28
SONiC OS Version: 11
Distribution: Debian 11.6
Kernel: 5.10.0-18-2-amd64
Work item tracking
Microsoft ADO (number only): 17894593
How I did it
How to verify it
2023-04-13 19:28:03 +08:00
Jemston Fernando
8bbc8eb8cf
[celestica]: Fix Belgite platform issues (#14036)
As part of platform hardening this commit fixes several platform issues
in various components like PSU, FAN, Temperature, LED.
Cherrypick PR#13389
2023-03-27 10:16:16 -07:00
Sudharsan Dhamal Gopalarathnam
156189dbad [Mellanox]Fix lpmode set when logical port is larger than 64 (#14138)
- Why I did it
In sfplpm API, the number of logical ports is hardcoded as 64. When a system contains more port than this, the SDK APIs would fail with a syslog as below

Mar 7 03:53:58.105980 r-leopard-58 ERR syncd#SDK: [MGMT_LIB.ERR] Slot [0] Module [0] has logport [0x00010069] in enabled state
Mar 7 03:53:58.105980 r-leopard-58 ERR syncd#SDK: [SDK_MGMT_LIB.ERR] Failed in __sdk_mgmt_phy_module_pwr_attr_set, error: Internal Error
Mar 7 03:53:58.106118 r-leopard-58 ERR pmon#-c: Error occurred when setting power mode for SFP module 0, slot 0, error code 1

- How I did it
Remove the hardcoded value of 64. Obtained the number of logical ports from SDK

- How to verify it
Manual testing
2023-03-19 20:50:58 +08:00
Junhua Zhai
29f3c4944a [gearbox] use credo sai v0.9.0 (#14149)
Update credo sai package to the latest v0.9.0.
2023-03-19 20:50:53 +08:00
Dror Prital
ba14f728de Update SDK/FW to version 4.5.4206/4.5.4204 (#14164)
- Why I did it
To include latest fixes:

Fix traffic loss on all routed traffic when moving from 4.4.3372/XX_2008_3388 to 4.5.4118-012/XX_2010_4120-010. Issue occurred after ISSU process in Spectrum 1 only, When upgrading from older version to a new one. Neighbor entries are overwritten.
Fix When using mirror session policer on SPC2/3, the actual CIR was 1.28 times more than the configured CIR value.
Fix Creation of router interface of type bridge may occasionally fail if create is performed immediately after delete.
Fix False errors during SDK deinitialization may be seen in the syslog

- How I did it
Updated SDK submodule and relevant makefiles with the required versions.

- How to verify it
Build an image and run tests from "sonic-mgmt".
2023-03-19 20:50:49 +08:00
dbarashinvd
d7ba89a95b [Mellanox] fix for watchdog device not found, adding dependency on hw-management (#14182)
- Why I did it
Sometimes Nvidia watchdog device isn't ready when watchdog-control service is up after first installation from ONIE
need to delay watchdog control service to go up after hw-mgmt which gets devices up and ready

- How I did it
Delay Nvidia watchdog-control service before hw-mgmt has started on Mellanox platform in order to avoid missing or not ready watchdog device.

- How to verify it
verification test of ONIE installation of image in a loop
making sure watchdog service is always up (not failed) after first installation from ONIE
2023-03-19 20:50:44 +08:00
Volodymyr Samotiy
cc5ed4b632 [Mellanox] Update MFT to 4.22.1-15 (#14133)
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2023-03-19 18:33:57 +08:00
zitingguo-ms
3c312dec1c
Upgrade SAI xgs version to 8.4.0.2 and migrate to DMZ (#14119)
Why I did it
Update SAI xgs version to 8.4.0.2 and migrate xgs to DMZ repo.

How I did it
Update SAI xgs version in sai.mk.

How to verify it
Run the SONiC and SAI test with the8.4 SAI release pipeline.
2023-03-09 14:52:08 +08:00
Stepan Blyshchak
969166d769 [Mellanox] Place FW binaries under platform directory instead of squashfs (#13837)
Fixes #13568

Upgrade from old image always requires squashfs mount to get the next image FW binary. This can be avoided if we put FW binary under platform directory which is easily accessible after installation:

admin@r-spider-05:~$ ls /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa
/host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa
admin@r-spider-05:~$ ls -al /tmp/image-fw-new-loc.0-dirty-20230208.193534-fs/etc/mlnx/fw-SPC.mfa
lrwxrwxrwx 1 root root 66 Feb  8 17:57 /tmp/image-fw-new-loc.0-dirty-20230208.193534-fs/etc/mlnx/fw-SPC.mfa -> /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa

- Why I did it
202211 and above uses different squashfs compression type that 201911 kernel can not handle. Therefore, we avoid mounting squashfs altogether with this change.

- How I did it
Place FW binary under /host/image-/platform/mlnx/, soft links in /etc/mlnx are created to avoid breaking existing scripts/automation.
/etc/mlnx/fw-SPCX.mfa is a soft link always pointing to the FW that should be used in current image
mlnx-fw-upgrade.sh is updated to prefer /host/image-/platform/mlnx location and fallback to /etc/mlnx in squashfs in case new location does not exist. This is necessary to do image downgrade.

- How to verify it
Upgrade from 201911 to master
master to 201911 downgrade
master -> master reboot
ONIE -> master boot (First FW burn)
Which release branch to backport (provide reason below if selected)
2023-03-08 13:50:18 +08:00
mssonicbld
aea96da04d
[Mellanox] Fix issue: cannot find label port for logical port when logical port number is larger than 64 (#13710) (#13962) 2023-03-06 16:47:31 +08:00
mssonicbld
1757f53290
[Mellanox] update sdk/fw build procedure (#14025) (#14059) 2023-03-03 02:43:19 +08:00
mssonicbld
72f9f51287
[Seastone] fix dx010 qsfp eeprom data write issue (#13930) (#14032) 2023-03-01 19:28:38 +08:00
mssonicbld
18bc044179
Remove support to Mellanox SPC4 ASIC (#13932) (#13957) 2023-02-23 22:22:35 +08:00
mssonicbld
310827c26c
Add PYTHON3_SWSSCOMMON as build time dependency to Mellanox platform API (#13847) (#13959) 2023-02-23 20:32:15 +08:00
mssonicbld
50aaf92590
[Mellanox] Non upstream patches for hw-mgmt V.4.0020.4104 (#13792) (#13960) 2023-02-23 20:32:09 +08:00
Junchao-Mellanox
e8789a2e11 [Mellanox] Check system eeprom existence in a retry manner (#13884)
- Why I did it
On Mellanox platform, system EEPROM is a soft link provided by hw-management. There is chance that config-setup service accessing the EEPROM before hw-management creating it. It causes errors. The PR is aim to fix it.

- How I did it
Waiting EEPROM creation in platform API up to 10 seconds.

- How to verify it
Manual test
2023-02-23 20:31:29 +08:00
mssonicbld
6a12ca9332
[Mellanox] [ECMP calculator] Add support for 4600/4600C/2201 platforms with different interface naming method (#13814) (#13931) 2023-02-22 22:14:09 +08:00
Pavan-Nokia
d7815f3229 add sfp get error description (#13275)
Why I did it
Command "sudo sfputil show error-status -hw" shows "OK (Not implemented)" in the output.

How I did it
Add a new SFP API get_error_description support in Nokia sonic-platform sfp.py module.

How to verify it
Run the new image and execute command "sudo sfputil show error-status -hw"
2023-02-22 18:36:56 +08:00
Stephen Sun
b0416a5c2c [Mellanox] Advance hw-mgmt to v.7.0020.4104 (#13372)
- Why I did it
Advance hw-mgmt service to V.7.0020.4100
Add missing thermal sensors that are supported by hw-mgmt package
Delay system health service before hw-mgmt has started on Mellanox platform in order to avoid reading some sensors before ready.
Depends on sonic-net/sonic-linux-kernel#305

- How I did it
1. Update hw mgmt version
2. Add missing sensors
3. Delay service 

- How to verify it
Regression test.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2023-02-20 14:38:53 +08:00
Stephen Sun
4f3b649f8e [Mellanox] Support per PSU slope value for PSU power threshold (#13757)
- Why I did it
Support per PSU slope value for PSU power threshold according to hardware team requirement

- How I did it
Pass the PSU number as a parameter when fetching the slope value of PSU.

- How to verify it
Running regression and manual test

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2023-02-20 12:38:20 +08:00
Sudharsan Dhamal Gopalarathnam
a993fc205f [Mellanox][sai_failure_dump]Added platform specific script to be invoked during SAI failure dump (#13533)
- Why I did it
Added platform specific script to be invoked during SAI failure dump. Added some generic changes to mount /var/log/sai_failure_dump as read write in the syncd docker

- How I did it
Added script in docker-syncd of mellanox and copied it to /usr/bin

- How to verify it
Manual UT and new sonic-mgmt tests
2023-02-18 06:34:29 +08:00
Samuel Angebault
ef02c73a03
[202211][Arista] Update platform library submodules (#13872)
add SEU reporting on chassis
fix fallback logic for Clearlake eeprom identification
fix fan speed reporting for a specific model
move pcie timeout configuration for Upperlake in platform code (deprecates hwsku-init)
2023-02-17 13:51:42 -08:00
Pavan-Nokia
979e9a7d9d [armhf][Nokia-7215]High CPU caused by entropy.py (#13694)
Why I did it
High CPU utilization by entropy.py

How I did it
Remove entropy script as it does not work anymore and is no longer needed for bullseye(202205).
In Buster(202012) the max available poolsize (entropy_avail) for entropy is 4096 and our entropy.py script was based on this value. With the change in kernel to bullseye on 202205 this entropy poolsize was changed to 256 which also causes our script to fail.

This script was initially added to provide SW assistance to improve the system entropy value available early on in the Sonic boot sequence on buster.
On bullseye (Linux kernel 5.10) this is no longer needed as this feature has been improved.

How to verify it
run "top" command to check CPU usage.
2023-02-18 04:32:35 +08:00
mssonicbld
94e59a841e
[Mellanox] Enhance MFT make file to download source code from any valid URL (#13801) (#13868) 2023-02-18 02:14:00 +08:00
Volodymyr Samotiy
e849455742 [Mellanox] Update SDK/FW to 4.5.4150/2010.4150 (#13480)
- Why I did it
To include latest fixes and new functionality

SDK/FW
1. Fixed bug in recovery mechanism in case of I2C error when trying to access the XSFP module.
2. On the NVIDIA Spectrum-2 switch, when receiving a packet with Symbol Errors on ports that are configured to cut-thought mode, a pipeline might get stuck.
3. On the Spectrum-2 and Spectrum-3 switch, if you enable ECN marking and the port is in split mode, traffic sent to the port under congestion (for example, when connecting two ports with a total speed of 50GbE to a single 25GbE port) is not marked.
4. Modifying existing entry/Adding new one when switch is at its maximum capacity (full by maximum allowed entries from any type such as routes, FDB, and so forth), will fail with an error.
5. When many ports are active (e.g., 70 ports up), and the configuration of shared buffer is applied on the fly, occasionally, the firmware might get stuck.
6. When a system has more than 256 ACL rules, on rare occasion, removing/adding rules may cause some ACL rules not to work.
7. On SN2201 system, on RJ45 port, the link might appear in 'down' state even if it operations properly.
8. Layer 4 port information is not initialized for BFD packet event. To address the issue, remote peer UDP port information was added in BFD packet event.
9. When setting LAG as a SPAN analyzer, the distributor mode of the LAG members was not taken into account. It may happen that the LAG member with distributor mode disabled will be set as a SPAN analyzer port.

- How I did it
Updated SDK/SAI submodule and relevant makefiles with the required versions.

- How to verify it
Build an image and run tests from "sonic-mgmt".

Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2023-02-16 18:36:43 +08:00
Lior Avramov
e6b1ed366b [Mellanox] [ECMP calculator] Add script usage and more information to script description in help option (#13493)
Add script usage and more information to script description being printed in help option.

- Why I did it
Missing information in script description in help option.

- How I did it
Expand script description and add script usage.

- How to verify it
Run the script with -h option.
2023-02-16 18:36:36 +08:00
mssonicbld
8832ddd60b
[Mellanox] Improve FW upgrade logging (#13465) (#13681) 2023-02-12 23:53:33 +08:00