1863 Commits

Author SHA1 Message Date
mssonicbld
aea96da04d
[Mellanox] Fix issue: cannot find label port for logical port when logical port number is larger than 64 (#13710) (#13962) 2023-03-06 16:47:31 +08:00
mssonicbld
1757f53290
[Mellanox] update sdk/fw build procedure (#14025) (#14059) 2023-03-03 02:43:19 +08:00
mssonicbld
72f9f51287
[Seastone] fix dx010 qsfp eeprom data write issue (#13930) (#14032) 2023-03-01 19:28:38 +08:00
mssonicbld
18bc044179
Remove support to Mellanox SPC4 ASIC (#13932) (#13957) 2023-02-23 22:22:35 +08:00
mssonicbld
310827c26c
Add PYTHON3_SWSSCOMMON as build time dependency to Mellanox platform API (#13847) (#13959) 2023-02-23 20:32:15 +08:00
mssonicbld
50aaf92590
[Mellanox] Non upstream patches for hw-mgmt V.4.0020.4104 (#13792) (#13960) 2023-02-23 20:32:09 +08:00
Junchao-Mellanox
e8789a2e11 [Mellanox] Check system eeprom existence in a retry manner (#13884)
- Why I did it
On the Mellanox platform, the system EEPROM is a soft link provided by hw-management. The config-setup service may access the EEPROM before hw-management has created it, which causes errors. This PR aims to fix that.

- How I did it
Wait up to 10 seconds in the platform API for the EEPROM to be created.
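
A minimal sketch of such a bounded wait, assuming a simple polling loop; the helper name `wait_for_path` and the 0.5 second polling interval are hypothetical and not taken from the PR. Because `os.path.exists` follows symlinks, a soft link whose target has not been created yet is reported as missing.

```python
import os
import time


def wait_for_path(path, timeout=10.0, interval=0.5):
    """Poll until `path` exists or `timeout` seconds have elapsed.

    Hypothetical helper for illustration only. Returns True as soon as
    the path (or the target of a symlink) exists, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller would typically log an error and fall back to an empty EEPROM cache when the function returns False.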

- How to verify it
Manual test
2023-02-23 20:31:29 +08:00
mssonicbld
6a12ca9332
[Mellanox] [ECMP calculator] Add support for 4600/4600C/2201 platforms with different interface naming method (#13814) (#13931) 2023-02-22 22:14:09 +08:00
Pavan-Nokia
d7815f3229 add sfp get error description (#13275)
Why I did it
Command "sudo sfputil show error-status -hw" shows "OK (Not implemented)" in the output.

How I did it
Add support for the new SFP API get_error_description in the Nokia sonic-platform sfp.py module.

How to verify it
Run the new image and execute command "sudo sfputil show error-status -hw"
2023-02-22 18:36:56 +08:00
Stephen Sun
b0416a5c2c [Mellanox] Advance hw-mgmt to v.7.0020.4104 (#13372)
- Why I did it
Advance hw-mgmt service to V.7.0020.4100
Add missing thermal sensors that are supported by hw-mgmt package
Delay the system health service until hw-mgmt has started on the Mellanox platform, to avoid reading sensors before they are ready.
Depends on sonic-net/sonic-linux-kernel#305

- How I did it
1. Update hw mgmt version
2. Add missing sensors
3. Delay service 

- How to verify it
Regression test.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2023-02-20 14:38:53 +08:00
Stephen Sun
4f3b649f8e [Mellanox] Support per PSU slope value for PSU power threshold (#13757)
- Why I did it
Support per PSU slope value for PSU power threshold according to hardware team requirement

- How I did it
Pass the PSU number as a parameter when fetching the slope value of PSU.

- How to verify it
Running regression and manual test

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2023-02-20 12:38:20 +08:00
Sudharsan Dhamal Gopalarathnam
a993fc205f [Mellanox][sai_failure_dump]Added platform specific script to be invoked during SAI failure dump (#13533)
- Why I did it
Added a platform-specific script to be invoked during SAI failure dump. Also added some generic changes to mount /var/log/sai_failure_dump as read-write in the syncd docker.

- How I did it
Added script in docker-syncd of mellanox and copied it to /usr/bin

- How to verify it
Manual UT and new sonic-mgmt tests
2023-02-18 06:34:29 +08:00
Samuel Angebault
ef02c73a03
[202211][Arista] Update platform library submodules (#13872)
add SEU reporting on chassis
fix fallback logic for Clearlake eeprom identification
fix fan speed reporting for a specific model
move pcie timeout configuration for Upperlake in platform code (deprecates hwsku-init)
2023-02-17 13:51:42 -08:00
Pavan-Nokia
979e9a7d9d [armhf][Nokia-7215]High CPU caused by entropy.py (#13694)
Why I did it
High CPU utilization by entropy.py

How I did it
Remove the entropy script, as it no longer works and is no longer needed on bullseye (202205).
In buster (202012) the maximum available pool size (entropy_avail) for entropy is 4096, and our entropy.py script was based on this value. With the kernel change to bullseye in 202205, the entropy pool size changed to 256, which also causes our script to fail.

This script was initially added to provide SW assistance to improve the system entropy value available early on in the Sonic boot sequence on buster.
On bullseye (Linux kernel 5.10) this is no longer needed as this feature has been improved.
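
The pool size difference described above can be inspected with a short sketch like the following. It reads the standard Linux procfs entries `entropy_avail` and `poolsize`; the function names and the `base` parameter are hypothetical and exist only to keep the example testable.

```python
import os


def read_int(path):
    """Read a single integer from a procfs-style text file."""
    with open(path) as handle:
        return int(handle.read().strip())


def entropy_status(base="/proc/sys/kernel/random"):
    """Return (entropy_avail, poolsize) as reported by the kernel.

    `base` is a parameter only so the function can be pointed at a
    test directory; on a real system the default applies.
    """
    available = read_int(os.path.join(base, "entropy_avail"))
    poolsize = read_int(os.path.join(base, "poolsize"))
    return available, poolsize
```

A script that hard-codes 4096 as the expected pool size would misbehave on a kernel that reports 256, which matches the failure described in this commit.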

How to verify it
run "top" command to check CPU usage.
2023-02-18 04:32:35 +08:00
mssonicbld
94e59a841e
[Mellanox] Enhance MFT make file to download source code from any valid URL (#13801) (#13868) 2023-02-18 02:14:00 +08:00
Volodymyr Samotiy
e849455742 [Mellanox] Update SDK/FW to 4.5.4150/2010.4150 (#13480)
- Why I did it
To include latest fixes and new functionality

SDK/FW
1. Fixed bug in recovery mechanism in case of I2C error when trying to access the XSFP module.
2. On the NVIDIA Spectrum-2 switch, when receiving a packet with Symbol Errors on ports that are configured to cut-through mode, a pipeline might get stuck.
3. On the Spectrum-2 and Spectrum-3 switch, if you enable ECN marking and the port is in split mode, traffic sent to the port under congestion (for example, when connecting two ports with a total speed of 50GbE to a single 25GbE port) is not marked.
4. Modifying existing entry/Adding new one when switch is at its maximum capacity (full by maximum allowed entries from any type such as routes, FDB, and so forth), will fail with an error.
5. When many ports are active (e.g., 70 ports up), and the configuration of shared buffer is applied on the fly, occasionally, the firmware might get stuck.
6. When a system has more than 256 ACL rules, on rare occasion, removing/adding rules may cause some ACL rules not to work.
7. On the SN2201 system, on an RJ45 port, the link might appear in 'down' state even if it operates properly.
8. Layer 4 port information is not initialized for BFD packet event. To address the issue, remote peer UDP port information was added in BFD packet event.
9. When setting LAG as a SPAN analyzer, the distributor mode of the LAG members was not taken into account. It may happen that the LAG member with distributor mode disabled will be set as a SPAN analyzer port.

- How I did it
Updated SDK/SAI submodule and relevant makefiles with the required versions.

- How to verify it
Build an image and run tests from "sonic-mgmt".

Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2023-02-16 18:36:43 +08:00
Lior Avramov
e6b1ed366b [Mellanox] [ECMP calculator] Add script usage and more information to script description in help option (#13493)
Add script usage and more information to script description being printed in help option.

- Why I did it
Missing information in script description in help option.

- How I did it
Expand script description and add script usage.

- How to verify it
Run the script with -h option.
2023-02-16 18:36:36 +08:00
mssonicbld
8832ddd60b
[Mellanox] Improve FW upgrade logging (#13465) (#13681) 2023-02-12 23:53:33 +08:00
mssonicbld
956173856c
[sflow]: Unblocked psample_*() function calls in BRCM ESW platforms for proper functionality of sflow feature (#12918) (#13691) 2023-02-11 12:35:41 +08:00
Junhua Zhai
200342261a [gearbox] use credo sai v0.8.2 (#13565)
Update credo sai package to the latest v0.8.2, which also has the fix for aristanetworks/sonic#52.
2023-02-07 04:32:28 +08:00
mssonicbld
d9b15aea0d
[Seastone] Enhancement fix for PR12200 syseeprom issue (#13344) (#13664) 2023-02-05 01:22:04 +08:00
Ikki Zhu
62fb0726ee [Platform/Seastone]: fix syseeprom tlv read issue (#12200)
Why I did it
Fix incorrect reads of the Seastone syseeprom TLV header

How I did it
Set mux idle_state

How to verify it
i2cdump -y -f 12 0x50 i
2023-02-04 04:32:29 +08:00
Vadym Hlushko
3530fdbea1 [SFP] Change logging severity when failed to read EEPROM (#13011)
- Why I did it
In order to prevent the sonic-mgmt/tests/platform_tests/sfp/test_sfputil.py test failing on the log analyzer step.

The test runs sfputil reset EthernetX for every interface on the SONiC switch, which flaps the SFP device status (INSERTED -> REMOVED -> INSERTED).

The SONiC XCVRD daemon will catch this SFP device status change (because it is monitoring the presence status of the cable).
To judge the cable presence status, we currently still rely on reading the first bytes of the EEPROM. The EEPROM may not be ready at that moment, in which case the SONiC XCVRD daemon prints the following error to syslog:

ERR pmon#xcvrd: Error! Unable to read data for 'xx' port, page 'xx' offset 128, rc = 1, err msg: Sending access register

- How I did it
Change logging severity from ERR to WARNING

- How to verify it
Run the sonic-mgmt/tests/platform_tests/sfp/test_sfputil.py

Or, much faster, run the following script on the switch:

#!/bin/bash

START=0
END=248

for (( intf=$START; intf<=$END; intf+=8))
do
    sfputil reset Ethernet"${intf}"
done

sfputil show presence
2023-02-04 02:36:51 +08:00
Junchao-Mellanox
cf6f31b215 [Mellanox] Remove TODO comments which are no longer needed (#13023)
- Why I did it
Remove TODO comments which are no longer needed

- How I did it
Remove TODO comments which are no longer needed

- How to verify it
Only comment change
2023-02-04 02:36:47 +08:00
Kebo Liu
9680479661 [Mellanox] change the implementation of is_host() to fix a stuck issue on simx platform (#13100)
- Why I did it
The following code, used to determine whether a process is running inside a docker container, could get stuck on the simx platform:

subprocess.Popen(["docker", "--version"],
                 stdout=subprocess.PIPE,
                 stderr=subprocess.STDOUT,
                 universal_newlines=True)
When it gets stuck, the config-chassisdb service cannot be started, so the system cannot boot up.

root@sonic:/# service config-chassisdb status
     config-chassisdb.service - Config chassis_db
     Loaded: loaded (/lib/systemd/system/config-chassisdb.service; enabled; vendor preset: enabled)
     Active: activating (start) since Thu 2022-12-15 09:23:02 UTC; 29min ago
   Main PID: 571 (config-chassisd)
      Tasks: 14 (limit: 9501)
     Memory: 132.4M
     CGroup: /system.slice/config-chassisdb.service
             ├─571 /bin/bash /usr/bin/config-chassisdb
             ├─575 /usr/bin/python3 /usr/local/bin/sonic-cfggen -H -v DEVICE_METADATA.localhost.platform
             ├─602 /bin/sh -c sudo decode-syseeprom -m
             ├─603 sudo decode-syseeprom -m
             ├─607 /usr/bin/python3 /usr/local/bin/decode-syseeprom -m
             ├─616 /bin/sh -c docker --version 2>/dev/null
             └─617 docker --version

- How I did it
Implement the function in a different way so that the issue is avoided:

docker_env_file = '/.dockerenv'
return os.path.exists(docker_env_file) is False
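
As a self-contained sketch, the check above could be wrapped as follows. The function signature with a `docker_env_file` parameter is an assumption made here for testability; the commit only shows the two lines quoted above.

```python
import os


def is_host(docker_env_file="/.dockerenv"):
    """Return True when the process is NOT running inside a container.

    Docker creates /.dockerenv in the container root, so a plain file
    existence check avoids spawning `docker --version` entirely.
    """
    return not os.path.exists(docker_env_file)
```

Because no subprocess is started, the check cannot block the way the `docker --version` call did.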

- How to verify it
run regression on real hardware and simx platform.
2023-02-04 02:36:43 +08:00
Yoush
d59b43566f [centec]: reference to v1.11.0-1 sai debian package for master (#13206) 2023-02-04 02:36:38 +08:00
Kebo Liu
ab54549d53 [Mellanox] Skip the leftover hardware reboot cause in case of last boot is warm/fast reboot (#13246)
- Why I did it
After a warm/fast reboot, the hardware reboot cause is NOT cleared, because the CPLD is not touched in this flow. To avoid confusing the logic that determines the reboot cause, the platform API skips the leftover hardware reboot cause and returns 'REBOOT_CAUSE_NON_HARDWARE' instead of the "hardware" reboot cause.

- How I did it
Check the proc cmdline to see whether the last reboot was a warm or fast reboot. If so, skip checking the leftover hardware reboot cause.
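
A rough sketch of this check, under stated assumptions: the `SONIC_BOOT_TYPE=` kernel command-line token and the set of values treated as warm/fast are assumptions for illustration, and all function names are hypothetical. Only `REBOOT_CAUSE_NON_HARDWARE` comes from the commit message.

```python
import re

# Assumed token and values; the real platform code may match differently.
BOOT_TYPE_PATTERN = re.compile(r"\bSONIC_BOOT_TYPE=(\S+)")
WARM_FAST_TYPES = {"warm", "fastfast", "fast", "fast-reboot"}


def read_cmdline(path="/proc/cmdline"):
    """Return the kernel command line as a single string."""
    with open(path) as handle:
        return handle.read()


def is_warm_or_fast_boot(cmdline):
    """True when the command line marks the last boot as warm or fast."""
    match = BOOT_TYPE_PATTERN.search(cmdline)
    return bool(match) and match.group(1) in WARM_FAST_TYPES


def effective_hardware_cause(cmdline, hardware_cause):
    """Ignore a leftover hardware cause after a warm/fast reboot."""
    if is_warm_or_fast_boot(cmdline):
        return "REBOOT_CAUSE_NON_HARDWARE"
    return hardware_cause
```

On a switch, `effective_hardware_cause(read_cmdline(), cause)` would be called with whatever cause the CPLD reports.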

- How to verify it
a. Manual test:
    - Perform a power loss
    - Perform a warm/fast reboot
    - Check that the reboot cause is "warm-reboot" or "fast-reboot" instead of "power loss"
b. Run reboot cause related regression test.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-01-31 18:34:36 +08:00
Junchao-Mellanox
e631f426f4
[infra] Support syslog rate limit configuration (#12490) (#13535)
Backport of https://github.com/sonic-net/sonic-buildimage/pull/12490 into 202211

- Why I did it
Support syslog rate limit configuration feature

- How I did it
Remove unused rsyslog.conf from containers
Modify docker startup script to generate rsyslog.conf from template files
Add metadata/init data for syslog rate limit configuration
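
To illustrate the idea of generating rsyslog.conf from a template at container startup, here is a simplified sketch. It uses Python string formatting rather than the template mechanism actually used in the image, and the function name and example values are hypothetical. The two `$SystemLogRateLimit*` lines are standard rsyslog legacy directives; whether the real template uses exactly these is an assumption.

```python
# Simplified stand-in for a template file; not the template from the PR.
RSYSLOG_TEMPLATE = (
    "$SystemLogRateLimitInterval {interval}\n"
    "$SystemLogRateLimitBurst {burst}\n"
)


def render_rsyslog_conf(interval, burst):
    """Render the rate-limit section of rsyslog.conf.

    `interval` is in seconds and `burst` is a message count; both must
    be non-negative integers.
    """
    if interval < 0 or burst < 0:
        raise ValueError("interval and burst must be non-negative")
    return RSYSLOG_TEMPLATE.format(interval=interval, burst=burst)
```

Rendering at startup lets each container pick up its own configured limits instead of shipping a static rsyslog.conf.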

- How to verify it
Manual test
New sonic-mgmt regression cases
2023-01-30 20:11:44 +02:00
Dror Prital
d12c3b79bc
[202211][Mellanox] Add ASIC simulation version tag to fw.mk (#13473)
Signed-off-by: dprital <drorp@nvidia.com>
2023-01-23 13:28:19 +02:00
mssonicbld
1dc71aa4ff
[Mellanox] Update ECMP calculator README (#13051) (#13362) 2023-01-14 11:46:42 +08:00
mssonicbld
7524e91aa1
The FAN driver framework module complies with s3ip sysfs specification (#12888) (#13212)
Why I did it
Provide a Fan driver framework that complies with s3ip sysfs specification

How I did it
1. The framework module provides register and unregister interface and implementation.
2. The framework will help you create the sysfs node

How to verify it
A demo driver based on this framework will display the sysfs nodes, which conform to the s3ip sysfs specification

Co-authored-by: tianshangfei <31125751+tianshangfei@users.noreply.github.com>
2023-01-09 14:24:41 +08:00
mssonicbld
ab0533e646
two platforms supporting S3IP SYSFS (TCS8400, TCS9400) (#12386) (#13210)
Why I did it
Add two platforms that support the S3IP framework

How I did it
Add two platforms supporting S3IP SYSFS (TCS8400, TCS9400)

How to verify it
Manual test

Co-authored-by: tianshangfei <31125751+tianshangfei@users.noreply.github.com>
2023-01-09 11:40:35 +08:00
mssonicbld
1e522ff3a9
Add ECMP calculator tool (#12482) (#13301) 2023-01-09 00:48:56 +08:00
Richard.Yu
fb6f0b53ba
[SAIServer]Upgrade SAI server init script (#13175) (#13227)
Why I did it
To apply different configurations across platforms while keeping the code in a unified format, reuse the syncd init script to initialize saiserver.

How I did it
Reuse the syncd init script

How to verify it
Tested on DUTs s6000 and dx010 with SONiC 202205
2023-01-03 16:03:05 +08:00
mssonicbld
79b0890c53
The user framework module complies with s3ip sysfs specification (#12894) (#13215) 2023-01-01 12:35:32 +08:00
mssonicbld
684b07f172
The demo driver complies with s3ip sysfs specification, which uses the s3ip kernel framework (#12895) (#13214) 2023-01-01 12:35:11 +08:00
mssonicbld
4ac8359854
The CPLD and FPGA driver framework module complies with s3ip sysfs specification (#12891) (#13218) 2023-01-01 12:34:50 +08:00
mssonicbld
313406a290
The build project of s3ip framework (#12896) (#13213) 2023-01-01 12:32:42 +08:00
mssonicbld
967cc38356
The PSU driver module complies with s3ip sysfs specification (#12887) (#13211) 2023-01-01 12:32:36 +08:00
mssonicbld
fe5732a4cc
The slot and switch_rootsysfs driver framework module complies with s3ip sysfs specification (#12893) (#13216) 2023-01-01 12:28:41 +08:00
mssonicbld
5489913baf
The Sensor driver framework module complies with s3ip sysfs specification (#12890) (#13219) 2023-01-01 12:27:55 +08:00
mssonicbld
29e7348c7b
The Transceiver driver framework module complies with s3ip sysfs specification (#12889) (#13220) 2023-01-01 12:26:52 +08:00
mssonicbld
8552b92b98
The LED and watchdog driver framework module complies with s3ip sysfs specification (#12892) (#13217) 2023-01-01 12:24:31 +08:00
Richard.Yu
515f798628
[202211][Submodule][SAI-Redis]Advance SAI Redis head pointer (#13158)
Why I did it
[202211][Submodule][SAI-Redis]Advance SAI Redis head pointer

How I did it
Changes:

sonic-net/sonic-sairedis@9a5c443
sonic-net/sonic-sairedis@99b789d
sonic-net/sonic-sairedis@9deef02
[202211][Submodule][SAI]Advance SAI head pointer sonic-sairedis#1186 sonic-net/sonic-sairedis@a995edf
Remove the unused parameter --skip_error=-2, which was removed in [202211][Submodule][SAI]Advance SAI head pointer sonic-sairedis#1186

How to verify it
Local image build
2022-12-25 10:18:15 +08:00
Mai Bui
6759ad27b5 [device/ragile] Mitigation for security vulnerability (#11744)
Signed-off-by: maipbui <maibui@microsoft.com>
#### Why I did it
The [xml.etree.ElementTree](https://docs.python.org/3/library/xml.etree.elementtree.html#module-xml.etree.ElementTree) module is not secure against maliciously constructed data.
`os` is not secure against maliciously constructed input and is dangerous if used to evaluate dynamic content.
`subprocess.getstatusoutput` is dangerous because its implementation uses `shell=True`.
#### How I did it
Remove `xml`. Use the [lxml](https://pypi.org/project/lxml/) XML parser package, which prevents potentially malicious operations.
Replace `os` with `subprocess`.
Pass commands as arrays instead of strings.
Use `getstatusoutput_noshell` from the `sonic_py_common` library.
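
The shell-free pattern described above can be sketched with the standard library alone. The helper below is a hypothetical stand-in written for illustration; it is not the `getstatusoutput_noshell` implementation from `sonic_py_common`, whose exact signature is not shown in this commit.

```python
import subprocess
import sys


def getstatusoutput_noshell_sketch(cmd):
    """Run `cmd` (a list of strings) without a shell.

    Returns (exit_status, combined_output) in the spirit of
    subprocess.getstatusoutput; trailing newlines are stripped.
    """
    if isinstance(cmd, str):
        raise TypeError("cmd must be a list of arguments, not a string")
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout.rstrip("\n")


if __name__ == "__main__":
    # Use the current interpreter as a portable example command.
    print(getstatusoutput_noshell_sketch([sys.executable, "--version"]))
```

Rejecting plain strings forces callers to split arguments explicitly, which is the point of the mitigation.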
2022-12-10 10:33:21 +08:00
Kebo Liu
28f8da80ea [Mellanox] Add support to Mellanox Spectrum-4 ASIC Firmware compiling and upgrade (#12844)
- Why I did it
Add support for compiling Spectrum-4 ASIC firmware to the SONiC image
Add support for Spectrum-4 ASIC firmware upgrade

- How I did it
Update Mellanox fw make files to include Spectrum-4 ASIC firmware binaries.
Update firmware upgrade scripts to be able to detect Spectrum-4 ASIC.

- How to verify it
Run regression tests

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2022-12-10 10:33:21 +08:00
Mai Bui
5238bd78af [ruijie] Replace os.system and remove subprocess with shell=True (#12107)
Signed-off-by: maipbui <maibui@microsoft.com>
Dependency: [https://github.com/sonic-net/sonic-buildimage/pull/12065](https://github.com/sonic-net/sonic-buildimage/pull/12065)
#### Why I did it
1. `getstatusoutput` is used without a static string, and it uses `shell=True`.
2. `subprocess()` is dangerous when used with `shell=True`. Calling a subprocess function without a static string can lead to command injection.
3. `os` is not secure against maliciously constructed input and is dangerous if used to evaluate dynamic content.
#### How I did it
1. Use `getstatusoutput` without `shell=True`.
2. `subprocess()`: use `shell=False` instead, and pass the command as an array of strings. Ref: [https://semgrep.dev/docs/cheat-sheets/python-command-injection/#mitigation](https://semgrep.dev/docs/cheat-sheets/python-command-injection/#mitigation)
3. `os`: replace with `subprocess`.
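
Why an argument array blocks command injection can be seen in a small demonstration. The function name and the hostile input string are made up for illustration; the child process is the current Python interpreter so the example stays portable.

```python
import subprocess
import sys


def echo_argument(value):
    """Print `value` from a child process started without a shell.

    The value is passed as one list element, so characters such as
    ';' or '|' reach the child as literal text.
    """
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", value],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


# A shell would treat the part after ';' as a second command.
hostile = "Ethernet0; echo injected"
```

Calling `echo_argument(hostile)` returns the string unchanged, whereas interpolating it into a `shell=True` command line would let the shell run the text after the semicolon as a separate command.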
2022-12-10 10:33:21 +08:00
Lior Avramov
f3821c6d2f [Mellanox] Add SDK hash calculator debian and update SDK makefile to compile it (#12840)
- Why I did it
Add the SDK hash calculator Debian package and update the SDK makefile to compile it.

- How I did it
The SDK hash calculator Debian package will be used by the ECMP calculator (PR #12482)

- How to verify it
Compile sonic-buildimage and verify that the SDK hash calculator Debian package exists in the target folder.
2022-12-10 10:33:21 +08:00
Mai Bui
4963c1cc97 [device/juniper] Mitigation for security vulnerability (#11838)
Signed-off-by: maipbui <maibui@microsoft.com>
Dependency: [https://github.com/sonic-net/sonic-buildimage/pull/12065](https://github.com/sonic-net/sonic-buildimage/pull/12065)
#### Why I did it
`commands` module is not secure
command injection in `getstatusoutput` being used without a static string
#### How I did it
Eliminate `commands` module, use `subprocess` module only
Convert Python 2 to Python 3
2022-12-10 10:33:21 +08:00
Stephen Sun
91e12d7b49 [Mellanox] Support PSU power threshold checking (#11863)
* Support power threshold

Signed-off-by: Stephen Sun <stephens@nvidia.com>

* get_psu_power_warning_threshold => get_psu_power_warning_suppress_threshold

Signed-off-by: Stephen Sun <stephens@nvidia.com>

* Fix comments

Signed-off-by: Stephen Sun <stephens@nvidia.com>

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2022-12-10 10:33:21 +08:00