Commit Graph

1447 Commits

Author SHA1 Message Date
noaOrMlnx
be7f31e6d2 [CoPP] Add always_enabled field (#9302)
*Add the "always_enabled" field to copp_cfg.j2 file, in order to allow traps without an entry in features table, to be installed automatically.
2022-03-01 03:40:40 +00:00
Mahesh Maddikayala
87d18eb6c3
[libsaibcm] Fix the expiry date on the libsaibcm debian packages. (#10090) 2022-02-28 08:52:11 -08:00
Mahesh Maddikayala
2153ea41a5
[libsaibcm] Update libsaibcm with PFC patches (#10066) 2022-02-23 14:24:10 -08:00
xumia
93a22d7df5 [Build]: Fix marvell sai package version parsing issue
Fix marvell sai package version parsing issue (#10009)
2022-02-19 04:23:29 +00:00
Volodymyr Samotiy
97bd2bf82f
[Mellanox][202106] Update SAI to 1.20.2 and SDK/FW to 4.5.1208/2010.1218 (#9618)
- Why I did it
To include latest fixes.
1. On CMIS modules, after low power configuration, the firmware waited for the module state to be ModuleReady instead of ModuleLowPower causing delays.
2. When connecting SN4600C, 100GbE port with CWDM4 module (Gen 3.0), link up time is 30 seconds.

- How I did it
Updated SDK/SAI submodule and relevant makefiles with the required versions.

- How to verify it
Build an image and run tests from "sonic-mgmt".

Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2022-01-26 10:59:39 +02:00
Junchao-Mellanox
c0f0694236
[Mellanox] [202106] Optimize thermal policies (#9451)
- Why I did it
Optimize thermal control policies to simplify the logic and add more protection code in policies to make sure it works even if kernel algorithm does not work.

- How I did it
Reduce unused thermal policies
Add timely ASIC temperature check in thermal policy to make sure ASIC temperature and fan speed is coordinated
Minimum allowed fan speed now is calculated by max of the expected fan speed among all policies
Move some logic from fan.py to thermal.py to make it more readable

- How to verify it
1. Manual test
2. Regression
2022-01-19 11:43:55 +02:00
Stepan Blyshchak
559ce14277
[nvidia] fail the build when hw-mgmt patches do not apply (#9565)
- Why I did it
To fix an issue that hw-mgmt patches were not applied. One patch was already in upstream hw-mgmt package thus applying it again caused an error and no other patches were applied. Also, I did it to improve the Makefile, so that the make will fail in case patches fail to apply.

- How I did it
Removed obsolete patch, made applying patches a hard failure in the build.

- How to verify it
Run the make and verify patches are applied.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2022-01-16 08:10:25 +02:00
Stepan Blyshchak
4ebafdaf28 [Mellanox][SDK] Build SDK with PRM sniffer support (#9500)
- Why I did it
To have an ability to use PRM sniffer.

- How I did it
Enabled the option in configure flags.

- How to verify it
Built and ran on switch. Enabled the feature in runtime and checked the sniffer recording.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2021-12-22 09:05:49 -08:00
Junchao-Mellanox
73a17d8876
[Mellanox] [202106] Fix issue: SFP might not be re-initialized after cable plug in (#9387)
- Why I did it
Fix issue: SFP might not be re-initialized after cable plug in. There could be case like:
1. The SFP object was initialized as QSFP type
2. User use adapter to connect a SFP to a QSFP port
3. The SFP object treat the SFP as QSFP and failed to process EEPROM

- How I did it
If a new SFP is plugged, always re-initialize the SFP

- How to verify it
Added unit test case
2021-12-14 16:27:26 +02:00
Junchao-Mellanox
3ae3eeda35
[Mellanox] Allow user to set LED to orange (#9259) (#9515)
Backport #9259 to 202106

- Why I did it
Nvidia platform API does not support set LED to orange.

- How I did it
Allow user to set LED to orange

- How to verify it
Manual test and unit testing
2021-12-14 13:33:23 +02:00
Stephen Sun
5c91f233ef
[Reclaim buffer][202106] Reclaim unused buffers by applying zero buffer profiles (#9062)
This is to backport community PR #8768 to 202106 branch

Why I did it
Support zero buffer profiles
Add buffer profiles and pool definition for zero buffer profiles
Support applying zero profiles on INACTIVE PORTS
Enable dynamic buffer manager to load zero pools and profiles from a JSON file

Signed-off-by: Stephen Sun stephens@nvidia.com

How I did it
Add buffer profiles and pool definition for zero buffer profiles

If the buffer model is static:
 - Apply normal buffer profiles to admin-up ports
 - Apply zero buffer profiles to admin-down ports
If the buffer model is dynamic:
 - Apply normal buffer profiles to all ports
 - buffer manager will take care when a port is shut down
 - Update buffers_config.j2 to support INACTIVE PORTS by extending the existing macros to generate the various buffer objects, including PGs, queues, ingress/egress profile lists

Originally, all the macros to generate the above buffer objects took active ports only as an argument
Now that buffer items need to be generated on inactive ports as well, an extra argument representing the inactive ports need to be added
To be backward compatible, a new series of macros are introduced to take both active and inactive ports as arguments
The original version (with active ports only) will be checked first. If it is not defined, then the extended version will be called
Only vendors who support zero profiles need to change their buffer templates
Enable buffer manager to load zero pools and profiles from a JSON file:

The JSON file is provided on a per-platform basis
It is copied from platform/<vendor> folder to /usr/share/sonic/temlates folder in compiling time and rendered when the swss container is being created.
To make code clean and reduce redundant code, extract common macros from buffer_defaults_t{0,1}.j2 of all SKUs to two common files:

One in Mellanox-SN2700-D48C8 for single ingress pool mode
The other in ACS-MSN2700 for double ingress pool mode
Those files of all other SKUs will be symbol link to the above files

Update sonic-cfggen test accordingly:
 - Adjust example output file of JSON template for unit test
 - Add unit test in for Mellanox's new buffer templates.

How to verify it
Regression test.
Unit test in sonic-cfggen
Run regression test and manually test.
2021-12-13 10:51:50 -08:00
Samuel Angebault
08c2c07fc0
[202106][Arista] Update arista platform library (#9483) 2021-12-09 18:29:59 -08:00
Volodymyr Samotiy
b22db5a52b
[Mellanox] [202106] Update SAI to v1.20.0.1 and SDK/FW to v4.5.1156/v2010.1152 (#9431)
- Why I did it
To include latest fixes.
SAI
* Reduce verbosity of warning message on shared memory already existing
* accuflow allocation support by key value

SDK
* Under various circumstances, Ethernet ports falsely showed that InfiniBand cables were connected.
* In SN4600C, at times, the link up time in both DAC and optics cables may, in the worst case, take up to 15 seconds.
* Using SN4600C with copper or optics loopback cables in NRZ speeds, link may raise in long link up times
* When ECMP has high amount of next-hops based on VLAN interfaces, in some rare cases, packets will get a wrong VLAN tag and will be dropped.
* When connecting Spectrum devices with optical transceivers that support RXLOS, remote side port down might cause the switch firmware to get stuck and cause unexpected switch behavior.
* Aggregation event is missing for WJH L2 drop reason 'Unicast egress port list is empty'.
* Tying the SCL and SDA of the optical modules to 3.3V causes errors.
* On SN4600, there was a delay of more than 10 seconds from the time a data packet is sent from CPU until it is transmitted through one of the switch ports.
* While using SN4600C system with Finisar FTLC1157RGPL 100GbE CWDM4 modules, intermittent link flaps across multiple ports may be observed.
* In Spectrum-2 and Spectrum-3 systems, link did not work in auto-negotiation when connected to Marvell PHY. KR mechanism has been enhanced to integrate with Marvell PHY. 
* The tunnel counter counts the drop packets now for Spectrum-2 and Spectrum-3 and consistent with Spectrum behavior and count the ECN dropped packets as well.
* When connecting SN3800 to Cisco-9000, fast-linkup flow will fail and will rise in the normal flow.
* Race condition in WJH library: when multiple threads load the LAG shared memory concurrently, the program may crash.
* Add WJH L2 drop reason 'Unicast egress port list is empty' as a new drop reason. 
* Fixed a memory leak in sx_api_port_sflow_statistics_get API. 
* During initialization flow, the command interface that is used by the minimal driver and SDK caused the collision in the firmware since the same buffer is used in the firmware for the two interfaces.

- How I did it
Updated SDK/SAI submodule and relevant makefiles with the required versions.

- How to verify it
Build an image and run tests from "sonic-mgmt".

Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2021-12-06 21:54:14 +02:00
Mahesh Maddikayala
3f3dceb96a
[broadcom]: update bcm dnx gpl module pointer (#9442)
saibcm_modules_dnx submodule update contains a fix for kernel crash
2021-12-03 21:21:07 -08:00
Junchao-Mellanox
e4ff4d2e3a [Mellanox] Fan speed should not be 100% when PSU is powered off (#9258)
- Why I did it
When PSU is powered off, the PSU is still on the switch and the air flow is still the same. In this case, it is not necessary to set FAN speed to 100%.

- How I did it
When PSU is powered of, don't treat it as absent.

- How to verify it
Adjust existing unit test case
Add new case in sonic-mgmt
2021-12-01 09:47:26 -08:00
Stephen Sun
fa0ae42e69 [Reclaim buffer] Common infrastructure update for reclaiming buffer (#9133)
- Why I did it
This is to update the common sonic-buildimage infra for reclaiming buffer.

- How I did it
Render zero_profiles.j2 to zero_profiles.json for vendors that support reclaiming buffer
The zero profiles will be referenced in PR [Reclaim buffer] Reclaim unused buffers by applying zero buffer profiles #8768 on Mellanox platforms and there will be test cases to verify the behavior there.
Rendering is done here for passing azure pipeline.
Load zero_profiles.json when the dynamic buffer manager starts
Generate inactive port list to reclaim buffer

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2021-12-01 09:47:18 -08:00
dflynn-Nokia
33fce6afd1 [Nokia ixs7215] Platform API fixes (#9025)
* [Nokia ixs7215] Platform API fixes

This commit delivers the following fixes
    - Fix bug preventing access to second PSU eeprom
    - Fix bug preventing updates to front panel PSU status led
    - Fix SFP reset test case failure

* Fix LGTM alert
2021-11-14 15:16:14 -08:00
Volodymyr Samotiy
badce1cbf6
[202106] [Mellanox] Update hw-mgmt to v7.0010.3330 (#9164)
* Changed Debian package dependency in order to support both python or python3 packages
* Fix Python scripts to be compatible with python2.7/python3 versions
* hw-mgmt: attributes: Fix PSU power sensor attributes capability

Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2021-11-05 18:42:09 -07:00
Stepan Blyshchak
234b5b64e4 [dockers] change RPC, DBG dockers version: put RPG, DBG sign in build metadata part of the version (#8920)
- Why I did it
In case an app.ext requires a dependency syncd^1.0.0, the RPC version of syncd will not satisfy this constraint, since 1.0.0-rpc < 1.0.0. This is not correct to put 'rpc' as a prerelease identifier. Instead put 'rpc' as build metadata in the version: 1.0.0+rpc which satisfies the constraint ^1.0.0.

- How I did it
Changed the way how to version in RPC and DBG images are constructed.

- How to verify it
Install app.ext with syncd^1.0.0 dependency on a switch with RPC syncd docker.
Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>
2021-11-02 22:54:30 -07:00
Junchao-Mellanox
48d412d49f
[Mellanox] Fix issue: PSU model/serial/revision info should be updated after replacing PSU (#9040) 2021-11-01 09:57:09 +05:30
judyjoseph
b125f5d564
[202106] Advance the broadcom SAI to 5.0.0.11 (#9095)
* Advance the broadcom SAI to 5.0.0.11
* Update saibcm-modules-dnx to take in Knet MTU fix
2021-10-28 11:11:58 -07:00
Junchao-Mellanox
2a3738ead5
[Mellanox] Add a trigger to set LED to blink (#8995)
Depends on Mellanox hw-mgmt 7.0010.3300

Why I did it
Adjust LED logical according to hw-mgmt change.

How I did it
Add a trigger to set LED to blink.

How to verify it
Manual test
2021-10-28 10:01:40 -07:00
DavidZagury
a68a3a176e [Mellanox] Upgrade Mellanox firmware tools to 4.17.2-12 (#8978)
- Why I did it
Bug fix:
bad_param request due to missing parser rest command while running mlxlink

- How I did it
Advance to MFT tool version to 4.17.2-12.

- How to verify it
Manually tested on all mellanox platforms.
2021-10-26 08:58:50 -07:00
Alexander Allen
978584b3d1
[202106] [Mellanox] Upgrade hw-mgmt to V.7.0010.3300 (#9020)
- Why I did it
Upgrade to the latest version of hardware management in order to incorporate the latest bugfixes and drivers in the kernel.

- How I did it
Updated the version number and submodule for hw-mgmt.

- How to verify it
This has been verified on all Mellanox platforms through a combination of sonic-mgmt tests and other internal verification.
2021-10-26 13:32:28 +03:00
Vadym Hlushko
315922746d [fan] Fixed dynamic minimum fan speed table for SN4410 (#8960)
Put new values to the dynamic minimum fan speed table for SN4410.
2021-10-20 18:14:29 -07:00
Alexander Allen
95fd2f7534 [mellanox] Remove validation for fw filenames with no extension (#8956)
Why I did it
Currently the mellanox platform API is validating the file extensions of firmware packages to be installed for basic sanity checking. However, ONIE packages do not have an extension and as such if there is a "." in the name it is taken to be an extension and then fails the sanity check.

How I did it
I removed the check which ensures that ONIE images don't have a file extension.

How to verify it
Name the ONIE updater file 2021.onie and attempt to install it via fwutil install fw 2021.onie --yes
2021-10-20 18:14:13 -07:00
Alexander Allen
3e8a612e11 [platform] [mellanox] Use correct API call to update firmware in auto_update_firmware (#8961)
Why I did it
The fwutil update all utility expects the auto_update_firmware method in the Platform API to execute the update_firmware() call and not the install_firmware() call.

How I did it
Changed the method in the mellanox platform API component implementation.

How to verify it
Run fwutil update all with a CPLD update on a Mellanox platform and verify that it properly updates the firmware using the MPFA file.
2021-10-20 18:14:03 -07:00
Aravind Mani
e3cb49f859 DellEMC: Fix z9332f low power mode issue (#8693) 2021-10-14 15:29:46 -07:00
lguohan
a76dcdf130 [build]: add branch and release name in sonic_version.yml (#6356)
the branch refers the branch name that the commit is in,
for example master, 202012, 201911, ...
In case there is no branch, the name will be HEAD.

release is encoded in /etc/sonic/sonic_release file.
the file is only available for a release branch.
It is not available in master branch.

example for master branch
```
build_version: 'master.602-6efc0a88'
debian_version: '10.7'
kernel_version: '4.19.0-9-2-amd64'
asic_type: vs
commit_id: '6efc0a88'
branch: 'master'
release: 'none'
build_date: Tue Dec 29 06:54:02 UTC 2020
build_number: 602
built_by: johnar@jenkins-worker-23
```

example for 202012 release branch
```
build_version: '202012.602-6efc0a88'
debian_version: '10.7'
kernel_version: '4.19.0-9-2-amd64'
asic_type: vs
commit_id: '6efc0a88'
branch: '202012'
release: '202012'
build_date: Tue Dec 29 06:54:02 UTC 2020
build_number: 602
built_by: johnar@jenkins-worker-23
```

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-10-14 15:27:41 -07:00
Volodymyr Samotiy
d3dad3f99c
[202106][Mellanox] Update SAI to v1.19.7.1 (#8930)
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2021-10-13 14:08:54 -07:00
Junchao-Mellanox
821f521ae3 [Mellanox] Change thermal recover threshold from temp_trip_norm to temp_trip_high (#8792)
- Why I did it
Change thermal recover threshold from temp_trip_norm to temp_trip_high, so that thermal algorithm would set fan speed to minimum allowed earlier and save power.

- How I did it
Change thermal recover threshold from temp_trip_norm to temp_trip_high

- How to verify it
Manual test
2021-10-04 19:28:15 -07:00
Aravind Mani
17ccd8babe DellEMC: Z9332f fix platform bugs (#8777)
* DellEMC: Z9332f fix platform bugs

* update sfp.py
2021-09-26 21:36:36 -07:00
dflynn-Nokia
25b44c0ca6 [Nokia ixs7215] Support show system-health (#8771)
* [Nokia ixs7215] Support show system-health
* [Nokia ixs7215] Fix LGTM alert
2021-09-26 21:36:26 -07:00
dflynn-Nokia
35312edebc [Nokia ixs7215] Add support for SFP eeprom type_abbrv_name attribute (#8772) 2021-09-26 21:36:14 -07:00
Sudharsan Dhamal Gopalarathnam
d8841cb876 [DPB]Removing default admin status initialization in DPB flow while loading minigraph (#8711)
To Fix #8697 . The config load_minigraph initializes 'admin_status' to up when platform.json has DPB configs. This doesn't happen when using port_config.ini
The update minigraph has logic to initialize only the ports whose neighbors are defined or those belonging to portchannel
However, a change was introduced to have default admin status to be 'up' in portconfig.py when the minigraph was using platform.json

This will lead to sanity check failure in sonic-mgmt and thus no test cases could be run
2021-09-14 09:58:30 -07:00
dflynn-Nokia
e3cfc44354 [Nokia ixs7215] Miscellaneous platform API fixes (#8707)
* [Nokia ixs7215] Miscellaneous platform API fixes

This commit delivers the following fixes for the Nokia ixs7215 platform

- Fix bug in a fan API error path
- Add support for setting the fan drawer led
- Add support for getting/setting the front panel PSU status led
- Add support for getting the min/max observed temperature value

* [Nokia ixs7215] code review changes: temperature min/max values
2021-09-14 09:58:20 -07:00
Nazarii Hnydyn
e3a827a14d [Mellanox] Advance hw-mgmt to V.7.0010.2346. (#8667)
Commits on Sep 01, 2021
hw-mgmt: attributes: Add PSU power sensor attributes d8fce39

Commits on Sep 02, 2021
Remove MFT package flint tool from hw-management dump generation. 53d06b2
hw-mgmt: debug: Add timeout to generate-dump.sh b661fa3 

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2021-09-14 09:36:49 -07:00
richardyu
0f1d58a0c9 [SAIServer] sai server reads config from hwsku folder (#8625)
To enable saiserver docker on different platforms, it needs different configuration files. make the saiserver docker mount them in hwsku folder.

Co-authored-by: Ubuntu <richardyu@richardyu-ubuntu-vm0.trsxrdzozv2e1czsze2t05vqzh.ix.internal.cloudapp.net>
2021-09-02 15:46:23 -07:00
carl-nokia
75ef115e8e [Nokia ixs7215] sfp get_name test case fix (#8507)
Account for sfputil_helper indexing being 0 based

Co-authored-by: Carl Keene <keene@nokia.com>
2021-09-02 15:46:12 -07:00
Junchao-Mellanox
e925339a7e [Mellanox] Read PSU fan max/min speed per PSU (#8563)
#### Why I did it
New PSU could install different type of fan, so fan max/min speed should be read per PSU

#### How I did it
The existing implementation read PSU max/min fan speed from a common file, change it to read from per PSU file

#### How to verify it
Manual test
2021-09-02 15:45:26 -07:00
richardyu
1808d5894e [202012][saiserver docker]adds saiserver dependences (#8447)
Co-authored-by: richardyu-ms <richard.yu@microsoft.com>
2021-09-02 15:38:17 -07:00
dflynn-Nokia
7a950cf49c [Nokia ixs7215] Add support for changing the console baud rate (#8595)
This commit adds support for changing the default console baud rate configured
within the U-Boot bootloader. That default baud rate is exposed via the value
of the U-Boot 'baudrate' environment variable. This commit removes logic that
hardcoded the console baud rate to 115200 and instead ensures that the U-Boot
'baudrate' variable is always used when constructing the Linux kernel boot
arguments used when booting Sonic.

A change is also made to rc.local to ensure that the specified baud rate is set
correctly in the serial getty service.
2021-09-02 15:37:54 -07:00
Samuel Angebault
6c6bfde24e
[202106][Arista] Update platform library submodules (#8643)
Update 202106 release with the Arista drivers from master.
This update is mostly targeted at chassis and is deemed stable.

Most notable chassis improvements:
 - fix psu reporting in platform api
 - powercycle fabrics on supervisor reboot
 - improve card powercycle reliability
 - fix led plugin when dealing with `Ethernet-Rec` and `Ethernet-IB`
 - fix system-health cli reporting
 - unset provisioning mode once the linecard has started
 - fix `show version` when running as `admin`
 - fix race between loading the eeprom module and sysfs file availablity
 - implement `get_all_asics` platform API
2021-09-01 23:23:11 -07:00
Alexander Allen
9a3e35ec83 [Mellanox] Upgrade Mellanox firmware tools to 4.17.0 (#8299)
- Why I did it
New release of MFT has the following changelog / RN
 Fixed an issue that resulted in getting MVPD read errors from the mlxfwmanager during fast reboot.
 Fixed mlxuptime sometimes generating a time less than previous due the wrong frequency calculation

- How I did it
Update makefile pointer to new version.

- How to verify it
Manually tested on all Mellanox platforms.
2021-08-25 12:37:36 -07:00
Marty Y. Lok
f33b659aff Added Nokia IXR7250E support (#7809)
Why I did it
Support Nokia ixr7250E IMM and Supervisor cards

How I did it
Added modules x86_64-nokia_ixr7250e_sup-r0 and x86_64-nokia_ixr7250e_36x400g-r0 ../device/nokia directory.
Modified the platform/broadcom/one-image.mk to include NOKIA_IXR7250_PLATFORM_MODULE
Modified the platform/broadcom/rule.mk to include the platform-module-nokia.mk
2021-08-25 12:21:39 -07:00
gechiang
c679ebf931 Reapply the fix to address setting MTU > 1500 causing portmgrd crash on BRCM platforms (#8472) 2021-08-25 12:17:56 -07:00
Kebo Liu
0108c7de58 [Mellanox] Upgrade hw-mgmt to 7.0100.2344 (#8463)
To pick up new PSU fan support from new hw-mgmt release

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2021-08-25 12:17:36 -07:00
carl-nokia
aef7c85695 [Nokia ixs7215] sfputil support + component tests (#8445)
Deliver sfputil support for sfputil show eeprom and sfputil reset along with some component test case fixes

Co-authored-by: Carl Keene <keene@nokia.com>
2021-08-25 12:17:07 -07:00
jerseyang
418ee0cd38 enable the emc2305 fan controller and NCP power controller 30ms timeout mechanism (#8138)
Why I did it
fix the dx010 system eeprom unavailable issue

How I did it
enable the i2c slave 30ms timeout mechanism

How to verify it
i2cstress test in DX010 iSMT controller bus

Co-authored-by: nicwu-cel <nicwu@celestica.com>
2021-08-25 12:14:59 -07:00
dflynn-Nokia
fb072d84cb [Nokia ixs7215] Watchdog timer support (#8377) 2021-08-25 12:13:44 -07:00