Commit Graph

308 Commits

Author SHA1 Message Date
Eran Dahan
9c9f0453f9
[MLNX] update SAI submodule (#6666)
** Why I did it **
Disable the SDK extended dump due to an issue found in it.

** How I did it ** 
Update SAI submodule

** How to verify it **
Verify the SDK extended dump is not called.

Signed-off-by: Eran Dahan <erand@nvidia.com>
2021-02-04 09:03:51 +02:00
lguohan
22a19e87aa [build]: wait for conflicting package to be uninstalled (#5039)
When parallel build is enabled, docker-fpm-frr and docker-syncd-brcm are
built at the same time. docker-fpm-frr requires swss, which requires
libsaivs-dev to be installed; docker-syncd-brcm requires the syncd package,
which requires libsaibcm-dev to be installed.

Since libsaivs-dev and libsaibcm-dev install the SAI headers in the same
location, these two packages cannot be installed at the same time. Therefore,
we need to serialize the build between these two packages. Simply uninstalling
the conflicting package is not enough to solve this issue. The correct solution
is to have one package wait for the other package to be uninstalled.

For example, if syncd is built first, it will install libsaibcm-dev.
Meanwhile, if the swss build job starts and tries to install libsaivs-dev,
it will first query whether libsaibcm-dev is installed. If it is
installed, the job will wait until libsaibcm-dev is uninstalled. After the syncd
job is finished, it will uninstall libsaibcm-dev and the swss build job will be
unblocked.

To solve this issue, _UNINSTALLS is introduced to uninstall a package that
is no longer needed and to allow the blocked job to continue.

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-27 14:07:30 -08:00
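The wait-instead-of-uninstall scheme described in this commit can be sketched in Python as follows. This is not the sonic-buildimage make logic: `PackageRegistry` is a hypothetical in-memory stand-in for querying the package database, and threads stand in for parallel build jobs.

```python
import threading


class PackageRegistry:
    """In-memory stand-in for the build host's package database (hypothetical).

    install() blocks while a conflicting package is installed, which models
    "wait for the conflicting package to be uninstalled" rather than
    "forcibly uninstall the conflicting package".
    """

    def __init__(self, conflicts):
        # conflicts maps a package name to the set of packages it cannot coexist with
        self._conflicts = conflicts
        self._installed = set()
        self._cond = threading.Condition()

    def _blocked(self, pkg):
        return bool(self._conflicts.get(pkg, set()) & self._installed)

    def install(self, pkg):
        with self._cond:
            # Sleep until no conflicting package remains installed.
            self._cond.wait_for(lambda: not self._blocked(pkg))
            self._installed.add(pkg)

    def uninstall(self, pkg):
        # The role the commit gives to _UNINSTALLS: release the package and
        # wake any job waiting on it.
        with self._cond:
            self._installed.discard(pkg)
            self._cond.notify_all()

    def installed(self):
        with self._cond:
            return set(self._installed)
```

A job that needs libsaivs-dev while libsaibcm-dev is installed simply blocks in `install()` until the syncd job calls `uninstall()`.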
lguohan
8bcdefbc34 [docker-orchagent]: make build depend only on sairedis package (#6467)
backport c4b5b002c3

Make the swss build depend only on libsairedis instead of syncd. This allows swss to be built without depending
on the vendor SAI library.

Currently, the libsairedis build also builds syncd, which requires the vendor SAI lib. This makes it difficult to build the
swss docker in buster while still keeping the syncd docker in stretch, as swss requires libsairedis, which also
builds syncd and requires the vendor to provide SAI for buster. As the swss docker does not actually contain the syncd
binary, it is not necessary to build syncd for the swss docker.

[submodule]: update sonic-sairedis
1e42517996bfe41ac58d4c25ee3f93502befcb9d (HEAD -> 201911) [build]: add option to build without syncd

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-27 13:51:24 -08:00
Kebo Liu
35d93ff8a3
[201911][Mellanox] Add hw-mgmt patch to support SDK OFFLINE event handling during ISSU (#6551)
In order to prevent the "mlxsw_minimal" driver from accessing the ASIC during the
in-service firmware upgrade flow, the SDK will raise an "OFFLINE" 'udev' event
at the very beginning of such a flow. When this event is received,
hw-management will remove the "mlxsw_minimal" driver.
There is no need to implement the opposite "ONLINE" event, since this flow
ends with "kexec".

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2021-01-26 16:49:13 -08:00
Kebo Liu
687e1b9931
[mellanox]: Update SDK to 4.4.2308, FW to *.2008.2308 (#6553)
Bugs fixes:
    All | Kernel | During system reload when CPU is loaded with heavy traffic, a Kernel Panic may occur.
    All | Modules, Port split | FW stuck when device rebooted with locked Optical Transceivers in split mode
    Spectrum-3 | PFC | On Spectrum-3 systems, slow reaction time to Rx pause packets on 40GbE ports may lead to buffer overflow on servers.
    Spectrum-3 | SN4700, Port Split | On rare occasions on SN4700, when conducting a 100G split (4x25G) in NRZ while splitter port 1 or 2 is down, ports 3 and 4 will also go down.

Enhancements:
    All | Kernel | New notification on ISSU start, so that other kernel drivers can disable any interface to the ASIC

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2021-01-25 20:10:15 -08:00
lguohan
a90eac73bf [mellanox]: fix mellanox hw-management build (#6471)
Use dpkg-buildpackage to build with fakeroot.

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-25 12:44:50 -08:00
Kebo Liu
dea38d1558
Update Mellanox SDK to 4.4.2208 FW to *.2008.2208 (#6342)
2021-01-04 14:10:37 +02:00
shlomibitton
6d38654034
[Mellanox] PSU led platform API fixes (#6214)
- Why I did it
Fix setting the PSU LED to 'green' or 'red' states.
Return False for an unsupported color request.
Remove the 'off' option from the PSU LED API, since it is not supported on Mellanox platforms.

- How I did it
Fix missing import information.
Return 'False' when an unsupported LED color is requested, preventing an exception.

- How to verify it
Try to set the PSU LED to different states on a Mellanox platform device.
Try to set the PSU LED to an unsupported color on a Mellanox platform device.
2020-12-24 01:11:48 -08:00
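A minimal sketch of the "return False instead of raising" behavior described above. The `PsuLed` class and its `write_fn` callback are hypothetical; the real Mellanox platform API writes LED state through its own helpers.

```python
class PsuLed:
    """Hypothetical PSU LED wrapper used only to illustrate the behavior."""

    # Colors this sketch treats as supported; 'off' is deliberately absent,
    # matching the commit's note that it is not supported on Mellanox.
    SUPPORTED_COLORS = ("green", "red")

    def __init__(self, write_fn):
        # write_fn stands in for whatever actually drives the LED.
        self._write = write_fn
        self._color = None

    def set_status(self, color):
        """Return True on success, False for an unsupported color or write error."""
        if color not in self.SUPPORTED_COLORS:
            return False  # reject instead of raising an exception
        try:
            self._write(color)
        except OSError:
            return False
        self._color = color
        return True

    def get_status(self):
        return self._color
```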
Volodymyr Samotiy
78c44d1808
[Mellanox] Update SAI submodule (#6235)
Add the VNET route diff tool (SAI/SDK part) to the 201911 release.

Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2020-12-17 09:11:50 -08:00
Volodymyr Samotiy
39e1c27525
update SDK to 4.4.2112, FW to *.2008.2112, SAI to 1.18.0.1 (#6147)
Co-authored-by: keboliu <kebol@mellanox.com>
2020-12-08 07:54:50 +02:00
Junchao-Mellanox
8f45bfa1be [Mellanox] Remove eeprom cache file when first time init eeprom object (#6071)
The EEPROM cache file is not refreshed after installing a new ONIE version, even if the EEPROM data is updated, because the current Eeprom class always tries to read from the cache file when the file exists. This PR fixes that.
2020-12-04 13:26:23 -08:00
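A sketch of clearing the cache once, on first initialization, so the first read after an ONIE upgrade comes from the hardware. The `Eeprom` class shape, the cache path and the `read_hw` callable are assumptions for illustration, not the Mellanox implementation.

```python
import os


class Eeprom:
    """EEPROM reader with an on-disk cache (hypothetical layout)."""

    _cache_cleared = False  # shared by all instances: clear only on first init

    def __init__(self, cache_path, read_hw):
        self._cache_path = cache_path
        self._read_hw = read_hw  # callable returning raw EEPROM bytes from hardware
        if not Eeprom._cache_cleared:
            # Drop any cache left over from before an ONIE upgrade so the
            # first read comes from hardware rather than stale data.
            if os.path.exists(self._cache_path):
                os.remove(self._cache_path)
            Eeprom._cache_cleared = True

    def read(self):
        # Serve from the cache when present, otherwise read and populate it.
        if os.path.exists(self._cache_path):
            with open(self._cache_path, "rb") as f:
                return f.read()
        data = self._read_hw()
        with open(self._cache_path, "wb") as f:
            f.write(data)
        return data
```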
Abhishek Dosi
8c0df39c96 Revert "Advance SDK/SAI (#6004)"
This reverts commit 33a6e56833.
2020-11-26 11:55:52 -08:00
Junchao-Mellanox
37eb088b74
[Mellanox] [201911] Fix issue: set fan led in certain order causes incorrect physical fan led color (#6019)
* Fix issue: fan led color status

* Fix LGTM warning

* Support fan led management for non-swappable fan
2020-11-26 10:09:48 +02:00
Stephen Sun
33a6e56833
Advance SDK/SAI (#6004)
SDK 4.4.2018
FW XX_2008_2018
SAI 1.17.9

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2020-11-26 09:43:50 +02:00
Junchao-Mellanox
ebc84bee94
Fix issue: fan.get_presence always returns False (#5983)
2020-11-23 09:28:12 +02:00
Junchao-Mellanox
500395c56e
[Mellanox] Support max/min speed for PSU fan (#5682) (#5801)
As the new hw-mgmt exposes the sysfs for PSU fan max speed, we need to support max/min speed for the PSU fan in the Mellanox platform API.
Conflicts:
	platform/mellanox/mlnx-platform-api/sonic_platform/fan.py
2020-11-17 18:17:37 +02:00
shlomibitton
1b10f86554 [Mellanox] Fix for QSFP-DD channel status (#5900)
An incorrect object initialization broke the API. Replace the object with one of the correct type.

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
2020-11-14 12:26:28 -08:00
shlomibitton
4088872bb5 [Mellanox] Enhance QSFP-DD DOM information (#5776)
The new driver supports fetching additional pages from the cable EEPROM.
There is now additional information to parse: RX/TX power, TX bias, TX fault and RX LOS.

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
2020-11-14 12:25:57 -08:00
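For context on the new RX/TX power fields: both SFF-8636 and CMIS report optical power as an unsigned 16-bit big-endian value in units of 0.1 uW. The helpers below are a hedged sketch of converting such a reading to dBm; the function names are hypothetical, and TX bias (which has its own units) is not covered.

```python
import math


def read_u16_be(buf, offset):
    """Read an unsigned 16-bit big-endian value from an EEPROM page buffer."""
    return (buf[offset] << 8) | buf[offset + 1]


def raw_power_to_dbm(raw):
    """Convert a 16-bit optical power reading (0.1 uW per LSB) to dBm."""
    if not 0 <= raw <= 0xFFFF:
        raise ValueError("raw power must be an unsigned 16-bit value")
    mw = raw * 0.0001  # 0.1 uW = 0.0001 mW
    if mw == 0:
        return float("-inf")  # no light: log10(0) is undefined
    return 10 * math.log10(mw)
```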
Nazarii Hnydyn
781abed79e
[Mellanox] Update SAI to v.1.17.7. (#5766)
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2020-11-02 10:51:49 +02:00
Junchao-Mellanox
712d97f911
[Mellanox] Update SDK 4.4.1956 and FW *.2008.1956 for 201911 (#5769)
Update SDK 4.4.1956 and FW *.2008.1956

Bugs fixes:

1.	Link | Clear operational speed when link is not active
2.	Spectrum-2, SN3800 | On rare occasion, link flapping due to bad BER causes traffic loss
3.	Spectrum-3 | On rare occasion, link flapping due to bad BER causes traffic loss as a result of new PAM4 link maintenance flow on Spectrum-3 devices
4.	Shared Buffers | On rare occasion, modifying shared buffers on a system with split port while traffic is running may cause the firmware to get stuck
5.	Spectrum-3, SN4700 | Fence may fail while running 400GbE 8x port when modifying mirror session configurations under traffic
2020-11-01 23:20:27 -08:00
abdosi
0fad6bdc7f [monit] Adding patch to enhance syslog error message generation for monit alert action when status is failed. (#5720)
Why/How I did:

Make sure the first error syslog is triggered based on the FAULT TOLERANCE condition.

Added support for the repeat clause with the alert action. This is used as a trigger
for generating periodic syslog error messages if the error persists.

Updated the monit conf files with "repeat every x cycles" for the alert action.
2020-11-01 10:27:10 -08:00
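The alerting behavior described above (first alert only after the fault-tolerance condition is met, then a repeat every x cycles while the error persists) can be modeled as below. This is a simplified simulation, not Monit code: it treats the fault tolerance as N consecutive failed cycles, whereas Monit's "for N times within M cycles" form is more general. The function name is hypothetical.

```python
def alert_cycles(statuses, fault_tolerance, repeat_every):
    """Return the cycle indexes at which an error syslog would be emitted.

    statuses: one boolean per monit cycle, True meaning the check failed.
    fault_tolerance: consecutive failed cycles required before the first alert.
    repeat_every: re-alert every this many cycles while the failure persists.
    """
    alerts = []
    consecutive = 0
    last_alert = None
    for cycle, failed in enumerate(statuses):
        if not failed:
            # Recovery resets both the tolerance counter and the repeat timer.
            consecutive = 0
            last_alert = None
            continue
        consecutive += 1
        if consecutive < fault_tolerance:
            continue
        if last_alert is None or cycle - last_alert >= repeat_every:
            alerts.append(cycle)
            last_alert = cycle
    return alerts
```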
Junchao-Mellanox
06b5ad02ac [Mellanox] Re-initialize SFP object when detecting a new SFP insertion (#5695)
When detecting a new SFP insertion, read its SFP type and DOM capability from EEPROM again.

An SFP object will be initialized to a certain type even if no SFP is present. For example:

1. An SFP object is initialized to QSFP type by default when there is no SFP present.
2. The user inserts an SFP with an adapter into this QSFP port.
3. The SFP object fails to read the EEPROM because it still treats itself as QSFP.

This PR fixes this issue.
2020-10-30 09:04:26 -07:00
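A sketch of re-detecting the module type on each new insertion. The `Sfp` class and its `read_identifier` callback are hypothetical; the identifier values follow SFF-8024 (0x03 SFP/SFP+/SFP28, 0x0C QSFP, 0x0D QSFP+, 0x11 QSFP28, 0x18 QSFP-DD). DOM capability would be refreshed at the same point but is omitted here.

```python
SFP_IDENTIFIERS = {0x03}                # SFP/SFP+/SFP28
QSFP_IDENTIFIERS = {0x0C, 0x0D, 0x11}   # QSFP, QSFP+, QSFP28
QSFP_DD_IDENTIFIER = 0x18


class Sfp:
    """Hypothetical SFP object that re-reads its type on insertion."""

    def __init__(self, read_identifier):
        # read_identifier returns EEPROM byte 0, or None if nothing is plugged in
        self._read_identifier = read_identifier
        self.sfp_type = "QSFP"  # default chosen while the port is empty
        self._present = False

    def _detect_type(self):
        ident = self._read_identifier()
        if ident in SFP_IDENTIFIERS:
            self.sfp_type = "SFP"
        elif ident in QSFP_IDENTIFIERS:
            self.sfp_type = "QSFP"
        elif ident == QSFP_DD_IDENTIFIER:
            self.sfp_type = "QSFP_DD"

    def on_presence_change(self, present):
        # On an absent -> present transition, trust the EEPROM, not the
        # type guessed at init time.
        if present and not self._present:
            self._detect_type()
        self._present = present
```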
Stepan Blyshchak
f7d753fd70 [Mellanox] Configure SAI to log to syslog instead of stdout. (#5634)
Example of syslog message from Mellanox SAI:

"Oct  7 15:39:11.482315 arc-switch1025 INFO syncd#supervisord: syncd Oct 07 15:39:11 NOTICE  SAI_BUFFER: mlnx_sai_buffer.c[3893]- mlnx_clear_buffer_pool_stats: Clear pool stats pool id:1"

The message is logged at INFO by supervisord, yet its text carries NOTICE
and the date again. This confusion happens because, if SAI is not built to log
to syslog, it logs everything to stdout in the format "[date] [level]
[message]", and supervisord forwards that to syslog with level INFO.

New logs look like:

"Oct  7 15:40:21.488055 arc-switch1025 NOTICE syncd#SDK  [SAI_BUFFER]: mlnx_sai_buffer.c[3893]- mlnx_clear_buffer_pool_stats: Clear pool stats pool id:17"

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2020-10-30 08:57:21 -07:00
Junchao-Mellanox
ea28f2dcb2 [Mellanox] Fix issue: read data from eeprom should trim tail \0 (#5670)
We now read the base MAC and product name from the EEPROM data, and the data read from the EEPROM contains multiple "\0" characters at the end; these need to be trimmed so the strings are clean and display correctly.
2020-10-22 10:54:02 -07:00
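The trimming amounts to stripping the NUL padding from fixed-width EEPROM fields. A minimal sketch (the helper name is hypothetical):

```python
def decode_eeprom_string(raw):
    """Decode a fixed-width EEPROM text field, dropping trailing NUL padding."""
    if isinstance(raw, bytes):
        raw = raw.decode("ascii", errors="replace")
    # Only trailing NULs are padding; anything before them is kept.
    return raw.rstrip("\0")
```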
Kebo Liu
ae3f09246c [Mellanox] Optimize SFP Platform API implementation (#5476)
Each SFP object inside the Chassis opens its own SDK client. This is unnecessary, since the SDK client can be shared between SFP objects.
2020-10-21 08:13:59 -07:00
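One way to share a single SDK client is a lazily created class-level handle, sketched below. `SdkHandle` is a hypothetical placeholder for the real SDK client, and the counter exists only to show that one handle is opened.

```python
class SdkHandle:
    """Placeholder for an SDK client; opening one is assumed to be costly."""

    opened = 0

    def __init__(self):
        SdkHandle.opened += 1


class SfpPort:
    """Hypothetical SFP object that shares one SDK handle with its siblings."""

    _sdk_handle = None  # class attribute: one handle for every SfpPort

    def __init__(self, index):
        self.index = index

    @property
    def sdk_handle(self):
        # Open the client lazily on first use and reuse it afterwards.
        if SfpPort._sdk_handle is None:
            SfpPort._sdk_handle = SdkHandle()
        return SfpPort._sdk_handle
```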
Nazarii Hnydyn
bd61e3811b
[Mellanox] Update SDK 4.4.1912, FW XX.2008.1912 (#5575)
- SN3800 vs Cisco9236 - no link copper or optics - start sending IDLE before PHY_UP for specific OPNs

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2020-10-11 15:34:05 -07:00
Kebo Liu
ae3d458d48 [Mellanox] Refactor platform API to remove dependency on database (#5468)
**- Why I did it**
- The platform API implementation uses sonic-cfggen to get the platform name and SKU name, which fails when the database is not available.
- The Chassis name is not correctly assigned; it should be assigned the EEPROM TLV "Product Name" instead of the SKU name.
- The Chassis model is not implemented; it should be assigned the EEPROM TLV "Part Number".

**- How I did it**

1. Chassis

> - Get platform name from /host/machine.conf
> - Remove getting the SKU name with sonic-cfggen
> - Get Chassis name and model from EEPROM TLV "Product Name" and "Part Number"
> - Add function to return model

2. EEPROM

> - Add function to return product name and part number

3. Platform

> - Init EEPROM on the host side, so that the Chassis name and model can also be obtained from EEPROM on the host side.
2020-10-06 11:30:16 -07:00
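A sketch of reading the platform name from /host/machine.conf without the database. It assumes the file holds simple key=value lines, with the platform under `onie_platform` (or `aboot_platform` on Aboot-based images); the helper names are hypothetical.

```python
def parse_machine_conf(text):
    """Parse key=value lines, as found in /host/machine.conf, into a dict."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


def get_platform_name(conf):
    # ONIE-based images record onie_platform; Aboot-based ones aboot_platform.
    return conf.get("onie_platform") or conf.get("aboot_platform")
```

On a switch, the text would come from `open("/host/machine.conf").read()`.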
Kebo Liu
8c3dbce209 [Mellanox] Fix truncated manufacture date returned from platform API (#5473)
The manufacture date returned from the platform API was truncated: the time was not included. Revise the regular expression used for matching.
2020-10-06 11:22:08 -07:00
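The ONIE TlvInfo Manufacture Date field (type 0x25) is formatted as "MM/DD/YYYY hh:mm:ss", so a date-only pattern silently drops the time. A sketch of the difference; the pattern names are hypothetical and the sample line in the usage note is illustrative rather than exact decode-syseeprom output.

```python
import re

# Matches only the date part, which is how the time gets truncated.
DATE_ONLY = re.compile(r"\d{2}/\d{2}/\d{4}")
# Matches the full ONIE "MM/DD/YYYY hh:mm:ss" value.
DATE_TIME = re.compile(r"\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}")


def extract_manufacture_date(line, pattern=DATE_TIME):
    """Return the first manufacture date match in line, or None."""
    m = pattern.search(line)
    return m.group(0) if m else None
```

For a line such as "Manufacture Date 0x25 19 01/15/2020 08:30:12", `DATE_ONLY` yields only "01/15/2020", while `DATE_TIME` keeps the time as well.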
Junchao-Mellanox
de3045188d [Mellanox] Update dynamic minimum table for 4700, 3420 and 4600C (#5388)
Update the dynamic minimum fan speed table according to data provided by the thermal team.
2020-10-06 06:04:06 +00:00
Junchao-Mellanox
e614697967
Fix issue: there should be 2 cpu core temperature sensors in 3420 (#5402)
2020-10-04 16:58:17 +03:00
Stephen Sun
1da60a6811
Integrate sdk and fw 4.4.1910 (#5495)
Signed-off-by: Stephen Sun <stephens@nvidia.com>
2020-10-01 17:22:05 +03:00
Kebo Liu
46e57e050c [Mellanox] Refactor SFP related platform API and plugins with new SDK API (#5326)
Refactor SFP reset, the low power get/set API, and plugins to use new SDK SX APIs. Previously they called SDK SXD APIs, which have a glibc dependency because of shared memory usage.

Remove the implementations of "set_power_override", "tx_disable_channel" and "tx_disable", which use SXD APIs; they will be added back based on the new SDK SX APIs once the related SDK SX APIs are available.
2020-09-29 15:39:52 +00:00
Joe LeVeque
b70c6f72b2 [dockers][supervisor] Increase event buffer size for dependent-startup (#5247)
When stopping the swss, pmon or bgp containers, log messages like the following can be seen:

```
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,061 ERRO pool dependent-startup event buffer overflowed, discarding event 34
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,063 ERRO pool dependent-startup event buffer overflowed, discarding event 35
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,064 ERRO pool dependent-startup event buffer overflowed, discarding event 36
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,066 ERRO pool dependent-startup event buffer overflowed, discarding event 37
```

This is due to the number of programs in the container managed by supervisor, all generating events at the same time. The default event queue buffer size in supervisor is 10. This patch increases that value in all containers in order to eliminate these errors. As more programs are added to the containers, we may need to further adjust these values. I increased all buffer sizes to 25 except for containers with more programs or templated supervisor.conf files which allow for a variable number of programs. In these cases I increased the buffer size to 50. One final exception is the swss container, where the buffer fills up to ~50, so I increased this buffer to 100.

Resolves https://github.com/Azure/sonic-buildimage/issues/5241
2020-09-28 16:12:53 +00:00
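The sizing rule above, and the supervisord setting it changes, can be sketched as follows. `buffer_size` is the supervisord `[eventlistener:x]` option whose default is 10; the helper functions and the boolean standing in for "more programs or a templated supervisord.conf" are hypothetical simplifications.

```python
import configparser
import io


def pick_buffer_size(container, many_or_templated):
    """Encode the sizing rule described in this commit."""
    if container == "swss":
        return 100  # swss was observed filling the buffer to ~50
    if many_or_templated:
        return 50   # more programs, or a variable number via templates
    return 25


def render_listener(buffer_size):
    """Render the dependent-startup eventlistener section as INI text."""
    cfg = configparser.ConfigParser()
    cfg["eventlistener:dependent-startup"] = {
        "events": "PROCESS_STATE",
        "buffer_size": str(buffer_size),
    }
    out = io.StringIO()
    cfg.write(out)
    return out.getvalue()
```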
yozhao101
7580c846ad
[201911][Monit] Unmonitor processes in disabled containers (#5462)
We want Monit to unmonitor the processes in containers which are disabled in the `FEATURE` table, so that
Monit will not generate false alert messages in the syslog.

- Backport of https://github.com/Azure/sonic-buildimage/pull/5153 to the 201911 branch

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2020-09-25 00:30:41 -07:00
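A hedged sketch of the idea: for each container marked disabled, issue `monit unmonitor <name>` for its checks. The FEATURE table is simplified to a container-to-status mapping, and the container-to-check mapping is hypothetical; the function only builds the commands rather than running them.

```python
def unmonitor_commands(feature_status, checks_by_container):
    """Build `monit unmonitor` argv lists for checks in disabled containers.

    feature_status: container name -> status string (simplified FEATURE table).
    checks_by_container: container name -> list of Monit check names (hypothetical).
    """
    cmds = []
    for container in sorted(feature_status):
        if feature_status[container] != "disabled":
            continue
        for check in checks_by_container.get(container, []):
            cmds.append(["monit", "unmonitor", check])
    return cmds
```

Each returned list could then be passed to `subprocess.run`.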
noaOrMlnx
75068f3a62
[Mellanox] Update SAI version to 1.17.3 in 201911 branch (#5376)
Signed-off-by: Noa Or <noaor@nvidia.com>
2020-09-15 20:26:57 +03:00
Volodymyr Samotiy
68d054e925
[Mellanox] Update SDK 4.4.1622, FW xx.2008.1622 (#5299)
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2020-09-03 15:03:52 -07:00
Mykola F
c243b8a9f5
[201911] Update SAI-Implementation submodule and enable port in/out dropped pkts stats (#5093)
- Enable port buffer drops by default
- Update SAI submodule

Signed-off-by: Mykola Faryma <mykolaf@mellanox.com>
2020-08-25 08:20:05 -07:00
Joe LeVeque
9048d7ae4d
[201911] Remove sonic-daemon-base package (#5181)
sonic-daemon-base package has been deprecated in favor of the sonic-py-common package. All related functionality has been moved there.

This is a backport of https://github.com/Azure/sonic-buildimage/pull/5131 and parts of https://github.com/Azure/sonic-buildimage/pull/5168 to the 201911 branch
2020-08-22 17:55:27 -07:00
Guohan Lu
b52b9c12cf [docker-syncd-mlnx]: use service dependency in supervisord to start services
2020-08-15 22:31:32 -07:00
Kebo Liu
7140055d73 [Mellanox] Update the sfp platform API to get the ext_specification_compliance in the new way (#5123)
Update the platform API implementation to call the dedicated parse function defined in platform-common, as introduced by https://github.com/Azure/sonic-platform-common/pull/112
2020-08-13 23:03:03 -07:00
Joe LeVeque
309a098b21
[201911][Python] Migrate applications/scripts to import sonic-py-common package (#5132)
As part of consolidating all common Python-based functionality into the new sonic-py-common package, this pull request:
1. Redirects all Python applications/scripts in sonic-buildimage repo which previously imported sonic_device_util or sonic_daemon_base to instead import sonic-py-common, which was added to the 201911 branch in https://github.com/Azure/sonic-buildimage/pull/5063
2. Replaces all calls to `sonic_device_util.get_platform_info()` to instead call `sonic_py_common.get_platform()` and removes any calls to `sonic_device_util.get_machine_info()` which are no longer necessary (i.e., those which were only used to pass the results to `sonic_device_util.get_platform_info()`).
3. Removes unused imports to the now-deprecated sonic-daemon-base package and sonic_device_util.py module

This is a step toward resolving https://github.com/Azure/sonic-buildimage/issues/4999
2020-08-13 16:35:53 -07:00
Kebo Liu
5d1065dddc
[Mellanox] Update SDK to 4.4.1306, FW to *.2008.1310 (#5124)
* Update SDK/FW version number in the make file

* update Switch-SDK-drivers submodule
2020-08-11 10:05:47 +03:00
shlomibitton
fad242480c Add support for 'Extended Specification Compliance' for QSFP cables parser (#5096)
Signed-off-by: Shlomi Bitton <shlomibi@mellanox.com>
2020-08-09 10:44:14 -07:00
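For QSFP modules, SFF-8636 stores the extended specification compliance code in byte 192 of upper page 00h, and SFF-8024 defines the code values. A sketch of the lookup with only a few table entries filled in; the function name is hypothetical.

```python
# Partial SFF-8024 extended specification compliance table; consult
# SFF-8024 for the full list of codes.
EXT_SPEC_COMPLIANCE = {
    0x00: "Unspecified",
    0x02: "100GBASE-SR4 or 25GBASE-SR",
    0x03: "100GBASE-LR4 or 25GBASE-LR",
    0x06: "100G CWDM4",
}

# SFF-8636: byte 192 when page 00h is addressed as bytes 0-255.
EXT_SPEC_COMPLIANCE_OFFSET = 192


def parse_ext_spec_compliance(eeprom):
    """Map the compliance byte of a QSFP EEPROM dump to a readable name."""
    code = eeprom[EXT_SPEC_COMPLIANCE_OFFSET]
    return EXT_SPEC_COMPLIANCE.get(code, "Unknown (0x%02x)" % code)
```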
Joe LeVeque
6556c40040
[201911] Introduce sonic-py-common package (#5063)
Consolidate common SONiC Python-language functionality into one shared package (sonic-py-common) and eliminate duplicate code.

The package currently includes four modules:
- daemon_base
- device_info
- logger
- task_base

NOTE: This is a combination of all changes from https://github.com/Azure/sonic-buildimage/pull/5003, https://github.com/Azure/sonic-buildimage/pull/5049 and some changes from https://github.com/Azure/sonic-buildimage/pull/5043 backported to align with the 201911 branch. As part of the 201911 port, I am not installing the Python 3 package in the base image or in the VS container, because we do not have pip3 installed, and we do not intend to migrate to Python 3 in 201911.
2020-08-03 11:50:06 -07:00
Nazarii Hnydyn
4e558bca25
[201911][Mellanox] Update MFT to 4.15.0-104 (#5077)
* [Mellanox] Update MFT to 4.15.0-104.

Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>

* [Mellanox] Remove build system W/A.

Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>

* [Mellanox] Add MFT DKMS build support.

Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
2020-08-03 13:53:33 +03:00
Kebo Liu
c8c4493a96
Update SAI to 1.16.6, SDK to 4.4.1014, FW to *.2008.1032 (#5056)
SAI:
    Fix ECMP max groups logic
    Add setting of the ISSU log level for SPC2/SPC3, as ISSU is now supported
    Set VLAN max swid = 0 on SDK init, as only a single swid is needed, for efficient resource usage
    Fix traffic loss during FFB related to buffer config + optimize buffer config timing for FB
    Add ACL fields BTH, IP flags
    Add ACL infrastructure of different fields per ASIC type
    Add port stat ether rx/tx oversize pkts
SDK/FW:
    Added support for Finisar 100GbE SWDM Transceiver FTLC9152RGPL.
    Spectrum-2 Added support for 10G BaseT modules
    Added link LED support for SN4600C.
    Counters | In SDK debug dump, the incorrect counter type appears for vtraps.
    WJH | Without any traffic or events on the idle system, the CPU load is constantly above 4%
    WJH | WJH filter currently cannot filter by PORT for buffer drop reason.
    Spectrum | ACL, Unbind, Lazy Delete | Running Lazy Delete together with auto_unbind may cause race condition errors. To work with Lazy Delete, use the new INIT parameter "acl_manual_unbind" so that ACLs will not be removed automatically when the binding point is deleted.
    Spectrum | ISSU | In ISSU mode, when querying for the number of configurable buffers, using the API sx_api_cos_port_buff_type_get with the count parameter as 0, the API returns the number for NORMAL mode instead.
    Spectrum-2 | BER | BER monitor counts raw errors instead of effective errors
    Spectrum-2 | BER | Connecting to ConnectX-5 adapter card with copper splitter cable MCP7H50-V001R30 in 1
    Spectrum-2 | Cables | Link flaps in 200GbE with AOM Optic cable MMA1T00-VS
    Spectrum-3 | Speeds, Link | When moving from a 400GbE link to a 1GbE link, packets may drop for 1msec right after link up
    Spectrum-3 | Cables, Speeds | Using 400GbE with 3rd party systems is not supported
    Spectrum-3 | LAG | After a while, LAG members become out of sync with one another
    Spectrum-3 | VLAN, Ports | Packets with VLAN headers are sent to
2020-07-30 13:37:54 +03:00
shlomibitton
9385775803 [Mellanox] Change fan tolerance to 50% (#5018)
The fan tolerance on Mellanox platforms should be changed to 50%.

Signed-off-by: Shlomi Bitton <shlomibi@mellanox.com>
2020-07-26 11:18:00 -07:00
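Assuming the tolerance is applied as a percentage band around the target speed, a 50% tolerance accepts any speed between half and one and a half times the target. A sketch with a hypothetical helper name:

```python
def is_speed_within_tolerance(speed, target_speed, tolerance):
    """Check a fan speed against target +/- tolerance percent.

    speed and target_speed are in the same unit (e.g. percent of max);
    tolerance is a percentage of the target speed.
    """
    low = target_speed * (1 - tolerance / 100.0)
    high = target_speed * (1 + tolerance / 100.0)
    return low <= speed <= high
```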
shlomibitton
d0be3ebe16 Add support for QSFP-DD cables on MLNX platform API (#4965)
Signed-off-by: Shlomi Bitton <shlomibi@mellanox.com>
2020-07-26 11:15:05 -07:00
Nazarii Hnydyn
0ec979dd30
[Mellanox] Fix SN3700 platform string. (#5035)
Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
2020-07-25 03:04:03 -07:00
Joe LeVeque
840be7732c
[201911][devices] Update SFP keys to align with new standard (#4976)
Align SFP key names with new standard defined in https://github.com/Azure/sonic-platform-common/pull/97

- hardwarerev -> hardware_rev
- serialnum -> serial
- manufacturename -> manufacturer
- modelname -> model
- Connector -> connector
2020-07-16 11:09:47 -07:00
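The renames listed above amount to a key mapping applied to each SFP info dict. A sketch for migrating old-style dicts; the function name is hypothetical.

```python
# Old key -> new key, as listed in this commit.
KEY_RENAMES = {
    "hardwarerev": "hardware_rev",
    "serialnum": "serial",
    "manufacturename": "manufacturer",
    "modelname": "model",
    "Connector": "connector",
}


def migrate_sfp_info(info):
    """Return a copy of info with old-style keys renamed; others unchanged."""
    return {KEY_RENAMES.get(k, k): v for k, v in info.items()}
```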