Commit Graph

104 Commits

Author SHA1 Message Date
Stephen Sun
97e6b4d15c Support shared headroom pool for Microsoft SKUs (#6366)
- Why I did it
Support shared headroom pool

Signed-off-by: Stephen Sun stephens@nvidia.com

- How I did it
Port configurations for SKUs based on 2700/3800 platform from 201911
For SN3800 platform:
C64: 32 100G down links and 32 100G up links.
D112C8: 112 50G down links and 8 100G up links.
D24C52: 24 50G down links, 20 100G down links, and 32 100G up links.
D28C50: 28 50G down links, 18 100G down links, and 32 100G up links.
For SN2700 platform:
D48C8: 48 50G down links and 8 100G up links
C32: 16 100G downlinks and 16 100G uplinks
Add configuration for Mellanox-SN4600C-D112C8
112 50G down links and 8 100G up links.

- How to verify it
Run regression test.
2021-02-23 23:56:01 +00:00
Joe LeVeque
78bf8159e8 [platform] Update QSFP method name 'parse_qsfp_dom_capability' -> 'parse_dom_capability' (#6695)
**- Why I did it**
PR https://github.com/Azure/sonic-platform-common/pull/102 modified the name of the SFF-8436 (QSFP) method to align the method name between all drivers, renaming it from `parse_qsfp_dom_capability` to `parse_dom_capability`. Once the submodule was updated, the callers using the old nomenclature broke. This PR updates all callers to use the new naming convention.

**- How I did it**

Update the name of the function globally for all calls into the SFF-8436 driver.

Note that the QSFP-DD driver still uses the old nomenclature and should be modified similarly. I will open a PR to handle this separately.
2021-02-05 15:48:30 -08:00
Junchao-Mellanox
ed0ac08e44 [Mellanox] PSU and module thermals are no longer child of chassis (#6460)
In order to build up device hierachy, PSU and module thermals are no longer child of chassis. PSU thermal belongs to PSU objects and SFP thermals belong to SFP object now. Need align this change in platform.json. Move thermal objects to correct parent device
2021-01-15 08:20:43 -08:00
Stephen Sun
e010d83fc3
[Dynamic buffer calc] Support dynamic buffer calculation (#6194)
**- Why I did it**
To support dynamic buffer calculation.
This PR also depends on the following PRs for sub modules
- [sonic-swss: [buffermgr/bufferorch] Support dynamic buffer calculation #1338](https://github.com/Azure/sonic-swss/pull/1338)
- [sonic-swss-common: Dynamic buffer calculation #361](https://github.com/Azure/sonic-swss-common/pull/361)
- [sonic-utilities: Support dynamic buffer calculation #973](https://github.com/Azure/sonic-utilities/pull/973)

**- How I did it**
1. Introduce field `buffer_model` in `DEVICE_METADATA|localhost` to represent which buffer model is running in the system currently:
    - `dynamic` for the dynamic buffer calculation model
    - `traditional` for the traditional model in which the `pg_profile_lookup.ini` is used
2. Add the tables required for the feature:
   - ASIC_TABLE in platform/\<vendor\>/asic_table.j2
   - PERIPHERAL_TABLE in platform/\<vendor\>/peripheral_table.j2
   - PORT_PERIPHERAL_TABLE on a per-platform basis in device/\<vendor\>/\<platform\>/port_peripheral_config.j2 for each platform with gearbox installed.
   - DEFAULT_LOSSLESS_BUFFER_PARAMETER and LOSSLESS_TRAFFIC_PATTERN in files/build_templates/buffers_config.j2
   - Add lossless PGs (3-4) for each port in files/build_templates/buffers_config.j2
3. Copy the newly introduced j2 files into the image and rendering them when the system starts
4. Update the CLI options for buffermgrd so that it can start with dynamic mode
5. Fetches the ASIC vendor name in orchagent:
   - fetch the vendor name when creates the docker and pass it as a docker environment variable
   - `buffermgrd` can use this passed-in variable
6. Clear buffer related tables from STATE_DB when swss docker starts
7. Update the src/sonic-config-engine/tests/sample_output/buffers-dell6100.json according to the buffer_config.j2
8. Remove buffer pool sizes for ingress pools and egress_lossy_pool
   Update the buffer settings for dynamic buffer calculation
2020-12-13 11:35:39 -08:00
Junchao-Mellanox
51c77b179f
[Mellanox] Add python3 support for Mellanox platform API (#6175)
python2 is end of life and SONiC is going to support python3. This PR is going to support:

1. Mellanox SONiC platform API python3 support
2. Install both python2 and python3 verson of Mellanox SONiC platform API or pmon and host side
2020-12-11 10:51:31 -08:00
Junchao-Mellanox
68464381bc
Add a configuration to delay start xcvrd for fast-reboot (#5643) 2020-12-02 21:28:18 +02:00
Joe LeVeque
7f4ab8fbd8
[sonic-utilities] Update submodule; Build and install as a Python 3 wheel (#5926)
Submodule updates include the following commits:

* src/sonic-utilities 9dc58ea...f9eb739 (18):
  > Remove unnecessary calls to str.encode() now that the package is Python 3; Fix deprecation warning (#1260)
  > [generate_dump] Ignoring file/directory not found Errors (#1201)
  > Fixed porstat rate and util issues (#1140)
  > fix error: interface counters is mismatch after warm-reboot (#1099)
  > Remove unnecessary calls to str.decode() now that the package is Python 3 (#1255)
  > [acl-loader] Make list sorting compliant with Python 3 (#1257)
  > Replace hard-coded fast-reboot with variable. And some typo corrections (#1254)
  > [configlet][portconfig] Remove calls to dict.has_key() which is not available in Python 3 (#1247)
  > Remove unnecessary conversions to list() and calls to dict.keys() (#1243)
  > Clean up LGTM alerts (#1239)
  > Add 'requests' as install dependency in setup.py (#1240)
  > Convert to Python 3 (#1128)
  > Fix mock SonicV2Connector in python3: use decode_responses mode so caller code will be the same as python2 (#1238)
  > [tests] Do not trim from PATH if we did not append to it; Clean up/fix shebangs in scripts (#1233)
  > Updates to bgp config and show commands with BGP_INTERNAL_NEIGHBOR table (#1224)
  > [cli]: NAT show commands newline issue after migrated to Python3 (#1204)
  > [doc]: Update Command-Reference.md (#1231)
  > Added 'import sys' in feature.py file (#1232)

* src/sonic-py-swsssdk 9d9f0c6...1664be9 (2):
  > Fix: no need to decode() after redis client scan, so it will work for both python2 and python3 (#96)
  > FieldValueMap `contains`(`in`)  will also work when migrated to libswsscommon(C++ with SWIG wrapper) (#94)

- Also fix Python 3-related issues:
    - Use integer (floor) division in config_samples.py (sonic-config-engine)
    - Replace print statement with print function in eeprom.py plugin for x86_64-kvm_x86_64-r0 platform
    - Update all platform plugins to be compatible with both Python 2 and Python 3
    - Remove shebangs from plugins files which are not intended to be executable
    - Replace tabs with spaces in Python plugin files and fix alignment, because Python 3 is more strict
    - Remove trailing whitespace from plugins files
2020-11-25 10:28:36 -08:00
Joe LeVeque
23247514f9
Fix a number of LGTM alerts (#5952)
Fix 259 alerts reported by the LGTM tool:

- 245 for Unused import
- 7 for Testing equality to None
- 5 for Duplicate key in dict literal
- 1 for Module is imported more than once
- 1 for Unused local variable
2020-11-20 10:58:48 -08:00
Kebo Liu
1158701edc
add pcied config files for mellanox platform (#5669)
This PR has a dependency on community change to move PCIe config files from $PLATFORM/plugin folder to $PLATFORM/ folder
- Why I did it
To support PCIed daemon on Mellanox platforms
- How I did it
Add PCIed config yaml files for all Mellanox platforms
Update pmon daemon config files for SimX platforms
2020-11-02 19:45:36 -08:00
Nazarii Hnydyn
5486f87afc
[Mellanox] Update platform components config files. (#5685)
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2020-10-25 19:44:37 +02:00
shlomibitton
de1f7421ac
[Mellanox] Add sensors labels for human readable output for MSN2700 (#5661)
Add sensors labels for human readable output for MSN2700

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
2020-10-19 16:02:27 -07:00
Junchao-Mellanox
1c97a03b81
[system-health] Add support for monitoring system health (#4835)
* system health first commit

* system health daemon first commit

* Finish healthd

* Changes due to lower layer logic change

* Get ASIC temperature from TEMPERATURE_INFO table

* Add system health make rule and service files

* fix bugs found during manual test

* Change make file to install system-health library to host

* Set system LED to blink on bootup time

* Caught exceptions in system health checker to make it more robust

* fix issue that fan/psu presence will always be true

* fix issue for external checker

* move system-health service to right after rc-local service

* Set system-health service start after database service

* Get system up time via /proc/uptime

* Provide more information in stat for CLI to use

* fix typo

* Set default category to External for external checker

* If external checker reported OK, save it to stat too

* Trim string for external checker output

* fix issue: PSU voltage check always return OK

* Add unit test cases for system health library

* Fix LGTM warnings

* fix demo comments: 1. get boot up timeout from monit configuration file; 2. set system led in library instead of daemon

* Remove boot_timeout configuration because it will get from monit config file

* Fix argument miss

* fix unit test failure

* fix issue: summary status is not correct

* Fix format issues found in code review

* rename th to threshold to make it clearer

* Fix review comment: 1. add a .dep file for system health; 2. deprecated daemon_base and uses sonic-py-common instead

* Fix unit test failure

* Fix LGTM alert

* Fix LGTM alert

* Fix review comments

* Fix review comment

* 1. Add relevant comments for system health; 2. rename external_checker to user_define_checker

* Ignore check for unknown service type

* Fix unit test issue

* Rename user define checker to user defined checker

* Rename user_define_checkers to user_defined_checkers for configuration file

* Renmae file user_define_checker.py -> user_defined_checker.py

* Fix typo

* Adjust import order for config.py

Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com>

* Adjust import order for src/system-health/health_checker/hardware_checker.py

Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com>

* Adjust import order for src/system-health/scripts/healthd

Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com>

* Adjust import orders in src/system-health/tests/test_system_health.py

* Fix typo

* Add new line after import

* If system health configuration file not exist, healthd should exit

* Fix indent and enable pytest coverage

* Fix typo

* Fix typo

* Remove global logger and use log functions inherited from super class

* Change info level logger to notice level

Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com>
2020-10-12 11:12:49 +03:00
vdahiya12
f2194010f8
[MSN2700] Add platform.json file containing platform hardware facts (#5189)
As part of Platform api testing for multiple platforms, this pull request adds a platform.json file which contains all
the static data for Mellanox-2700 platform. This file would provide all the platform specific data required for testing of all the Platform tests . As part of testing the API's the values of static/default objects within this specific platform file  will be compared  against the values returned by calling the Platform specific API's in a typical platform test
2020-09-22 17:12:51 -07:00
Nazarii Hnydyn
e2b4afc438
[Mellanox] Update platform components config files. (#5360)
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2020-09-13 10:23:22 +03:00
Kebo Liu
72ec212fa7
[Mellanox] Refactor SFP related platform API and plugins with new SDK API (#5326)
Refactor SFP reset, low power get/set API, and plugins with new SDK SX APIs. Previously they were calling SDK SXD APIs which have glibc dependency because of shared memory usage.

Remove implementation "set_power_override", "tx_disable_channel", "tx_disable" which using SXD APIs, once related SDK SX API available, will add them back based on new SDK SX APIs.
2020-09-11 13:23:23 -07:00
Stephen Sun
54fcdbb380
Support single ingress pool for MSFT SKUs and optimize headroom calculation (#4686)
Calculate pool size in t1 as 24 * downlink port + 8 * uplink port

- Take both port and peer MTU into account when calculating headroom
- Worst case factor is decreased to 50%
- Mellanox-SN2700-C28D8 t0, assume 48 * 50G/5m + 8 * 100G/40m ports
- Mellanox-SN2700 (C32)
  - t0: 16 * 100G/5m + 16 * 100G/40m
  - t1: 16 * 100G/40m + 16 * 100G/300m

Signed-off-by: Stephen Sun <stephens@mellanox.com>

Co-authored-by: Stephen Sun <stephens@mellanox.com>
2020-08-13 13:12:09 +03:00
Joe LeVeque
3b89e5d467
[Python] Migrate applications/scripts to import sonic-py-common package (#5043)
As part of consolidating all common Python-based functionality into the new sonic-py-common package, this pull request:
1. Redirects all Python applications/scripts in sonic-buildimage repo which previously imported sonic_device_util or sonic_daemon_base to instead import sonic-py-common, which was added in https://github.com/Azure/sonic-buildimage/pull/5003
2. Replaces all calls to `sonic_device_util.get_platform_info()` to instead call `sonic_py_common.get_platform()` and removes any calls to `sonic_device_util.get_machine_info()` which are no longer necessary (i.e., those which were only used to pass the results to `sonic_device_util.get_platform_info()`.
3. Removes unused imports to the now-deprecated sonic-daemon-base package and sonic_device_util.py module

This is the next step toward resolving https://github.com/Azure/sonic-buildimage/issues/4999

Also reverted my previous change in which device_info.get_platform() would first try obtaining the platform ID string from Config DB and fall back to gathering it from machine.conf upon failure because this function is called by sonic-cfggen before the data is in the DB, in which case, the db_connect() call will hang indefinitely, which was not the behavior I expected. As of now, the function will always reference machine.conf.
2020-08-03 11:43:12 -07:00
Stephen Sun
0db7e88313
[Mellanox] Update the buffer setting (#4989)
* Update the buffer size based on the latest excel

Signed-off-by: Stephen Sun <stephens@mellanox.com>

* Align the buffer configuration with the latest formula:

- reduce redundant "*2" in formula
- use port MTU for local sending the PFC frame and peer lossless MTU for peer sending lossless traffic

Buffer pool size updated accordingly.

Signed-off-by: Stephen Sun <stephens@mellanox.com>
2020-07-30 14:22:08 +03:00
Joe LeVeque
9905d9382d
[devices] Update SFP keys to align with new standard (#4975)
Align SFP key names with new standard defined in https://github.com/Azure/sonic-platform-common/pull/97

- hardwarerev -> hardware_rev
- serialnum -> serial
- manufacturename -> manufacturer
- modelname -> model
- Connector -> connector
2020-07-16 13:03:50 -07:00
Junchao-Mellanox
e1f7fb135b
[Mellanox] Add system health configuration file for Mellanox platforms (#4834)
The new feature system health support a platform based configuration file. Add configuration files for all Mellanox platform.

Add a configuration file for SN2700, other platform will use a soft link to it.
2020-07-13 10:20:22 -07:00
Junchao-Mellanox
563a0fd21e
[Mellanox] Change port index in port_config.ini to 1-based (#4781)
* Change port index in port_config.ini to 1-based
* Add default port index to port_config.ini, change platform plugins to accept 1-based port index
* fix port index in sfp_event.py
2020-06-23 17:21:36 -07:00
madhanmellanox
2c830f4074
Modified SKU based utils to Platform based utils (#4786)
Co-authored-by: Madhan Babu <madhan@arc-build-server.mtr.labs.mlnx>
2020-06-21 12:15:23 -07:00
Junchao-Mellanox
e25c2d984f
[Mellanox] Never disable kernel thermal algorithm at real-time (#4638) 2020-05-26 10:46:29 -07:00
shlomibitton
b6291372d9
[Mellanox] Add a new Mellanox platform x86_64-mlnx_msn4600c and new SKU ACS-MSN4600C (#4483)
* New SKU support for MSN4600C

Signed-off-by: Shlomi Bitton <shlomibi@mellanox.com>
2020-04-30 00:30:11 -07:00
shlomibitton
ac6cfb115f
[Mellanox] Add a new Mellanox platform x86_64-mlnx_msn3420 and new SKU ACS-MSN3420 (#4436)
* New SKU support for MSN3420

Signed-off-by: Shlomi Bitton <shlomibi@mellanox.com>

Conflicts:
	device/mellanox/x86_64-mlnx_msn2700-r0/plugins/sfputil.py

* Add CPLD's

* Symlink fixes and semantics

* Adding new platform at end of lines
2020-04-26 14:39:55 +03:00
Junchao-Mellanox
c730f3e207
[Mellanox] thermal control enhancement for dynamic minimum fan speed and PSU fan speed policy (#4403) 2020-04-21 08:09:53 -07:00
Kebo Liu
860cb265ac
[PMON] Extend pmon daemon start control to lm-sensors and fancontrol (#4447) 2020-04-21 08:00:48 -07:00
Kebo Liu
e04eb806b6
[Mellanox] bug fix - adpt sfputil plugin to support ACS-MSN4700 (#4361) 2020-04-14 10:27:27 -07:00
Mykola F
c5c1ae2173
[Mellanox] update eeprom.py plugin for SimX (#4364)
Signed-off-by: Mykola Faryma <mykolaf@mellanox.com>
2020-04-03 12:41:37 -07:00
Junchao-Mellanox
be549db395
Add thermal control support for SONiC (#3949) 2020-03-09 10:41:10 -07:00
Joe LeVeque
f6ec95cf2b
[Mellanox]the port index in port_config.ini should starts from 0 (#4152) 2020-03-04 19:15:12 -08:00
Kebo Liu
7a9c8ee1fc
filter out CPU ports to avoid any operation on them (#4197) 2020-03-03 21:19:30 +02:00
Kebo Liu
4afb56da1d
Update SDK to 4.3.3052 (#4153)
update FW to xx_2000_3298
update SAI to 1.16.0

update Spectrum-1 and Spectrum-2 buffer pool size according to the new SDK default config change.

	modified:   ../../device/mellanox/x86_64-mlnx_msn2700-r0/ACS-MSN2700/buffers_defaults_t0.j2
	modified:   ../../device/mellanox/x86_64-mlnx_msn2700-r0/ACS-MSN2700/buffers_defaults_t1.j2
	modified:   ../../device/mellanox/x86_64-mlnx_msn3700-r0/ACS-MSN3700/buffers_defaults_t0.j2
	modified:   ../../device/mellanox/x86_64-mlnx_msn3700-r0/ACS-MSN3700/buffers_defaults_t1.j2
	modified:   fw.mk
	modified:   mlnx-sai.mk
	modified:   mlnx-sai/SAI-Implementation
	modified:   sdk-src/sx-kernel/Switch-SDK-drivers
	modified:   sdk.mk

signed-off by kebol@mellanox.com
2020-02-16 13:47:16 +02:00
Stephen Sun
031e69d574
[sfputil]fix an syntax error (#4141) 2020-02-13 19:04:58 +02:00
Nazarii Hnydyn
fc101b6ceb
[mellanox]: Add new Mellanox-SN3800-D112C8 sku. (#4085)
Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
2020-01-30 18:54:09 -08:00
Stephen Sun
3239d7fc5b [Mellanox]Implement plugins for PSU, fan and thermal (#4041)
* [plugins]add fan functions, add voltage, current, power for psu
* [plugins]link fanutil.py and psuutil.py to those in 2700
* [plugin]add thermal
* [plugin]add symbol links for thermalutil for all SKUs
2020-01-24 11:27:32 -08:00
Nazarii Hnydyn
cb2edcf3df [mellanox] Add fwutil platform components. (#3999)
Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
2020-01-24 11:26:17 -08:00
Dong Zhang
7aa0baf709 [MultiDB] (except ./src and ./dockers dirs): replace redis-cli with sonic-db-cli and use new DBConnector (#4035)
* [MultiDB] (except ./src and ./dockers dirs): replace redis-cli with sonic-db-cli and use new DBConnector
* update comment for a potential bug
* update comment
* add TODO maker as review reqirement
2020-01-22 11:26:23 -08:00
lguohan
483a5946a8
Revert "[MultiDB]except src and dockers : replace redis-cli with sonic-db-cli and use new DBConnector (#3928)" (#4002)
This reverts commit 0dae59ac30.
2020-01-10 08:27:34 -08:00
Dong Zhang
0dae59ac30 [MultiDB]except src and dockers : replace redis-cli with sonic-db-cli and use new DBConnector (#3928)
* [MultiDB]except src and dockers : replace redis-cli with sonic-db-cli and use new DBConnector
* fix vs tests along with swss vs tests together
2020-01-02 14:46:25 -08:00
Nazarii Hnydyn
51b78b5c33 [mellanox]: Enhance pmon synchronization with hw-mgmt platform counters. (#3885)
Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
2019-12-17 10:58:55 -08:00
Wenda Ni
206df4327f
Adopt per-port buffer & qos profile apply on mellanox (#3543)
Signed-off-by: Wenda Ni <wenni@microsoft.com>
2019-10-22 09:37:45 -07:00
Mykola F
124b26d72f [Mellanox] platform_reboot - sync & umount fs before power cycle (#3430)
Signed-off-by: Mykola Faryma <mykolaf@mellanox.com>
2019-09-17 09:38:30 -07:00
Mykola F
2a7d8624cb [Mellanox] align platform reboot to use "hardware reboot" (#3321)
Signed-off-by: Mykola Faryma <mykolaf@mellanox.com>
2019-08-19 10:25:39 -07:00
Kebo Liu
c7db1ec2e2 [Mellanox sfputil] update get_transceiver_change_event to support more event (#3261) 2019-08-07 09:30:21 -07:00
Nazarii Hnydyn
c6e442b946 [mellanox]: Added SN3800 platform (#3262)
* [mellanox]: Added SN3800 platform.

Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
2019-08-01 16:25:44 -07:00
Stephen Sun
b9a806b38f [sfputil]Fix issue: xcvrd is broken. (#3258)
xcvrd is broken because a newly introduced interface get_transceiver_dom_threshold_info_dict in common code calls a unsupported interface _read_eeprom_specific_bytes.
Fix the issue by implement get_transceiver_dom_threshold_info_dict to avoid calling the unsupported interface.
2019-08-01 11:59:52 -07:00
Stephen Sun
f41e381c9a [Mellanox] fix the issue that failing to test whether dom supported prior to reading dom data (#3120)
* fix the issue Bug SW #1816356 which is due to failing to test whether dom supported prior to reading dom data

* use pre-defined variable to avoid magic number.
no need to read 16 bytes, 1 byte is enough since calibration and dom capability are all in bytes at offset 92
2019-07-06 14:51:16 -07:00
Stephen Sun
20e4547dbc [Mellanox] Fix typo "xSFP_VLOT_OFFSET" (#3118)
Variables SFP_VLOT_OFFSET and QSFP_VLOT_OFFSET containing the typo are originally defined in repo sonic-platform-common. The typo has been fixed in PR #33. However, some Mellanox-specific code hasn't updated correspondingly, which results in xcvrd fail to start.
This PR updates the variable name in Mellanox-specific code correspondingly to fix that.
2019-07-05 14:06:18 -07:00
Stephen
bf2c9cd099 [Mellanox]Remove the dependency on sysfs for sfputil (#2967)
* [sfputil]Remove the dependency on sysfs for sfputil, mainly get_presence and port_to_eeprom_mapping
Remove the dependency on sysfs, including:
1. rewrite get_presence by using ethtool;
2. remove interface port_to_eeprom_mapping which is no longer referenced;
3. remove code that references port_to_eeprom_mapping and _port_to_eeprom_mapping;
4. remove private member qsfp_sysfs_path which is no longer referenced.

* [sfputil.py]
minor adjustment: move the presence=False to the beginning of get_presence.
2019-06-07 06:21:03 -07:00