Commit Graph

913 Commits

Author SHA1 Message Date
Neetha John
43aec133da
[202012] [qos] Update RDMA-CENTRIC lossy profile to use static threshold for Th devices (#14398)
Backport #14372 to 202012

Why I did it
For better accounting purposes, updating the ingress lossy traffic profile to use static threshold. This change is only intended for Th devices using RDMA-CENTRIC profiles

How I did it
Update the buffer templates for Th devices in RDMA-CENTRIC folder to use the correct threshold

Signed-off-by: Neetha John <nejo@microsoft.com>
2023-03-24 10:41:48 -07:00
Neetha John
94f9942ef6 Update dynamic threshold for TD2 (#14224)
Why I did it
Update dynamic threshold to -1 to get optimal performance for RDMA traffic

How I did it
Modified pg_profile_lookup.ini to reflect the correct value

Signed-off-by: Neetha John <nejo@microsoft.com>
2023-03-20 20:25:17 +00:00
Sudharsan Dhamal Gopalarathnam
545b526a49
[202012][mellanox]Fix lpmode set when logical port is larger than 64 (#14137)
This PR is to backport #14138 to 202012.

- Why I did it
In sfplpm API, the number of logical ports is hardcoded as 64. When a system contains more port than this, the SDK APIs would fail with a trace as below

Enabling low-power mode for port Ethernet0... Traceback (most recent call last):
File "/usr/share/sonic/platform/plugins/sfplpmset.py", line 167, in
set_lpmode(handle, cmd, sfp_module)
File "/usr/share/sonic/platform/plugins/sfplpmset.py", line 128, in set_lpmode
SX_MGMT_PHY_MOD_PWR_ATTR_PWR_MODE_E, SX_MGMT_PHY_MOD_PWR_MODE_LOW_E)
File "/usr/share/sonic/platform/plugins/sfplpmset.py", line 115, in pwr_attr_set
mgmt_phy_mod_pwr_attr_set(handle, module_id, attr_type, power_mode)
File "/usr/share/sonic/platform/plugins/sfplpmset.py", line 84, in mgmt_phy_mod_pwr_attr_set
assert SX_STATUS_SUCCESS == rc, "sx_mgmt_phy_mod_pwr_attr_set failed"
AssertionError: sx_mgmt_phy_mod_pwr_attr_set failed
Error! Unable to set LPM for 1, rc = 1, err msg: [+] opening sdk
Mar 07 03:25:28 INFO LOG: Initializing SX log with STDOUT as output file.
Mar 07 03:25:28 ERROR SX_API_PORT: sx_mgmt_phy_mod_pwr_attr_get: This API is deprecated and will be removed in the future. Please use sx_mgmt_phy_module_pwr_attr_get in its place.
Mar 07 03:25:28 ERROR SX_API_PORT: sx_mgmt_phy_mod_pwr_attr_set: This API is deprecated and will be removed in the future. Please use sx_mgmt_phy_module_pwr_attr_set in its place.

- How I did it
Remove the hardcoded value of 64. Obtained the number of logical ports from SDK

- How to verify it
Manual testing
2023-03-09 00:04:09 +02:00
Ikki Zhu
be46225033 [Seastone] fix dx010 qsfp eeprom data write issue (#13930)
Why I did it
Platform cases test_tx_disable, test_tx_disable_channel, test_power_override failed in dx010.

How I did it
Add i2c access algorithm for CPLD i2c adapters.

How to verify it
Verify it with platform_tests/api/test_sfp.py::TestSfpApi test cases.
2023-03-02 20:06:09 +00:00
Ikki Zhu
f47024cdfd add psu fans status led available config (#13926)
Why I did it
Seastone does not have the psu fans' status led, need to reflect it in platform.json.

How I did it
Set the psu fans status led available to false.

How to verify it
Verify it with platform_tests/api/test_psu_fans.py::TestPsuFans::test_set_fans_led case.
2023-02-28 08:18:28 +00:00
Ikki Zhu
2135c6eb2f [DX010 platform] fix dx010 platform testcase issues (#13595)
Why I did it
1. fix chassis test_set_fans_led case
2. fix chassis get_name case mismatch issue
3. fix fan_drawer test_set_fans_speed
4. fix component test_components test case

How I did it
Add corresponding configuration into chassis json file

How to verify it
Run platform tests cases to verify these failure cases
2023-02-16 17:52:12 +00:00
andywongarista
ff5a703301
Fix content of platform.json for DCS-7050CX3-32S (#13659)
#### Why I did it
Some tests under platform_tests/api were failing on the 7050CX3 due to outdated facts in platform.json

#### How I did it
Updated platform.json facts with appropriate values

#### How to verify it
Run tests under platform_tests/api to verify no failures
2023-02-08 12:01:07 -08:00
Ikki Zhu
85ca3abc2f [Celestica DX010] fix fan drawer and watchdog platform testcase issues (#13426)
Why I did it
fix DX010 fan drawer and watchdog platform test case issues

How I did it
1. Add fan_drawer get_maximum_consumed_power support
2. Adjust maximum watchdog timeout value check

How to verify it
Run test_fan_drawer and test_watchdog test cases.
2023-02-08 04:59:20 +00:00
Ying Xie
027c831be7 [Arista] add support for hardware sku Arista-7260CX3-D92C16 (#13438)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2023-01-19 19:19:30 +00:00
Ikki Zhu
607cbdefd3 [Celestica Seastone] fix multi sonic platform issues (#13356)
Why I did it
Fix the following issues for Seastone platform:

- system-health issue: show system-health detail will not complete #9530, Celestica Seastone DX010-C32: show system-health detail fails with 'Chassis' object has no attribute 'initizalize_system_led' #11322
- show platform firmware updates issue: Celestica Seastone DX010-C32: show platform firmware updates #11317
- other platform optimization

How I did it
Modify and optimize the platform implememtation.

How to verify it
Manual run the test commands described in these issues.
2023-01-19 19:18:21 +00:00
byu343
e2f9f1e452 [Arista]: Add hwSku Arista-7260CX3-D108C10 (#13242)
* [Arista]: Add hwSku Arista-7260CX3-D108C10

* Add buffer-related config for Arista-7260CX3-D108C10
2023-01-12 23:30:33 +00:00
Ikki Zhu
2438025cf9 Seastone add platform capability enhancement config (#13079) 2023-01-12 23:30:29 +00:00
Neetha John
642c7242f8 Update ECN settings for storage backend (#12855)
Signed-off-by: Neetha John <nejo@microsoft.com>

Why I did it
ECN parameters need to be updated for storage backend

How I did it
Included the check for storage backend devices to update qos configs

How to verify it
Verified that the new ecn settings are applied on storage backend device.
Verified that the old ecn settings are applied for storage frontend, non storage frontend/backend devices
2023-01-10 23:52:39 +00:00
Neetha John
bbcad1362f
[202012] [Profile separation] MMU infrastructure update for TD2 (#12739)
Signed-off-by: Neetha John <nejo@microsoft.com>

This is to backport #12626

Why I did it
There is a need to have separate profiles on compute and storage and this infra update will help achieve that

How I did it
Moved buffer pool/profile and qos definitions on TD2 to a common folder and all TD2 hwsku's will reference that folder
2022-11-29 12:53:58 -08:00
bingwang-ms
47d7e5d0d2
[202012] Apply separated DSCP_TO_TC_MAP and TC_TO_QUEUE_MAP on dualtor (#12792)
* Apply separated DSCP_TO_TC_MAP and TC_TO_QUEUE_MAP on dualtor
2022-11-23 21:49:00 +08:00
zzhiyuan
3a68dc0325 [Arista] Increase switch PCIe timeout for 7060-cx32s (#9248)
Co-authored-by: Zhi Yuan (Carl) Zhao <zyzhao@arista.com>
Why I did it
Arista 7060 platform has a rare and unreproduceable PCIe timeout that could possibly be solved with increasing the switch PCIe timeout value. To do this we'll call a script for this platform to increase the PCIe timeout on boot-up.

No issues would be expected from the setpci command. From the PCIe spec:

"Software is permitted to change the value in this field at any
time. For Requests already pending when the Completion
Timeout Value is changed, hardware is permitted to use either
the new or the old value for the outstanding Requests, and is
permitted to base the start time for each Request either on when
this value was changed or on when each request was issued. "

How I did it
Add "platform-init" support in swss docker similar to how "hwsku-init" is called, only this would be for any device belonging to a platform. Then the script would reside in device data folder.

Additionally, add pciutils dependency to docker-orchagent so it can run the setpci commands.

How to verify it
On bootup of an Arista 7060, can execute:
lspci -vv -s 01:00.0 | grep -i "devctl2"
In order to check that the timeout has changed.
2022-11-23 10:43:54 +00:00
Samuel Angebault
ea3620cde5
[Arista] Remove pcie device monitoring for 7260CX3-64 (#12654) 2022-11-15 15:26:32 -08:00
Ying Xie
eb37bed49c
[201811][DX010] enable LPM (#12641)
Why I did it
Without LPM enabled, the routing table size is very small.

How I did it
Enabling LPM.

Signed-off-by: Ying Xie ying.xie@microsoft.com
2022-11-09 08:16:03 -08:00
Ying Xie
778df1e178
[202012][RDMA] create split profiles for Arista-7050CX3-32S (#12478)
* [202012][RDMA] create split profiles for Arista-7050CX3-32S

Manually cherry-picking #12228.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2022-10-31 22:55:06 -07:00
Vivek
458f583e3b
[202012] [Mellanox] [SKU] Mellanox-SN4700-V48C32 SKU added (#12150)
New SKU for MSN-4700 Platform i.e. Mellanox-SN4700-C128

Requirements:
* Breakout: Port 1-32: 4x100G
* Downlinks: 120 (1-30)
* Uplinks: 8 (31-32)
* Shared Headroom: Enabled
* Over Subscribe Ratio: 1:8
* Default Topology: T2
* Default Cable Length for T2: 1500m
* QoS params: The default ones defined in qos_config.j2 will be applied
* Small Packet Percentage: Used 50% for traditional buffer model Note: For dynamic model, the value defined in LOSSLESS_TRAFFIC_PATTERN|AZURE|small_packet_percentage is used

Additional Details:
Switch Type has to be programmed as SpineRouter through config_db.json in DEVICE_METADATA|localhost|type field for the buffer values & cable lengths defined in the buffers_defaults_t2.j2 to apply on the device
Cable Lengths Used for generating buffer_defaults_{t0,t1,t2}.j2 

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
2022-10-13 16:52:59 +03:00
Vivek
d14e6f69d5
[202012] [Mellanox] [SKU] Mellanox-SN4700-A96C8V8 SKU added (#12151)
A new SKU for MSN4700 Platform i.e. Mellanox-SN4700-A96C8V8

Requirements:

Breakout:
Port 1-24: 4x25G(4)[10G,1G]
Port 25-28: 2x100G[200G,50G,40G,25G,10G,1G]
Port 29-32: 2x200G[100G,50G,40G,25G,10G,1G]
Downlinks: 96 (1-24) + 4 (25-28)
Uplinks: 4 (29-32)
Shared Headroom: Enabled
Over Subscribe Ratio: 1:4
Default Topology: T0
Default Cable Length for T1: 5m
VxLAN source port range set: No
Static Policy Based Hashing Supported: No

Additional Details:
QoS params: The default ones defined in qos_config.j2 will be applied
Small Packet Percentage: Used 50% for traditional buffer model Note: For dynamic model, the value defined in LOSSLESS_TRAFFIC_PATTERN|AZURE|small_packet_percentage is used
SKU was drafted under the assumption that the downlink ports uses xcvr's that will only support the first 4 lanes of the physical port they are connected to. Hence for the ports 1-24, the last four lanes are not used
Cable Lengths used for generating buffer_defaults_{t0,t1}.j2 D

Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
2022-10-13 16:51:17 +03:00
bingwang-ms
ee7d9d1c45 Map TC6 to Queue 1 for regular traffic (#11904)
Why I did it
This PR is to update TC_TO_QUEUE_MAP|AZURE for SKU Arista-7050CX3-32S-D48C8 and Arista-7260CX3 T0.

The change is only to align the TC_TO_QUEUE_MAP for regular traffic and bounced traffic. It has no impact on business because we have no traffic being mapped to TC2 or TC6.

How I did it
Update TC_TO_QUEUE_MAP|AZURE , and test cases as well.

How to verify it
Verified by running test case test_j2files.py

/sonic/src/sonic-config-engine$ python3 setup.py test -s tests/test_j2files.py
running test
......
----------------------------------------------------------------------
Ran 29 tests in 25.390s

OK
2022-09-17 00:41:48 +00:00
Vivek
484402ff08
[202012] [Mellanox] [SKU] Mellanox-SN4700-C128 SKU added (11574) (#11878)
- Why I did it
New SKU for MSN-4700 Platform i.e. Mellanox-SN4700-C128

Requirements:
* Breakout: Port 1-32: 4x100G
* Downlinks: 120 (1-30)
* Uplinks: 8 (31-32)
* Shared Headroom: Enabled
* Over Subscribe Ratio: 1:8
* Default Topology: T2
* Default Cable Length for T2: 1500m
* QoS params: The default ones defined in qos_config.j2 will be applied
* Small Packet Percentage: Used 50% for traditional buffer model Note: For dynamic model, the value defined in LOSSLESS_TRAFFIC_PATTERN|AZURE|small_packet_percentage is used

Additional Details:
Switch Type has to be programmed as SpineRouter through config_db.json in DEVICE_METADATA|localhost|type field for the buffer values & cable lengths defined in the buffers_defaults_t2.j2 to apply on the device
Cable Lengths Used for generating buffer_defaults_{t0,t1,t2}.j2 values

Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
2022-09-04 11:05:22 +03:00
Dev Ojha
8c57f0521f [Arista7050cx3] TD3 SKU changes for pg headroom value after interop testing with cisco 8102 (#11901)
Why I did it
After PFC interop testing between 8102 and 7050cx3, data packet losses were observed on the Rx ports of the 7050cx3 (inflow from 8102) during testing. This was primarily due to the slower response times to react to PFC pause packets for the 8102, when receiving such frames from neighboring devices. To solve for the packet drops, the 7050cx3 pg headroom size has to be increased to 160kB.

How I did it
Modified the xoff threshold value to 160kB in the pg_profile file to allow for the buffer manager to read that value when building the image, and configuring the device

How to verify it
run "mmuconfig -l" once image is built


Signed-off-by: dojha <devojha@microsoft.com>
2022-08-31 11:10:22 -07:00
Arun Saravanan Balachandran
c1712b8c9a
[202012] DellEMC: S6000, S6100, Z9332f - Add capabilities fields in platform.json (#11772) 2022-08-31 09:06:47 -07:00
Ikki Zhu
cf12aa549a [hlx/sfp] fix hlx platform sfp+ tx disable issue (#11532)
Why I did it:
To fix hlx platform sfp+ module tx disable issue

How I did it:
Fix sfp+ tx disable function according SFF-8472 specification

Co-authored-by: Eric Zhu <erzhu@celestica.com>
2022-08-09 21:05:08 +00:00
bingwang-ms
84aca00847
[202012]Support different DSCP_TO_TC_MAP for T1 in dualtor deployment (#11580)
Why I did it
This PR is to backport #11569 into 202012 branch.
This PR is to apply different DSCP_TO_TC_MAP to downlink and uplink ports on T1 in dualtor deployment.
For T1 downlink ports (To T0)
The DSCP_TO_TC_MAP is not changed. DSCP2 and DSCP6 are mapped to TC2 and TC6 respectively.
For T1 uplink ports (To T1)
A new DSCP_TO_TC_MAP|AZURE_UPLINK is defined and applied. DSCP2 and DSCP6 are mapped to TC1 to avoid mixing up lossy and lossless traffic from T2.
The extra lossy PG2 and PG6 added in PR #11157 is reverted as well because no traffic from T2 is mapped to PG2 or PG6 now.

How I did it
Define a new map DSCP_TO_TC_MAP|AZURE_UPLINK for 7260 T1.

How to verify it
Verified by test case in test_j2files.py.
2022-08-01 08:59:45 -07:00
Stephen Sun
44ecff1154
Support queue 7 in dual ToR scenario (#11570)
Signed-off-by: Stephen Sun <stephens@nvidia.com>
2022-08-01 09:27:49 +08:00
Kebo Liu
c40435c94c
[202012] [Mellanox] Add new sensor conf to support SN4410 A1 system (#8379) (#11530)
- Why I did it
New SN410 A1 system has a different sensor layout with A0 system, needs a new sensor conf file to support it.

- How I did it
Since the SN4410 A1 system use exactly the same sensor layout as the SN4700 A1 system, so add a symbol link linking to the SN4700 A1 sensor conf file to reuse.

- How to verify it
Run sensor test against the SN4410 A1 system;
Run platform related regression test against the SN4410 A1 system
2022-07-29 17:41:44 +02:00
Taylor Cai
c4927e0e68 [device/celestica]:Fix failed test case of Seastone snmp (#11430)
* Update psu.py
* Update thermal.py
2022-07-27 23:28:11 +00:00
Neetha John
15cc046eda
[202012] Update MMU and ECN settings for Arista-7260CX3-D96C16 (#11427)
Signed-off-by: Neetha John <nejo@microsoft.com>

Why I did it
Missed this sku in the previous PR #11398

How I did it
Update the dynamic threshold to 0 and ECN settings as 2mb/10mb/5%

How to verify it
Updated unit tests to use the modified values for 7260 ecn settings.
2022-07-15 09:33:39 -07:00
Kebo Liu
aa4379ddbe
[202012] [Mellanox] Add sensor conf file for new version of MSN3700/3700C/4600C platforms (#11358)
- Why I did it
MSN3700/3700C/4600C have been re-spined, the new HW version of platforms has different sensors, so need to apply the correct sensor.conf for them.

- How I did it
Add new sensor.conf files for the new re-spined platforms.
Enhance the logic of "get_sensors_conf_path" for the related platforms in order to load the correct sensor.conf for each version of platforms.

- How to verify it
run sensors test on different versions of platforms
Signed-off-by: Kebo Liu <kebol@nvidia.com>
2022-07-14 08:59:10 +03:00
Neetha John
4de610af15
[202012] Update 7260 MMU and ECN settings (#11398)
Signed-off-by: Neetha John <nejo@microsoft.com>

Why I did it
Improve throughput and latency for 7260 deployments

How I did it
Update the dynamic threshold to 0 and ECN settings as 2mb/10mb/5%

How to verify it
Updated unit tests to use the modified values for 7260 ecn settings.
2022-07-12 08:46:44 -07:00
Ying Xie
1d55dca6d3 [Buffer] Separate buffer profile for Arista-7060CX-32S-Q24C8
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2022-07-07 14:09:01 -07:00
Ying Xie
17a9259c55 [7060] fix default port map
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2022-07-07 14:09:01 -07:00
Kevin Wang
41518aa825 [Buffer] Separate buffer profile for Arista-7260CX3-Q64
Signed-off-by: Kevin Wang <shengkaiwang@microsoft.com>
2022-07-07 14:09:01 -07:00
Kevin Wang
f53b2620db [Buffer] Separate buffer profile for Arista-7260CX3-D108C8
Signed-off-by: Kevin Wang <shengkaiwang@microsoft.com>
2022-07-07 14:09:01 -07:00
Kevin Wang
b625085b46 [Buffer] Separate buffer profile for Arista-7260CX3-C64
50G data is not accurate, needs further update.

Signed-off-by: Kevin Wang <shengkaiwang@microsoft.com>
2022-07-07 14:09:01 -07:00
Kevin Wang
5b42ba021b [Buffer] Separate buffer profile for Arista-7060CX-32S-C32
Signed-off-by: Kevin Wang <shengkaiwang@microsoft.com>
2022-07-07 14:09:01 -07:00
Kevin Wang
83780549c7 [Buffer] Separate buffer profile for Arista-7060CX-32S-D48C8
Signed-off-by: Kevin Wang <shengkaiwang@microsoft.com>
2022-07-07 14:09:01 -07:00
Kevin Wang
e8f04cd2e6 [Buffer] Separate buffer profile for Arista-7060CX-32S-Q32
Signed-off-by: Kevin Wang <shengkaiwang@microsoft.com>
2022-07-07 14:09:01 -07:00
Kevin Wang
2c81f02b13 [Buffer] Separate buffer profile for Celestica-DX010-D48C8
Signed-off-by: Kevin Wang <shengkaiwang@microsoft.com>
2022-07-07 14:09:01 -07:00
Kevin Wang
4dbdc8e0a0 [Buffer] Separate buffer profile for Force10-S6100
Signed-off-by: Kevin Wang <shengkaiwang@microsoft.com>
2022-07-07 14:09:01 -07:00
Ying Xie
2ed29da38d [buffer] create infrastructure to enable buffer/QoS profiles
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2022-07-07 14:09:01 -07:00
vmittal-msft
6ada55439d Updated buffer profile settings for TD3 based HWSKUs (Arista-7050CX3-32S-C32, Arista-7050CX3-32S-D48C8) (#11202)
* Updated buffer profile settings for TD3 based HWSKUs (Arista-7050CX3-32S-C32, Arista-7050CX3-32S-D48C8)
2022-07-05 20:57:53 +00:00
Stephen Sun
fe6be5da92
[202012] Configure different map between uplink and downlink on t1 switch in dual ToR scenario (#11299)
- Why I did it
Configure different DSCP_TO_TC_MAP between uplink and downlink on T1 switch in dual ToR scenario
On T1 uplink, both DSCP 2/6 will be mapped to TC 1 for the purpose of avoiding such traffic occupying lossless buffers.
On T1 downlink, they will be mapped to TC 2/6 respectively. (unchanged)

- How I did it
For vendors who want to configure different DSCP_TO_TC_MAP between uplinks and downlinks on T1, they should
Define generate_dscp_to_tc_map macro in SKU's qos.json.j2 file
Define map AZURE for downlink and AZURE_UPLINK for uplink
Define jinja2 variable different_dscp_to_tc_map as True

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2022-07-03 15:58:06 +03:00
Stephen Sun
307d0e2aca
[Mellanox][202012] Support Mellanox-SN4600C-C64 as T1 switch in dual-ToR scenario (#11032)
Why I did it
Support Mellanox-SN4600C-C64 as T1 switch in dual-ToR scenario

1. Support additional queue and PG in buffer templates, including both traditional and dynamic model
2. Support mapping DSCP 2/6 to lossless traffic in the QoS template.
3. Add macros to generate additional lossless PG in the dynamic model
4. Adjust the order in which the generic/dedicated (with additional lossless queues) macros are checked and called to generate buffer tables in common template buffers_config.j2
  - Buffer tables are rendered via using macros.
  - Both generic and dedicated macros are defined on our platform. Currently, the generic one is called as long as it is defined, which causes the generic one always being called on our platform. To avoid it, the dedicated macrio is checked and called first and then the generic ones.
5. Support MAP_PFC_PRIORITY_TO_PRIORITY_GROUP on ports with additional lossless queues.

On Mellanox-SN4600C-C64, buffer configuration for t1 is calculated as:
40 * 100G downlink ports with 4 lossless PGs/queues, 1 lossy PG, and 3 lossy queues
16 * 100G uplink ports with 2 lossless PGs/queues, 1 lossy PG, and 5 lossy queues

Signed-off-by: Stephen Sun stephens@nvidia.com

How to verify it
Run regression test.
2022-06-21 10:04:49 -07:00
bingwang-ms
6ddf5cd7dc
[202012] [cherry-pick] Generate switch level dscp_to_tc_map entry from qos_config template (#11132)
* Generate switch level dscp_to_tc_map

Signed-off-by: bingwang <wang.bing@microsoft.com>
2022-06-17 20:49:56 +08:00
Jon Goldberg
159613153f [Nokia ixs7215] change var/log size to 4GB (#11122)
This makes use of #11121 to add support for configuration of VAR_LOG_SIZE on Nokia IXS7215
2022-06-14 09:02:14 -07:00
Guohan Lu
b0c48f9b31 [devices]: fix j2 syntax error for the config.bcm in Arista-7260CX3-D108C8
Signed-off-by: Guohan Lu <lguohan@gmail.com>
2022-06-10 11:23:28 -07:00