Commit Graph

27 Commits

Author SHA1 Message Date
Vadym Hlushko
1d57472eb0
[graceful reboot] Rename the platform_reboot to the pre_reboot_hook, remove the sysfs power cycle (#18324)
**DEPENDS ON: [[graceful reboot] Add the pre_reboot_hook script execution, add the watchdog arm before the reboot](https://github.com/sonic-net/sonic-utilities/pull/3203)**

#### Why I did it
Add support for the `graceful reboot` instead of  the `sysfs power cycle` to avoid filesystem corruption 

### How I did it
Rename the `platform_reboot` script to the `pre_reboot_hook`.
Remove the sysfs power cycle function, from now on the Debian reboot (`/sbin/reboot`) will be executed instead of the sysfs power cycle.

#### How to verify it
1. Start watching logs by using `show log -f` and `journalctl -p debug -f`
2. Execute the `reboot` command from the switch CLI
3. Check in logs that all systemd services terminated
2024-03-23 16:45:36 -07:00
noaOrMlnx
b25dfa91c1
[Mellanox] Update Nvidia sai.profile SKU files to have common file (#18074)
* Update Nvidia sai.profile SKU files to have common file

* Remove SAI_DUMP_MFT_CFG_PATH from sai-common.profile as it is not in use
2024-02-28 11:05:20 -08:00
Volodymyr Samotiy
f1d6655004
[Mellanox] Disable SSD NCQ on Mellanox platforms (#17567)
- Why I did it
Based on some research some products might experience an occasional IO failures in the communication between CPU and SSD because of NCQ.
There seems to be a problem between some kernel versions and some SATA controllers.

Syslog error message examples:

Error "ata1: SError: { UnrecovData Handshk }" - "failed command: WRITE FPDMA QUEUED".
Error "ata1: SError: { RecovComm HostInt PHYRdyChg CommWake 10B8B DevExch }" - "failed command: READ FPDMA QUEUED".
Some vendors already disabled NCQ on their platforms in SONiC due to similar issue:

[Arista] Disable ATA NCQ for a few products #13739 [Arista] Disable ATA NCQ for a few products
[Arista] Disable SSD NCQ on DCS-7050CX3-32S #13964 [Arista] Disable SSD NCQ on DCS-7050CX3-32S
Also there are other discussions on Debian/Ubuntu forums about similar issues and it was suggested to disable NCQ:

https://askubuntu.com/questions/133946/are-these-sata-errors-dangerous

- How I did it
Add a kernel parameter to tell libata to disable NCQ

- How to verify it
Use FIO tool - fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4
2024-01-28 16:26:07 +02:00
Nazarii Hnydyn
845bb80a3c
[ppi]: Enable global port late create for all Mellanox HWSKUs. (#16945)
HLD: sonic-net/SONiC#1084

To improve FAST reboot dataplane downtime

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2023-11-01 21:50:14 -07:00
Vadym Hlushko
3bd396043e
[buffers] Add 'create_only_config_db_buffers.json' file for the Mellanox devices (not MSFT SKU) (#16233)
* [buffers] Add create_only_config_db_buffers.json for MLNX devices (not MSFT SKU), inject it at the start of the swss docker

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>

* [buffers] Align the sonic-device_metadata.yang

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>

---------

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
2023-10-03 08:35:57 -07:00
Vivek
d4923615d6
[Mellanox] [SN4410] Support new breakout modes for PAM4 (#15668)
- Why I did it
Add new breakout modes to be used in PAM4 supported cables

- How I did it

- How to verify it
Verified the 50G per lane breakout modes are applied properly on the switch

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
2023-08-16 08:30:33 +03:00
Vadym Hlushko
521a86b2de
[Mellanox] Add mlxtrace to techsupport (#15961)
- Why I did it
Added the fwtrace config files in order to be able to call the mlxstrace utility during the show techsupport dump.

Work item tracking
Microsoft ADO (number only):

- How I did it
Added fwtrace config files. Added path to these files to sai.profile for each mlnx device.

- How to verify it
Execute the show techsupport command and check if mlxstrace output is in system dump.

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
2023-08-03 11:36:58 +03:00
Kebo Liu
129e9d1b25
fix MSN4410 chassis name in platform_components.json (#9939)
- Why I did it
The chassis name in MSN4410 platform_components.json is not correct

- How I did it
Fix the chassis name

- How to verify it
Run relevant platform API test

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2022-02-13 15:01:09 +02:00
Kebo Liu
2cc973b582
[Mellanox] update system_health_monitoring_config for MSN4410/MSN4600/MSN4700 (#9728)
- Why I did it
For MSN4410/MSN4600/MSN4700 now they can support fetching PSU voltage threshold, no need to skip the psu voltage check in system health monitoring, so update the system health monitoring configuration file for these platforms.

- How I did it
remove skip PSU change config from the system_health_monitoring_config.json file

- How to verify it
Build image run on these platforms, system health monitoring will not report error against PSU voltage

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2022-01-19 10:29:26 +02:00
Vadym Hlushko
727a94634c
[Mellanox] [SN4410] Fixed capability files - port_config.ini, hwsku.json, platform.json (#9538)
- Why I did it
The capability files were incorrect in comparison to the marketing spec of the SN4410 platform.

- How I did it
Aligned the capability files according to the marketing spec.

- How to verify it
Basic manual sanity checks:
1. Check if critical docker containers were UP
2. Check if interfaces were created and were UP
3. Check if interfaces created in the syncd docker container by executing – sx_api_ports_dump.py script
4. Check the logs from the start of the switch – everything was OK
5. Verified the port breakout

Signed-off-by: Vadym Hlushko <vadymh@nvidia.com>
2021-12-15 19:25:20 +02:00
Stephen Sun
ba853348d5
[Reclaim buffer] Reclaim unused buffers by applying zero buffer profiles (#8768)
Signed-off-by: Stephen Sun stephens@nvidia.com

Why I did it
Support zero buffer profiles

Add buffer profiles and pool definition for zero buffer profiles
Support applying zero profiles on INACTIVE PORTS
Enable dynamic buffer manager to load zero pools and profiles from a JSON file
Dependency: It depends on Azure/sonic-swss#1910 and submodule advancing PR once the former merged.

How I did it
Add buffer profiles and pool definition for zero buffer profiles

If the buffer model is static:
Apply normal buffer profiles to admin-up ports
Apply zero buffer profiles to admin-down ports
If the buffer model is dynamic:
Apply normal buffer profiles to all ports
buffer manager will take care when a port is shut down
Update buffers_config.j2 to support INACTIVE PORTS by extending the existing macros to generate the various buffer objects, including PGs, queues, ingress/egress profile lists

Originally, all the macros to generate the above buffer objects took active ports only as an argument
Now that buffer items need to be generated on inactive ports as well, an extra argument representing the inactive ports need to be added
To be backward compatible, a new series of macros are introduced to take both active and inactive ports as arguments
The original version (with active ports only) will be checked first. If it is not defined, then the extended version will be called
Only vendors who support zero profiles need to change their buffer templates
Enable buffer manager to load zero pools and profiles from a JSON file:

The JSON file is provided on a per-platform basis
It is copied from platform/<vendor> folder to /usr/share/sonic/temlates folder in compiling time and rendered when the swss container is being created.
To make code clean and reduce redundant code, extract common macros from buffer_defaults_t{0,1}.j2 of all SKUs to two common files:

One in Mellanox-SN2700-D48C8 for single ingress pool mode
The other in ACS-MSN2700 for double ingress pool mode
Those files of all other SKUs will be symbol link to the above files

Update sonic-cfggen test accordingly:

Adjust example output file of JSON template for unit test
Add unit test in for Mellanox's new buffer templates.

How to verify it
Regression test.
Unit test in sonic-cfggen
Run regression test and manually test.
2021-11-29 08:04:01 -08:00
Dror Prital
5356244e53
[Mellanox] Add NVIDIA Copyright header to "mellanox" files (#8799)
- Why I did it
Add NVIDIA Copyright header to "mellanox" files

- How I did it
Add NVIDIA Copyright header as a comment for Mellanox files

- How to verify it
Sanity tests and PR checkers.
2021-10-17 19:03:02 +03:00
Qi Luo
add9b651b6
Add platform_asic file to each platform folder in sonic-device-data based package (#8542)
#### Why I did it
Add platform_asic file to each platform folder in sonic-device-data package. The file content will be used as the ground truth of mapping from PLATFORM_STRING to switch ASIC family.

One use case of the mapping is to prevent installing a wrong image, which targets for other ASIC platforms. For example, currently we have several ONIE images naming as sonic-*.bin, it's easy to mistakenly install the wrong image. With this mapping built into image, we could fetch the ONIE platform string, and figure out which ASIC it is using, and check we are installing the correct image.

After this PR merged, each platform vendor has to add one mandatory text file  `device/PLATFORM_VENDOR/PLATFORM_STRING/platform_asic`, with the content of the platform's switch ASIC family.

I will update https://github.com/Azure/SONiC/wiki/Porting-Guide after this PR is merged.

You can get a list of the ASIC platforms by `ls -b platform | cat`. Currently the options are
```
barefoot
broadcom
cavium
centec
centec-arm64
generic
innovium
marvell
marvell-arm64
marvell-armhf
mellanox
nephos
p4
vs
```

Also support
```
broadcom-dnx
```

#### How I did it

#### How to verify it
Test one image on DUT. And check the folders under `/usr/share/sonic/device`
2021-10-08 19:27:48 -07:00
Alexander Allen
7e4c4e2b95
[Mellanox] Add 2x100G and 4x50G breakout modes to MSN4410 platform (#8419)
- Why I did it
The MSN4410 platform was missing 2x100G and 4x50G supported breakout modes in platform.json

- How I did it
Added the aforementioned breakout modes to platform.json

- How to verify it
Run show interface breakout on a MSN4410 and verify that 2x100G and 4x50G are listed in the supported breakout modes for ports 1-24.
2021-08-22 15:53:03 +03:00
Kebo Liu
9d2ccda1ec
[Mellanox] Add new sensor conf to support SN4410 A1 system (#8379)
#### Why I did it

New SN410 A1 system has a different sensor layout with A0 system, needs a new sensor conf file to support it.

#### How I did it

Since the SN4410 A1 system use exactly the same sensor layout as the SN4700 A1 system, so add a symbol link linking to the SN4700 A1 sensor conf file to reuse.

#### How to verify it

Run sensor test against the SN4410 A1 system;
Run platform related regression test against the SN4410 A1 system
2021-08-11 17:31:58 -07:00
Alexander Allen
5259af2e2b
[mellanox] remove 2x40G and 4x40G breakout modes due to no hardware support (#8280)
Mellanox platforms do not support 2x40G or 4x40G breakout modes.

Signed-off-by: Alexander Allen <arallen@nvidia.com>
2021-08-01 13:24:26 -07:00
DavidZagury
9d1c1659bd
[Mellanox] Update SKUs to enable SDK dumps (#7708)
- Why I did it
To create SDK dump on Mellanox devices when SDK event has occurred.

- How I did it
Set the SKUs keys needed to initialize the feature in SAI.

- How to verify it
Simulate SDK event and check that dump is created in the expected path.
2021-06-21 16:41:18 +03:00
Alexander Allen
d2e62af85d
[device][mellanox] Fix supported breakout modes on MSN4410 (#7853)
- Why I did it
The default breakout mode according to hwsku.json for the MSN4410 is 1x400G and this is not a supported breakout mode according to its platform.json

This causes a conflict on boot of this platform and no containers on the switch will init successfully.

- How I did it
Referenced the platform specification files and updated platform.json

- How to verify it
Install master version of SONiC on MSN4410
Boot switch and verify swss is successfully running using docker ps
2021-06-20 19:35:44 +03:00
Kebo Liu
629f4459d7
[Mellanox] Align PSU fan name in platform.json with latest change in PR #7490 (#7557)
The PSU fan name convention was changed from "psu_{}_fan_{}" to "psu{}_fan{}" in PR #7490, platform.json need to be changed and aligned.
2021-05-07 09:42:40 -07:00
Kebo Liu
5ac048f7e7
[Mellanox] Enhance the platform.json with adding more platform device facts. (#7495)
#### Why I did it

Current platform.json lacks some peripheral device related facts, like chassis/fan/pasu/drawer/thermal/components names, numbers, etc.

#### How I did it

Add platform device facts to the platform.json file

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2021-05-03 12:22:13 -07:00
Stephen Sun
46a7fac1aa
Bug fix: Support dynamic buffer calculation on ACS-MSN3420 and ACS-MSN4410 (#7113)
- Why I did it
Add missed files for dynamic buffer calculation for ACS-MSN3420 and ACS-MSN4410

- How I did it
asic_table.j2: Add mapping from platform to ASIC
Add buffer_dynamic.json.j2 for ACS-MSN4410.

- How to verify it
Check whether the dynamic buffer calculation daemon starts successfully.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2021-04-07 20:33:15 +03:00
Junchao-Mellanox
48042b7256
[Mellanox] Use softlink for sfputils on MSN4410 platform (#7092)
The file device/mellanox/x86_64-mlnx_msn4410-r0/plugins/sfputil.py is not a software link for device/mellanox/x86_64-mlnx_msn2700-r0/plugins/sfputil.py. And it is still using python2 syntex which causes some SFP CLI error. The PR is to change it to a softlink and add 4410 support in device/mellanox/x86_64-mlnx_msn2700-r0/plugins/sfputil.py.
2021-03-27 11:56:48 -07:00
Sangita Maity
18263c99dd
[DPB|master] Update Dynamic Port Breakout Logic for flexible alias support a… (#6831)
To fix [DPB| wrong aliases for interfaces](https://github.com/Azure/sonic-buildimage/issues/6024) issue, implimented flexible alias support [design doc](https://github.com/Azure/SONiC/pull/749)

> [[dpb|config] Fix the validation logic of breakout mode](https://github.com/Azure/sonic-utilities/pull/1440) depends on this

#### How I did it

1. Removed `"alias_at_lanes"` from port-configuration file(i.e. platfrom.json) 
2. Added dictionary to "breakout_modes" values. This defines the breakout modes available on the platform for this parent port, and it maps to the alias list. The alias list presents the alias names for individual ports in order under this breakout mode.
```
{
    "interfaces": {
        "Ethernet0": {
            "index": "1,1,1,1",
            "lanes": "0,1,2,3",
            "breakout_modes": {
                "1x100G[40G]": ["Eth1"],
                "2x50G": ["Eth1/1", "Eth1/2"],
                "4x25G[10G]": ["Eth1/1", "Eth1/2", "Eth1/3", "Eth1/4"],
                "2x25G(2)+1x50G(2)": ["Eth1/1", "Eth1/2", "Eth1/3"],
                "1x50G(2)+2x25G(2)": ["Eth1/1", "Eth1/2", "Eth1/3"]
            }
        }
}
```
#### How to verify it
`config interface breakout`

Signed-off-by: Sangita Maity <samaity@linkedin.com>
2021-02-26 00:13:33 -08:00
Joe LeVeque
18f2c5cfdd
[platform] Update QSFP method name 'parse_qsfp_dom_capability' -> 'parse_dom_capability' (#6695)
**- Why I did it**
PR https://github.com/Azure/sonic-platform-common/pull/102 modified the name of the SFF-8436 (QSFP) method to align the method name between all drivers, renaming it from `parse_qsfp_dom_capability` to `parse_dom_capability`. Once the submodule was updated, the callers using the old nomenclature broke. This PR updates all callers to use the new naming convention.

**- How I did it**

Update the name of the function globally for all calls into the SFF-8436 driver.

Note that the QSFP-DD driver still uses the old nomenclature and should be modified similarly. I will open a PR to handle this separately.
2021-02-05 14:41:05 -08:00
Vadym Hlushko
db5a88eef7
[DPB] [Mellanox] added capability files for SN4410 platform (#6059)
- Why I did it
platform.json and hwsku.json files are required for a feature called Dynamic Port Breakout

- How I did it
Created capability files according to platform specification SN4410

- How to verify it
Full qualification requires bugs fixes reported under sonic-buildimage

Signed-off-by: Vadym Hlushko <vadymh@nvidia.com>
2021-01-17 11:08:42 +02:00
Vadym Hlushko
b56320ce56
[SN4410] fixed 'port_config.ini' (#6316)
Signed-off-by: Vadym Hlushko <vadymh@nvidia.com>
2021-01-13 15:45:05 +02:00
Vadym Hlushko
503873056e
[Mellanox] SN4410 support (#5778)
Add support for Mellanox Spectrum-3 based 100GbE/400GbE 1U. 24 QSFP-DD28 and 8 QSFP-DD ports

Signed-off-by: Vadym Hlushko <vadymh@nvidia.com>
2020-11-24 10:43:48 -08:00