sonic-buildimage/device/mellanox/x86_64-mlnx_msn4700-r0
Volodymyr Samotiy f1d6655004
[Mellanox] Disable SSD NCQ on Mellanox platforms (#17567)
- Why I did it
Research suggests that some products may experience occasional I/O failures in the communication between the CPU and the SSD because of NCQ (Native Command Queuing).
The problem appears to be an incompatibility between some kernel versions and some SATA controllers.

Syslog error message examples:

Error "ata1: SError: { UnrecovData Handshk }" - "failed command: WRITE FPDMA QUEUED".
Error "ata1: SError: { RecovComm HostInt PHYRdyChg CommWake 10B8B DevExch }" - "failed command: READ FPDMA QUEUED".
Some vendors have already disabled NCQ on their platforms in SONiC because of a similar issue:

[Arista] Disable ATA NCQ for a few products #13739
[Arista] Disable SSD NCQ on DCS-7050CX3-32S #13964
There are also discussions on Debian/Ubuntu forums about similar issues, where disabling NCQ was suggested:

https://askubuntu.com/questions/133946/are-these-sata-errors-dangerous

- How I did it
Add a kernel parameter to tell libata to disable NCQ
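
As a rough illustration of what such a change could look like: the libata driver accepts `libata.force=noncq` on the kernel command line to turn NCQ off, and the file listing shows that installer.conf was touched by this commit. The variable name below is an assumption about how the platform passes extra kernel arguments and is not confirmed by the description above.

```shell
# installer.conf (sketch, not the verbatim change from this commit)
# Assumption: the installer appends this variable to the kernel command line.
# libata.force=noncq asks libata to disable NCQ on all ATA ports.
ONIE_PLATFORM_EXTRA_CMDLINE_LINUX="libata.force=noncq"
```

If the platform already sets other extra arguments, the parameter would be appended to the existing value rather than replacing it.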

- How to verify it
Run the fio tool to stress the SSD: fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4
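
Before running the stress test, it can help to confirm that the parameter actually took effect after reboot. The following is a minimal sketch, assuming the parameter used is `libata.force=noncq` and that the SSD is `sda`; the helper names `has_noncq` and `queue_depth` are hypothetical and introduced only for illustration.

```shell
#!/bin/sh
# Return 0 if the given kernel command line string asks libata to
# disable NCQ, 1 otherwise. Also matches per-port forms such as
# libata.force=1.00:noncq.
has_noncq() {
    for arg in $1; do
        case "$arg" in
            libata.force=*noncq*) return 0 ;;
        esac
    done
    return 1
}

# Print the queue depth the kernel reports for a disk.
# The default device name "sda" is an assumption.
# With NCQ disabled, the reported depth is expected to be 1.
queue_depth() {
    cat "/sys/block/${1:-sda}/device/queue_depth"
}

# Check the command line of the running kernel.
if has_noncq "$(cat /proc/cmdline 2>/dev/null)"; then
    echo "NCQ disable parameter found on the kernel command line"
else
    echo "NCQ disable parameter not found on the kernel command line"
fi
```

On the switch, calling `queue_depth sda` afterwards should print 1 if NCQ is off. After the fio run, syslog should no longer contain the "FPDMA QUEUED" failures quoted above.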
2024-01-28 16:26:07 +02:00
..
ACS-MSN4700 [ppi]: Enable global port late create for all Mellanox HWSKUs. (#16945) 2023-11-01 21:50:14 -07:00
Mellanox-SN4700-A96C8V8 [ppi]: Enable global port late create for all Mellanox HWSKUs. (#16945) 2023-11-01 21:50:14 -07:00
Mellanox-SN4700-C128 [ppi]: Enable global port late create for all Mellanox HWSKUs. (#16945) 2023-11-01 21:50:14 -07:00
Mellanox-SN4700-O8C48 [ppi]: Enable global port late create for all Mellanox HWSKUs. (#16945) 2023-11-01 21:50:14 -07:00
Mellanox-SN4700-O8V48 [Mellanox][SKU] Adding Mellanox-SN4700-O8V48 SKU (#17425) 2023-12-10 16:18:11 +02:00
Mellanox-SN4700-O28 [Mellanox] Change the default breakout mode for internal ports of the Mellanox-SN4700-O28 SKU. (#17192) 2023-11-21 09:51:36 +02:00
Mellanox-SN4700-V48C32 [ppi]: Enable global port late create for all Mellanox HWSKUs. (#16945) 2023-11-01 21:50:14 -07:00
plugins [Mellanox] Add a new Mellanox platform x86_64-mlnx_msn4700 and new SKU ACS-MSN4700 (#3901) 2020-03-24 14:32:52 +02:00
default_sku [Mellanox] Add a new Mellanox platform x86_64-mlnx_msn4700 and new SKU ACS-MSN4700 (#3901) 2020-03-24 14:32:52 +02:00
get_sensors_conf_path [Mellanox] Support new sensor conf file for MSN4700 A1/A0 (#7535) 2021-05-06 10:13:26 -07:00
installer.conf [Mellanox] Disable SSD NCQ on Mellanox platforms (#17567) 2024-01-28 16:26:07 +02:00
pcie.yaml add pcied config files for mellanox platform (#5669) 2020-11-02 19:45:36 -08:00
platform_asic Add platform_asic file to each platform folder in sonic-device-data based package (#8542) 2021-10-08 19:27:48 -07:00
platform_components.json [Mellanox] Update platform components config files. (#5685) 2020-10-25 19:44:37 +02:00
platform_reboot [Mellanox] Add a new Mellanox platform x86_64-mlnx_msn4700 and new SKU ACS-MSN4700 (#3901) 2020-03-24 14:32:52 +02:00
platform_wait [Mellanox] Add a new Mellanox platform x86_64-mlnx_msn4700 and new SKU ACS-MSN4700 (#3901) 2020-03-24 14:32:52 +02:00
platform.json [Mellanox] [4700] Update platform capability file to support new breakout mode (#11614) 2022-08-24 11:55:33 +03:00
pmon_daemon_control.json [Mellanox] Add a new Mellanox platform x86_64-mlnx_msn4700 and new SKU ACS-MSN4700 (#3901) 2020-03-24 14:32:52 +02:00
sensors.conf Fix MSN4700 sensors labels (#5861) 2020-11-10 18:33:24 +02:00
sensors.conf.a1 [Mellanox] Add NVIDIA Copyright header to "mellanox" files (#8799) 2021-10-17 19:03:02 +03:00
system_health_monitoring_config.json [Mellanox] update system_health_monitoring_config for MSN4410/MSN4600/MSN4700 (#9728) 2022-01-19 10:29:26 +02:00
thermal_policy.json [Mellanox] Enhancement for fan led management (#4437) 2020-05-13 10:01:32 -07:00