sonic-buildimage/device/mellanox/x86_64-mlnx_msn3700c-r0
Volodymyr Samotiy f1d6655004
[Mellanox] Disable SSD NCQ on Mellanox platforms (#17567)
- Why I did it
Based on some research, some products might experience occasional I/O failures in the communication between the CPU and the SSD because of NCQ (Native Command Queuing).
There appears to be an incompatibility between some kernel versions and some SATA controllers.

Syslog error message examples:

Error "ata1: SError: { UnrecovData Handshk }" - "failed command: WRITE FPDMA QUEUED".
Error "ata1: SError: { RecovComm HostInt PHYRdyChg CommWake 10B8B DevExch }" - "failed command: READ FPDMA QUEUED".
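To check whether a switch is hitting this issue, the syslog can be scanned for the two message patterns above. The following is a minimal sketch, assuming syslog lines are available as plain strings (on SONiC typically from /var/log/syslog). The function name scan_ata_errors is hypothetical; it counts READ/WRITE FPDMA QUEUED failures and tallies the SError flags per ATA port.

```python
import re
from collections import Counter

# libata reports a failed queued (NCQ) DMA command as
# "failed command: READ FPDMA QUEUED" or "... WRITE FPDMA QUEUED".
FPDMA_FAILURE = re.compile(r"failed command: (READ|WRITE) FPDMA QUEUED")

# Captures the port and the flag list from lines such as
# "ata1: SError: { UnrecovData Handshk }".
SERROR = re.compile(r"(ata\d+): SError: \{([^}]*)\}")


def scan_ata_errors(lines):
    """Count NCQ command failures and SError flags (hypothetical helper)."""
    ncq_failures = Counter()   # keyed by "READ" / "WRITE"
    serror_flags = Counter()   # keyed by (port, flag)
    for line in lines:
        failure = FPDMA_FAILURE.search(line)
        if failure:
            ncq_failures[failure.group(1)] += 1
        serror = SERROR.search(line)
        if serror:
            port = serror.group(1)
            for flag in serror.group(2).split():
                serror_flags[(port, flag)] += 1
    return ncq_failures, serror_flags
```

Passing an open log file works as well, since a file object iterates line by line.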
Some vendors have already disabled NCQ on their platforms in SONiC because of a similar issue:

[Arista] Disable ATA NCQ for a few products #13739
[Arista] Disable SSD NCQ on DCS-7050CX3-32S #13964
There are also discussions on Debian/Ubuntu forums about similar issues, where disabling NCQ was suggested:

https://askubuntu.com/questions/133946/are-these-sata-errors-dangerous

- How I did it
Added a kernel parameter that tells libata to disable NCQ.
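The message does not name the parameter. The sketch below assumes it is libata's documented libata.force option with the noncq value (for example libata.force=noncq, applied to all ports), set through installer.conf; the exact contents of installer.conf are not shown here. The helpers are hypothetical: one parses a kernel command line for libata.force entries that carry noncq, and one reads the SCSI queue depth from sysfs, which with NCQ disabled is typically reported as 1.

```python
from pathlib import Path


def ncq_disabled_targets(cmdline):
    """Return the libata.force targets that carry 'noncq' (hypothetical helper).

    An entry without a "PORT[.DEVICE]:" prefix applies to all ports and is
    reported as "*".
    """
    targets = set()
    for token in cmdline.split():
        if not token.startswith("libata.force="):
            continue
        value = token[len("libata.force="):]
        # libata.force takes a comma-separated list of [ID:]VALUE entries.
        for entry in value.split(","):
            port, sep, param = entry.partition(":")
            if not sep:
                port, param = "*", entry
            if param == "noncq":
                targets.add(port)
    return targets


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line."""
    return Path(path).read_text()


def sata_queue_depth(disk="sda"):
    """Read the queue depth libata exposes for a disk (1 means no NCQ)."""
    return int(Path(f"/sys/block/{disk}/device/queue_depth").read_text())
```

On a running system, ncq_disabled_targets(read_cmdline()) returning {"*"} would indicate that the parameter is in effect for all ports.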

- How to verify it
Use the fio tool: fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4
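The command above does not say which file or device fio should exercise, or how large it should be. This sketch wraps the same options in a small Python driver. The --name, --filename and --size values are hypothetical placeholders, not taken from the PR. The deep queue (iodepth=64) with random mixed reads and writes is what stresses the CPU-to-SSD path where the NCQ errors were seen.

```python
import shlex
import shutil
import subprocess

# Options taken verbatim from the verification step above.
BASE_OPTIONS = [
    "--direct=1",
    "--rw=randrw",
    "--bs=64k",
    "--ioengine=libaio",
    "--iodepth=64",
    "--runtime=120",
    "--numjobs=4",
]


def build_fio_command(filename, size="1G", name="ncq-check"):
    """Build the fio argv; filename, size and name are hypothetical placeholders."""
    return ["fio", f"--name={name}", f"--filename={filename}",
            f"--size={size}", *BASE_OPTIONS]


def run_fio(filename, size="1G"):
    """Run fio and raise CalledProcessError if it exits non-zero."""
    if shutil.which("fio") is None:
        raise RuntimeError("fio is not installed")
    cmd = build_fio_command(filename, size)
    print("running:", shlex.join(cmd))
    return subprocess.run(cmd, check=True).returncode
```

After a run, the syslog can be checked for the FPDMA QUEUED failures listed earlier. With NCQ disabled, the expectation is that none appear.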
2024-01-28 16:26:07 +02:00
ACS-MSN3700C [buffers] Add 'create_only_config_db_buffers.json' file for the Mellanox devices (not MSFT SKU) (#16233) 2023-10-03 08:35:57 -07:00
plugins [Mellanox]Implement plugins for PSU, fan and thermal (#4041) 2020-01-24 11:27:32 -08:00
default_sku [devices]: Added new SN3700/SN3700C Mellanox platforms (#2548) 2019-02-13 23:08:04 -08:00
get_sensors_conf_path [Mellanox] Add Sensor conf to support respined platforms(SN3700/SN3700C/SN4600C) (#11553) 2022-08-10 18:09:10 +03:00
installer.conf [Mellanox] Disable SSD NCQ on Mellanox platforms (#17567) 2024-01-28 16:26:07 +02:00
pcie.yaml [Mellanox] Add NVIDIA Copyright header to "mellanox" files (#8799) 2021-10-17 19:03:02 +03:00
platform_asic Add platform_asic file to each platform folder in sonic-device-data based package (#8542) 2021-10-08 19:27:48 -07:00
platform_components.json [Mellanox] Update platform components config files. (#5685) 2020-10-25 19:44:37 +02:00
platform_reboot [devices]: Added new SN3700/SN3700C Mellanox platforms (#2548) 2019-02-13 23:08:04 -08:00
platform_wait [mellanox]: Upgraded hw-management V.2.0.0160. (#2643) 2019-03-06 18:51:46 -08:00
platform.json [mellanox] remove 2x40G and 4x40G breakout modes due to no hardware support (#8280) 2021-08-01 13:24:26 -07:00
pmon_daemon_control.json [Pmon] dynamically load pmon daemons (#2654) 2019-03-22 02:49:35 -07:00
sensors_respin.conf [Mellanox] Add Sensor conf to support respined platforms(SN3700/SN3700C/SN4600C) (#11553) 2022-08-10 18:09:10 +03:00
sensors_swb_respin.conf [Mellanox] Add Sensor conf to support respined platforms(SN3700/SN3700C/SN4600C) (#11553) 2022-08-10 18:09:10 +03:00
sensors.conf [Mellanox] Auto correct PSU voltage threshold (WA) (#10394) 2022-04-14 08:14:40 +03:00
system_health_monitoring_config.json [Mellanox] Add system health configuration file for Mellanox platforms (#4834) 2020-07-13 10:20:22 -07:00
thermal_policy.json Add thermal control support for SONiC (#3949) 2020-03-09 10:41:10 -07:00