sonic-buildimage/device/mellanox/x86_64-mlnx_msn2700a1-r0
Volodymyr Samotiy 15a0f912bd [Mellanox] Disable SSD NCQ on Mellanox platforms (#17567)
- Why I did it
Based on some research, some products might experience occasional I/O failures in the communication between the CPU and the SSD because of NCQ.
There seems to be a problem between some kernel versions and some SATA controllers.

Syslog error message examples:

Error "ata1: SError: { UnrecovData Handshk }" - "failed command: WRITE FPDMA QUEUED".
Error "ata1: SError: { RecovComm HostInt PHYRdyChg CommWake 10B8B DevExch }" - "failed command: READ FPDMA QUEUED".
Some vendors have already disabled NCQ on their platforms in SONiC because of a similar issue:

[Arista] Disable ATA NCQ for a few products #13739
[Arista] Disable SSD NCQ on DCS-7050CX3-32S #13964
There are also discussions on Debian/Ubuntu forums about similar issues, where disabling NCQ was suggested:

https://askubuntu.com/questions/133946/are-these-sata-errors-dangerous

- How I did it
Add a kernel parameter to tell libata to disable NCQ
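
A rough sketch of what such a parameter could look like, not taken from the commit diff itself: libata accepts libata.force=noncq on the kernel command line, and a platform's installer.conf is where extra kernel arguments are usually supplied. The variable name in the comment and the helper ncq_disabled are assumptions for illustration; check the actual installer.conf for the exact line.

```shell
# installer.conf fragment (assumed variable name, shown as a comment only):
#   ONIE_PLATFORM_EXTRA_CMDLINE_LINUX="libata.force=noncq"

# Hypothetical helper: succeeds if the given kernel command line
# contains the standalone token libata.force=noncq.
# It does not handle comma-separated or per-port forms such as
# libata.force=1.00:noncq.
ncq_disabled() {
    # $1: kernel command line text, e.g. "$(cat /proc/cmdline)"
    case " $1 " in
        *" libata.force=noncq "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a running switch:
#   ncq_disabled "$(cat /proc/cmdline)" && echo "NCQ disabled via libata.force"
```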

- How to verify it
Use the FIO tool: fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4
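
The verification step could be scripted roughly as follows. This is a sketch under assumptions: the job name, --size value, and target file argument are not in the original command and are added because fio needs a job name and a target, and the helper names run_stress and count_ncq_failures are hypothetical. The idea is to run the stress load and then count the "FPDMA QUEUED" failures shown in the syslog examples above.

```shell
# Hypothetical helper: count NCQ command failures in kernel log text on stdin.
# Prints the number of matching lines (0 when there are none).
count_ncq_failures() {
    grep -c -E 'failed command: (READ|WRITE) FPDMA QUEUED' || true
}

# Hypothetical wrapper around the fio options from the commit message.
# $1: path of a scratch file on the SSD under test (placeholder, caller-chosen).
# --name, --filename and --size are assumptions added for a complete job definition.
run_stress() {
    fio --name=ncq-check --filename="$1" --size=1G \
        --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 \
        --runtime=120 --numjobs=4
}

# Usage on a running switch (scratch path is a placeholder):
#   run_stress /path/on/ssd/fio-ncq.dat && dmesg | count_ncq_failures
```

A count of 0 after the run suggests that the NCQ-related errors did not recur during the test; it does not prove that they cannot occur.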
2024-02-01 04:32:27 +08:00
ACS-MSN2700-A1 [Mellanox] add new platform 2700 a1 (#16515) (#16795) 2023-10-08 03:06:03 +08:00
default_sku [Mellanox] add new platform 2700 a1 (#16515) (#16795) 2023-10-08 03:06:03 +08:00
installer.conf [Mellanox] Disable SSD NCQ on Mellanox platforms (#17567) 2024-02-01 04:32:27 +08:00
Mellanox-SN2700-A1 [Mellanox] add new platform 2700 a1 (#16515) (#16795) 2023-10-08 03:06:03 +08:00
Mellanox-SN2700-A1-C28D8 [Mellanox] add new platform 2700 a1 (#16515) (#16795) 2023-10-08 03:06:03 +08:00
Mellanox-SN2700-A1-D40C8S8 [Mellanox] add new platform 2700 a1 (#16515) (#16795) 2023-10-08 03:06:03 +08:00
Mellanox-SN2700-A1-D44C10 [Mellanox] add new platform 2700 a1 (#16515) (#16795) 2023-10-08 03:06:03 +08:00
Mellanox-SN2700-A1-D48C8 [Mellanox] add new platform 2700 a1 (#16515) (#16795) 2023-10-08 03:06:03 +08:00
pcie.yaml [Mellanox] add new platform 2700 a1 (#16515) (#16795) 2023-10-08 03:06:03 +08:00
platform_asic [Mellanox] add new platform 2700 a1 (#16515) (#16795) 2023-10-08 03:06:03 +08:00
platform_components.json [Mellanox] fix new MSN2700-A1 platform name (#17151) (#17198) 2023-11-16 21:40:47 +08:00
platform_reboot [Mellanox] add new platform 2700 a1 (#16515) (#16795) 2023-10-08 03:06:03 +08:00
platform_wait [Mellanox] add new platform 2700 a1 (#16515) (#16795) 2023-10-08 03:06:03 +08:00
platform.json [Mellanox] fix new MSN2700-A1 platform name (#17151) (#17198) 2023-11-16 21:40:47 +08:00
plugins [Mellanox] add new platform 2700 a1 (#16515) (#16795) 2023-10-08 03:06:03 +08:00
pmon_daemon_control.json [Mellanox] add new platform 2700 a1 (#16515) (#16795) 2023-10-08 03:06:03 +08:00
sensors.conf [Mellanox] add new platform 2700 a1 (#16515) (#16795) 2023-10-08 03:06:03 +08:00
system_health_monitoring_config.json [Mellanox] add new platform 2700 a1 (#16515) (#16795) 2023-10-08 03:06:03 +08:00
thermal_policy.json [Mellanox] add new platform 2700 a1 (#16515) (#16795) 2023-10-08 03:06:03 +08:00