- Why I did it

Based on some research, certain products may experience occasional I/O failures in the communication between the CPU and the SSD because of NCQ (Native Command Queuing). The problem appears to lie between some kernel versions and some SATA controllers.

Syslog error message examples:
- Error "ata1: SError: { UnrecovData Handshk }" - "failed command: WRITE FPDMA QUEUED"
- Error "ata1: SError: { RecovComm HostInt PHYRdyChg CommWake 10B8B DevExch }" - "failed command: READ FPDMA QUEUED"

Some vendors have already disabled NCQ on their platforms in SONiC because of a similar issue:
- [Arista] Disable ATA NCQ for a few products #13739
- [Arista] Disable SSD NCQ on DCS-7050CX3-32S #13964

There are also discussions on Debian/Ubuntu forums about similar issues, where disabling NCQ was suggested: https://askubuntu.com/questions/133946/are-these-sata-errors-dangerous

- How I did it

Add a kernel parameter that tells libata to disable NCQ.

- How to verify it

Use the FIO tool:
fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4

||
---|---|---|
accton | ||
alphanetworks | ||
arista | ||
barefoot | ||
broadcom | ||
celestica | ||
centec | ||
cig | ||
common | ||
dell | ||
delta | ||
facebook/x86_64-facebook_wedge100-r0 | ||
fs/arm64-fs_s5800_48t4s-r0 | ||
ingrasys | ||
inventec | ||
juniper | ||
marvell | ||
mellanox | ||
mitac/x86_64-mitac_ly1200_b32h0_c3-r0 | ||
netberg | ||
nokia | ||
pegatron/x86_64-pegatron_porsche-r0 | ||
pensando/arm64-elba-asic-r0 | ||
quanta | ||
ragile | ||
ruijie/x86_64-ruijie_b6510-48vs8cq-r0 | ||
supermicro/x86_64-supermicro_sse_t7132s-r0 | ||
tencent | ||
ufispace | ||
virtual | ||
wistron | ||
wnc/x86_64-wnc_osw1800-r0 |
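
The commit description above says a kernel parameter is added so that libata disables NCQ, but it does not name the parameter. A minimal sketch of a check follows, assuming the parameter is the kernel's documented `libata.force=noncq` option (possibly with a port prefix such as `1.00:noncq`). The function name `cmdline_disables_ncq` is hypothetical and is not part of the SONiC change.

```shell
#!/bin/sh
# Sketch only: assumes NCQ is disabled through the libata.force kernel option.
# cmdline_disables_ncq is a hypothetical helper, not part of the SONiC change.
# It returns 0 when the given kernel command line contains a libata.force
# entry whose comma-separated values include "noncq" or "<port id>:noncq".
cmdline_disables_ncq() {
    for arg in $1; do
        case "$arg" in
            libata.force=*)
                # Wrap the value list in commas so every entry can be
                # matched as ",entry," regardless of its position.
                case ",${arg#libata.force=}," in
                    *,noncq,*|*:noncq,*) return 0 ;;
                esac
                ;;
        esac
    done
    return 1
}

# Example use on a running system (left commented out):
# cmdline_disables_ncq "$(cat /proc/cmdline)" && echo "NCQ disabled via libata.force"
```

On a booted device, a complementary check is to read `/sys/block/<disk>/device/queue_depth`, which is expected to report 1 when NCQ is not in use; the disk name depends on the platform.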