[Arista] Disable ATA NCQ for a few products (#13739)

Why I did it
Some products might experience an occasional IO failure in the communication between CPU and SSD.
Based on some research it could be attributable to some device not handling ATA NCQ (Native Command Queue).

This issue currently affect 4 products:

DCS-7170-32C*
DCS-7170-64C
DCS-7060DX4-32
DCS-7260CX3-64

How I did it
This change disable NCQ on the affected drive for a small set of products.

How to verify it
When the fix is applied, these 2 patterns can be found in the dmesg.
ata1.00: FORCE: horkage modified (noncq)
NCQ (not used)

Test results using: fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4

with NCQ (ata1.00: 61865984 sectors, multi 1: LBA48 NCQ (depth 32), AA)

   READ: bw=33.9MiB/s (35.6MB/s), 33.9MiB/s-33.9MiB/s (35.6MB/s-35.6MB/s), io=4073MiB (4270MB), run=120078-120078msec
  WRITE: bw=34.1MiB/s (35.8MB/s), 34.1MiB/s-34.1MiB/s (35.8MB/s-35.8MB/s), io=4100MiB (4300MB), run=120078-120078msec
without NCQ (ata1.00: 61865984 sectors, multi 1: LBA48 NCQ (not used))

   READ: bw=31.7MiB/s (33.3MB/s), 31.7MiB/s-31.7MiB/s (33.3MB/s-33.3MB/s), io=3808MiB (3993MB), run=120083-120083msec
  WRITE: bw=31.9MiB/s (33.4MB/s), 31.9MiB/s-31.9MiB/s (33.4MB/s-33.4MB/s), io=3830MiB (4016MB), run=120083-120083msec
Which release branch to backport (provide reason below if selected)
This commit is contained in:
Samuel Angebault 2023-02-15 10:31:59 -08:00 committed by GitHub
parent b0e0b23d2a
commit 5ce1b8e4b7
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

View File

@ -621,6 +621,10 @@ write_platform_specific_cmdline() {
cmdline_add intel_idle.max_cstate=0
read_system_eeprom
fi
if in_array "$platform" "rook"; then
# Currently applies to Alhambra, Blackhawk, Gardena and Mineral
cmdline_add libata.force=1.00:noncq
fi
if in_array "$platform" "crow" "magpie"; then
cmdline_add amd_iommu=off
cmdline_add modprobe.blacklist=snd_hda_intel,hdaudio