Commit Graph

6 Commits

Author SHA1 Message Date
Vadym Hlushko
1d57472eb0
[graceful reboot] Rename the platform_reboot to the pre_reboot_hook, remove the sysfs power cycle (#18324)
**DEPENDS ON: [[graceful reboot] Add the pre_reboot_hook script execution, add the watchdog arm before the reboot](https://github.com/sonic-net/sonic-utilities/pull/3203)**

#### Why I did it
Add support for a `graceful reboot` instead of the `sysfs power cycle` to avoid filesystem corruption.

#### How I did it
Rename the `platform_reboot` script to the `pre_reboot_hook`.
Remove the sysfs power cycle function. From now on, the Debian reboot (`/sbin/reboot`) is executed instead of the sysfs power cycle.
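
A minimal Python sketch of the resulting reboot flow follows. It assumes the hook is an optional executable at a platform-specific path. The `PRE_REBOOT_HOOK` path and the `graceful_reboot` helper are hypothetical. The real logic lives in the `reboot` script in sonic-utilities, which also arms the watchdog before rebooting; that step is omitted here. Commands go through an injectable `run` callable so the flow can be exercised without actually rebooting.

```python
import os
import subprocess
from typing import Callable, List

# Hypothetical location; the real path is chosen by the reboot script per platform.
PRE_REBOOT_HOOK = "/path/to/platform/pre_reboot_hook"


def graceful_reboot(
    hook_path: str = PRE_REBOOT_HOOK,
    run: Callable[[List[str]], int] = subprocess.call,
    exists: Callable[[str], bool] = os.path.isfile,
) -> List[List[str]]:
    """Run the optional pre_reboot_hook, then the regular Debian reboot.

    Returns the commands that were executed, in order.
    """
    executed: List[List[str]] = []
    if exists(hook_path):
        executed.append([hook_path])
        rc = run([hook_path])
        # Assumption for this sketch: a failing hook is reported but does not block the reboot.
        if rc != 0:
            print(f"pre_reboot_hook exited with {rc}, rebooting anyway")
    # /sbin/reboot lets systemd stop services and unmount filesystems cleanly,
    # unlike an immediate sysfs power cycle.
    executed.append(["/sbin/reboot"])
    run(["/sbin/reboot"])
    return executed
```

Keeping the hook optional means platforms without a `pre_reboot_hook` still get the plain `/sbin/reboot` path.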

#### How to verify it
1. Start watching the logs with `show log -f` and `journalctl -p debug -f`.
2. Execute the `reboot` command from the switch CLI.
3. Check the logs to confirm that all systemd services were terminated.
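
Step 3 can be partly automated. The sketch below scans captured journal lines for systemd's clean-stop message. It assumes the `<unit>: Deactivated successfully.` wording printed by recent systemd releases; older versions phrase this differently. The helper name and the unit names in the test data are illustrative.

```python
import re
from typing import Iterable, Set

# Assumption: recent systemd logs "<unit>: Deactivated successfully." when a unit stops cleanly.
DEACTIVATED = re.compile(r"(?P<unit>[\w@.\-]+\.service): Deactivated successfully")


def services_not_stopped(log_lines: Iterable[str], expected_units: Iterable[str]) -> Set[str]:
    """Return the expected units that never logged a clean deactivation."""
    stopped = set()
    for line in log_lines:
        match = DEACTIVATED.search(line)
        if match:
            stopped.add(match.group("unit"))
    return set(expected_units) - stopped
```

If the journal is persistent, the previous boot's log (`journalctl -b -1`) can be fed to this check once the switch is back up.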
2024-03-23 16:45:36 -07:00
Kebo Liu
1b5f72127a
[Mellanox] Remove SFP sensors from sensors.conf (#17631)
- Why I did it
The cable thermal sensors will be deprecated from the kernel driver. When cable host management is enabled, the NOS fetches the cable temperature from the cable EEPROM, and the kernel driver no longer provides the corresponding sysfs entries.
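
For reference, reading a temperature straight from the cable EEPROM is a simple decode. Below is a minimal sketch for SFF-8636 (QSFP) modules. In these modules, lower-page bytes 22-23 hold the temperature as a signed 16-bit big-endian value in units of 1/256 °C. SFF-8472 and CMIS modules use different offsets. `sff8636_temperature_c` is a hypothetical helper, not the NOS code path itself.

```python
def sff8636_temperature_c(lower_page: bytes) -> float:
    """Decode the module temperature (degC) from an SFF-8636 lower page.

    Bytes 22-23 hold a signed 16-bit big-endian value in units of 1/256 degC.
    """
    if len(lower_page) < 24:
        raise ValueError("need at least 24 bytes of the lower page")
    raw = int.from_bytes(lower_page[22:24], byteorder="big", signed=True)
    return raw / 256.0
```

For example, bytes `0x19 0x80` at offsets 22-23 decode to 6528 / 256 = 25.5 °C.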

- How I did it
Remove the relevant sensors from the conf files.

- How to verify it
Run the sonic-mgmt sensor test.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2024-02-12 16:12:57 +02:00
Volodymyr Samotiy
f1d6655004
[Mellanox] Disable SSD NCQ on Mellanox platforms (#17567)
- Why I did it
Based on some research, some products might experience occasional I/O failures in the communication between the CPU and the SSD because of NCQ.
There seems to be a problem between some kernel versions and some SATA controllers.

Syslog error message examples:

Error "ata1: SError: { UnrecovData Handshk }" - "failed command: WRITE FPDMA QUEUED".
Error "ata1: SError: { RecovComm HostInt PHYRdyChg CommWake 10B8B DevExch }" - "failed command: READ FPDMA QUEUED".
Some vendors have already disabled NCQ on their platforms in SONiC due to a similar issue:

- [Arista] Disable ATA NCQ for a few products (#13739)
- [Arista] Disable SSD NCQ on DCS-7050CX3-32S (#13964)
There are also other discussions on Debian/Ubuntu forums about similar issues, where disabling NCQ was suggested:

https://askubuntu.com/questions/133946/are-these-sata-errors-dangerous

- How I did it
Add a kernel parameter to tell libata to disable NCQ
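
The commit does not name the exact parameter. One documented way is `libata.force=noncq`, which can optionally be scoped to a port or device, e.g. `libata.force=1.00:noncq`. The sketch below checks a kernel command line for such an entry. The helper names are hypothetical. On a running switch you would pass it the contents of `/proc/cmdline`. With NCQ disabled, `/sys/block/<dev>/device/queue_depth` typically also reads 1.

```python
from typing import List


def libata_force_values(cmdline: str) -> List[str]:
    """Collect the comma-separated entries of every libata.force= parameter."""
    values: List[str] = []
    for token in cmdline.split():
        if token.startswith("libata.force="):
            values.extend(v for v in token.split("=", 1)[1].split(",") if v)
    return values


def ncq_disabled(cmdline: str) -> bool:
    """True if libata.force disables NCQ, either globally or for a port/device."""
    for value in libata_force_values(cmdline):
        # Entries have the form [ID:]VAL, e.g. "noncq" or "1.00:noncq".
        if value.rsplit(":", 1)[-1] == "noncq":
            return True
    return False
```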

- How to verify it
Use the FIO tool: `fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4`
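
To complement the fio run, the kernel log can be checked for the failure signatures quoted earlier. The minimal sketch below flags `READ`/`WRITE FPDMA QUEUED` failures in log lines, e.g. lines captured from `dmesg` during and after the fio job. `ncq_failures` is a hypothetical helper name. An empty result is consistent with the NCQ issue being gone but does not prove it.

```python
import re
from typing import Iterable, List

# Matches libata errors such as "ata1.00: failed command: WRITE FPDMA QUEUED".
FPDMA_FAILURE = re.compile(r"failed command: (READ|WRITE) FPDMA QUEUED")


def ncq_failures(log_lines: Iterable[str]) -> List[str]:
    """Return the log lines that report a failed NCQ (FPDMA QUEUED) command."""
    return [line for line in log_lines if FPDMA_FAILURE.search(line)]
```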
2024-01-28 16:26:07 +02:00
Junchao-Mellanox
c02c8f0cc3
[Mellanox] remove log in RAM kernel option for 2700 A1 platform (#17254)
- Why I did it
Remove logs_inram kernel option

- How I did it
Remove logs_inram kernel option
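
Removing a kernel option amounts to dropping one token from the kernel command line. The sketch below uses a hypothetical helper. It handles both a bare `logs_inram` flag and a `logs_inram=<value>` form, since the commit does not show which form was used.

```python
def drop_kernel_option(cmdline: str, option: str) -> str:
    """Remove every occurrence of `option` (bare or option=value) from a kernel command line."""
    kept = [
        token
        for token in cmdline.split()
        # Compare whole names so options that merely share a prefix are kept.
        if token != option and not token.startswith(option + "=")
    ]
    return " ".join(kept)
```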

- How to verify it
Run the SONiC mgmt regression tests on the 202305 branch.
2023-12-05 17:52:38 +02:00
Kebo Liu
8b62e7a5b2
[Mellanox] fix new MSN2700-A1 platform name (#17151)
- Why I did it
The newly introduced MSN2700 platform has a different platform name from the old one; it should be "MSN2700-A1".

- How I did it
Update the name to the new one in platform.json and platform_components.json.
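
The sketch below performs the rename. It assumes the usual SONiC layouts: `platform.json` carries the name in `chassis.name`, while `platform_components.json` keys its single chassis entry by name. The helper names are hypothetical; the commit itself edits the files directly.

```python
import json


def rename_platform(platform: dict, components: dict, new_name: str) -> None:
    """Rename the platform in parsed platform.json / platform_components.json data, in place."""
    platform["chassis"]["name"] = new_name
    entries = components["chassis"]
    # Assumption: exactly one chassis entry, keyed by the platform name.
    if len(entries) != 1:
        raise ValueError("expected exactly one chassis entry")
    body = next(iter(entries.values()))
    components["chassis"] = {new_name: body}


def rename_platform_files(platform_path: str, components_path: str, new_name: str) -> None:
    """Load both files, rename, and write them back with 4-space indentation."""
    with open(platform_path) as f:
        platform = json.load(f)
    with open(components_path) as f:
        components = json.load(f)
    rename_platform(platform, components, new_name)
    with open(platform_path, "w") as f:
        json.dump(platform, f, indent=4)
    with open(components_path, "w") as f:
        json.dump(components, f, indent=4)
```

Rewriting through `json.dump` normalizes the file formatting, so a hand edit may produce a smaller diff.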

- How to verify it
Run platform-related sonic-mgmt test cases on the new platform.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-11-15 08:29:11 +02:00
Junchao-Mellanox
5138afe4e7
[Mellanox] add new platform 2700 a1 (#16515)
- new pcie.yaml
- new sensors.conf
- new thermal support
- new platform.json file
- adjust test code
2023-09-23 00:15:17 -07:00