Why I did it
To monitor SSD health on the DellEMC S6100 platform after an upgrade.
A daemon is introduced to check the SSD every hour.
To check the SSD status at boot time and at the time of a cold reboot.
All these changes are supported only for newer SSD firmware.
Ported changes from the 201911 branch.
Added a platform_reboot_pre_check script to prevent cold-reboot based on SSD status.
Depends on Azure/sonic-utilities#1788
DO NOT MERGE UNTIL ABOVE PR IS MERGED
How I did it
On branch s6100_ssd_202012
Changes to be committed:
modified: platform/broadcom/sonic-platform-modules-dell/debian/platform-modules-s6100.install
new file: platform/broadcom/sonic-platform-modules-dell/s6100/scripts/iSMART_64
new file: platform/broadcom/sonic-platform-modules-dell/s6100/scripts/platform_reboot_pre_check
modified: platform/broadcom/sonic-platform-modules-dell/s6100/scripts/s6100_platform.sh
new file: platform/broadcom/sonic-platform-modules-dell/s6100/scripts/s6100_ssd_mon.sh
new file: platform/broadcom/sonic-platform-modules-dell/s6100/scripts/s6100_ssd_upgrade_status.sh
new file: platform/broadcom/sonic-platform-modules-dell/s6100/scripts/soft-reboot_plugin
new file: platform/broadcom/sonic-platform-modules-dell/s6100/systemd/s6100-ssd-monitor.service
new file: platform/broadcom/sonic-platform-modules-dell/s6100/systemd/s6100-ssd-monitor.timer
new file: platform/broadcom/sonic-platform-modules-dell/s6100/systemd/s6100-ssd-upgrade-status.service
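The pre-check idea described above (block a cold reboot when the SSD is not healthy) can be sketched roughly as follows. This is only an illustration, not the contents of platform_reboot_pre_check: the status strings, the function name, and the convention that a non-zero exit status blocks the reboot are all assumptions.

```python
# Hypothetical sketch: status strings and the exit-code convention are
# assumptions for illustration, not taken from the actual script.
HEALTHY_STATUSES = {"ok", "healthy"}  # assumed values


def pre_check_exit_code(ssd_status):
    """Return 0 to allow a cold reboot, 1 to block it.

    ssd_status is the last recorded SSD status as text, or None if no
    status has been recorded yet.
    """
    if ssd_status is None:
        # No status available: this sketch errs on the side of blocking.
        return 1
    if ssd_status.strip().lower() in HEALTHY_STATUSES:
        return 0
    return 1
```

A caller would typically end with `sys.exit(pre_check_exit_code(status))` so that the reboot wrapper can act on the exit status. Whether an unknown status should block the reboot is a design choice; the sketch picks the conservative option.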
Why I did it
The serial-getty service exited randomly on Dell S6100 devices.
How I did it
Added serial-getty to monit services.
How to verify it
Stop serial-getty from an SSH session and check that the service restarts.
#### Why I did it
- An xcvrd crash was seen in the latest 201811 images.
- For Dell S6100, API 2.0 uses poll mode while API 1.0 was still using interrupt mode.
#### How I did it
- Modified get_transceiver_change_event in API 1.0 to use poll mode.
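
For reference, a poll-mode change event can be sketched as below. This is not the S6100 plugin code: `read_presence` is a hypothetical callable standing in for whatever reads module presence, and the '1' (inserted) / '0' (removed) values follow the convention I understand xcvrd to expect, which should be treated as an assumption.

```python
import time


def poll_change_event(read_presence, previous, timeout_ms=0, interval_s=1.0):
    """Poll until port presence changes or the timeout expires.

    read_presence: hypothetical callable returning {port_number: bool}.
    previous: presence map observed by the previous call.
    timeout_ms: 0 means wait indefinitely.
    Returns (changes, current), where changes maps str(port) to
    '1' (inserted) or '0' (removed) and is empty on timeout.
    """
    deadline = None if timeout_ms == 0 else time.monotonic() + timeout_ms / 1000.0
    while True:
        current = read_presence()
        # Report only the ports whose presence differs from the last snapshot.
        changes = {
            str(port): "1" if present else "0"
            for port, present in current.items()
            if previous.get(port, False) != present
        }
        if changes:
            return changes, current
        if deadline is not None and time.monotonic() >= deadline:
            return {}, current
        time.sleep(interval_s)
```

The caller keeps `current` and passes it back as `previous` on the next call, so no interrupt source is needed.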
- Why I did it
To determine the reboot cause correctly when running newer BIOS and SMF firmware.
- How I did it
Updated the reboot-cause determination script to support the behavior of newer firmware.
- How to verify it
Performed different types of resets and verified that "show reboot-cause" reports the correct reason.
Logs: UT_logs.txt
- Description for the changelog
DellEMC S6100: Update reboot-cause determination to support new firmware
- Make DellEMC platform modules Python3 compliant.
- Change return type of PSU Platform APIs in DellEMC Z9264, S5232 and Thermal Platform APIs in S5232 to 'float'.
- Remove multiple copies of pcisysfs.py.
- PEP8 style changes for utility scripts.
- Build and install Python3 version of sonic_platform package.
- Fix minor Platform API issues.
- Why I did it
To fix the PCA MUX attachment issue on the Dell S6100 platform.
- How I did it
Wait until the IOM MUX is powered up properly, then start I2C enumeration.
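The wait-then-enumerate pattern can be sketched as a bounded poll. The helper name, timeout, and interval below are illustrative assumptions; the actual platform script may check readiness differently.

```python
import time


def wait_until(ready, timeout_s=30.0, interval_s=0.5):
    """Call ready() repeatedly until it returns True or timeout_s elapses.

    ready: hypothetical callable, e.g. a check that the IOM MUX is powered up.
    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

I2C enumeration would then start only when `wait_until(...)` returns True, with a logged error on timeout instead of attaching the MUX blindly.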
- Xilinx/Pericom peripherals are not actively used in the DellEMC S6100 switch.
- These peripherals throw PCIe corrected-error messages on some units and fill the syslog.
- Since they are not used, they are disabled at startup.
- The optoe driver truncates invalid pages (ff) but the sff driver doesn't, so the DOM-related calculations made by the sff8436 driver show incorrect data.
- A few optics don't support DOM.
- SFP plugins currently return None for unreadable pages, and this throws an error in sfpshow eeprom --dom.
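
The kind of guard this calls for can be sketched as below. It assumes raw EEPROM pages arrive as lists of two-character hex strings (or None when unreadable); the function names and the "N/A" placeholder are hypothetical and not taken from the plugin.

```python
def page_is_readable(raw_page):
    """Return False for a missing page or a page that is entirely 0xff.

    raw_page: list of two-character hex strings, or None (assumed format).
    """
    if not raw_page:
        return False
    return any(byte.lower() != "ff" for byte in raw_page)


def dom_or_na(raw_page, parse):
    """Run parse(raw_page) only on readable pages; otherwise return 'N/A'.

    parse is a hypothetical callable that decodes DOM values from a page.
    """
    if not page_is_readable(raw_page):
        return "N/A"
    return parse(raw_page)
```

With a guard like this, optics without DOM support report a placeholder instead of raising on None or showing values computed from ff bytes.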
Added Reboot Reason for S6000 in platform 2.0
Fixed issue in process-reboot-cause
Added package uninstall code in platform de-init code for z9100, s6100
- How I did it
-> Added support for S6000 Reboot Reason
-> Added platform.py for all platforms
-> Verified show reboot-cause command with the code changes. Added UT logs with show reboot-cause
-> Modified the process-reboot-cause service to start after pmon.service. On S6000, we have to wait for the nvram to be loaded.
-> If the reboot-cause service starts before pmon.service, show reboot-cause shows an incorrect reason.
-> Bug fix in process-reboot-cause file
- import sonic_platform
+ import sonic_platform.platform
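
One plausible reason the import change matters, assuming the package's `__init__.py` does not itself import the `platform` submodule: in Python, importing a package does not automatically load its submodules. The sketch below shows this with a throwaway package; `demo_pkg` is a hypothetical name and has nothing to do with the real sonic_platform package.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package "demo_pkg" (hypothetical) with an empty
# __init__.py and a platform.py submodule.
tmp = tempfile.mkdtemp()
pkg_dir = os.path.join(tmp, "demo_pkg")
os.makedirs(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "platform.py"), "w") as f:
    f.write("class Platform:\n    pass\n")
sys.path.insert(0, tmp)
importlib.invalidate_caches()

import demo_pkg
# The submodule is not loaded yet, so the attribute does not exist.
before = hasattr(demo_pkg, "platform")

import demo_pkg.platform
# Importing the submodule explicitly binds it on the package.
after = hasattr(demo_pkg, "platform")

print(before, after)  # prints: False True
```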
This commit gracefully unmounts the file systems and gracefully
shuts down the dockers before issuing a cold reboot, which
power-cycles the SSD. This ensures an orderly shutdown and
prevents file system corruption caused by the SSD power cycle.
This commit uses the existing systemd-reboot service scripts
and overrides the configuration to do a cold reboot for S6100 and
Z9100.
Unit tested the fix: the file systems and dockers are shut down
gracefully before the cold reboot.
Signed-off-by: Harish Venkatraman <harish_venkatraman@dell.com>