[201811] Check platform reboot cause to see if any reset happened during fast/warm-reboot
Why I did it
To recover syncd and swss from any cold reset during fast/warm-reboot
How I did it
Check the platform reboot cause to see whether a cold reset happened during fast-reboot power-up
How to verify it
Manual test
Why I did it
In upgrade scenarios where config_db.json is not carried forward to the new image, the device could be left without TACACS credentials.
Added a service that triggers 5 minutes after boot and restores TACACS if /etc/sonic/old_config/tacacs.json is present.
How I did it
By adding a service that fires 5 minutes after boot.
This service applies the saved TACACS config, if available.
How to verify it
Upgrade and watch the status of tacacs.timer & tacacs.service.
You may create /etc/sonic/old_config/tacacs.json with updated credentials
(within 5 minutes after boot) and see that they appear in the config and are persisted too.
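The restore step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual service: the function name restore_tacacs is hypothetical, and so is the assumption that tacacs.json holds top-level config tables that are merged into a config dictionary.

```python
import json
import os


def restore_tacacs(config, saved_path="/etc/sonic/old_config/tacacs.json"):
    """Merge saved TACACS tables into config if the saved file exists.

    Hypothetical helper: returns True if the saved file was applied,
    False if there was nothing to restore.
    """
    if not os.path.isfile(saved_path):
        return False  # the previous image left no saved TACACS config
    with open(saved_path) as f:
        saved = json.load(f)
    for table, entries in saved.items():
        # Saved entries override whatever the new image has for that table.
        config.setdefault(table, {}).update(entries)
    return True
```

In this sketch the caller is responsible for persisting the merged config afterwards.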
Why I did it
On S6000 devices, the cold reboot is abrupt and is likely to cause issues that land the device in the EFI shell. Hence the platform reboot now happens after a graceful unmount of all filesystems, as on S6100.
How I did it
Moved the platform_reboot to platform_reboot_override and hooked it to the systemd shutdown services as in S6100.
Fixed the "/host unmount failed" issue as well in 201811.
How to verify it
Issue the "reboot" command and verify that the reboot happens gracefully.
Dynamic threshold setting changed to 0 and WRED profile green min threshold set to 250000 for Tomahawk devices
Changed the dynamic threshold settings in pg_profile_lookup.ini
Added a macro for WRED profiles in qos.json.j2 for Tomahawk devices
Necessary changes made in qos.config.j2 to use the macro if present
Signed-off-by: Neetha John <nejo@microsoft.com>
admin@sonic:~$ sudo hw-management-wd.sh
Usage: hw-management-wd.sh start [timeout] | stop | tleft | check_reset | help
start - start watchdog
timeout is optional. Default value will be used in case if it's omitted
timeout provided in seconds
stop - stop watchdog
tleft - check watchdog timeout left
check_reset - check if previous reset was caused by watchdog
Prints only in case of watchdog reset
help -this help
Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
* [201811][monit] address build issue: hard code ARCH to amd64
- also hard code the debian package path as in 201811 branch.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* Updates per review comments
1) core_uploader service waits for syslog.service
2) core_uploader service enabled for restart on failure
3) Use mtime instead of file size + ample time to be robust.
* Avoid reloading already uploaded file, by marking the names with a prefix.
* Updated failing path.
1) If the rc file or required data is missing, it logs an error periodically in a forever loop.
2) If upload fails, retry every hour with an error log, forever.
* Fix a few bugs
* The binary update_json.py will come from sonic-utilities.
* Corefile uploader service
1) A service is added to watch /var/core and upload to Azure storage
2) The service is disabled on boot. One may enable explicitly.
3) The .rc file must be updated with the account credentials and the HTTP proxy to use.
4) If service is enabled with no credentials, it would sleep, with periodic log messages
5) For any update in .rc, the service has to be restarted to take effect.
* Remove rw permission for .rc file for group & others.
* Changes per review comments.
Re-ordered .rc file per JSON.dump order.
Added a script to enable partial update of .rc, which HWProxy would use to add acct key.
* Azure storage upload requires python module futures, hence added it to install list.
* Removed trailing spaces.
* A mistake in name corrected.
Copy the .rc updater script to /usr/bin.
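The mtime check and the uploaded-file marking from the corefile uploader notes above can be sketched as below. This is an illustration only: the prefix string, the settle time, and the function names are hypothetical, since the commit message does not state them, and the actual upload to Azure storage is left out.

```python
import os
import time

UPLOADED_PREFIX = "UPLOADED_"  # hypothetical marker; the real prefix is not stated
SETTLE_SECONDS = 120           # hypothetical "ample time" since the last write


def files_ready_for_upload(core_dir, now=None):
    """Return core files that are not yet uploaded and no longer being written."""
    now = time.time() if now is None else now
    ready = []
    for name in sorted(os.listdir(core_dir)):
        path = os.path.join(core_dir, name)
        if name.startswith(UPLOADED_PREFIX) or not os.path.isfile(path):
            continue
        # A file untouched for SETTLE_SECONDS is treated as completely written.
        if now - os.path.getmtime(path) >= SETTLE_SECONDS:
            ready.append(path)
    return ready


def mark_uploaded(path):
    """Rename the file with the marker prefix so it is skipped next time."""
    directory, name = os.path.split(path)
    new_path = os.path.join(directory, UPLOADED_PREFIX + name)
    os.rename(path, new_path)
    return new_path
```

Relying on mtime rather than file size avoids picking up a core file that is still growing.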
* Add watchdog-control service to disable watchdog during bootup
Disable only if it's applicable and the watchdog is enabled.
* Address the review comment
* Correct the watchdog start script name
* Change to call common watchdog api instead of platform specific
* Start watchdog control service after swss starts
* advance sonic-utility submodule
Revert "Configure buffer profile to all ports (#3561)" (#3628)
Configure buffer profile to all ports (#3561)
This reverts commit 8861cbe98e.
Signed-off-by: Wenda Ni <wenni@microsoft.com>
The interfaces-config service configures the lo address. If the bgp service
starts before the lo address is configured, the following config
in zebra will not be applied:
route-map RM_SET_SRC permit 10
set src 10.1.0.32
This adds a delay of a few seconds to the bgp service start.
* Add debug docker for SNMP.
* Removed a redundant install of debug packages.
Propagate the debug flag to template file to mount /dbg & /src to debug containers.
* Revert the last change to retain the original
radv should be left alone during warm restart of swss. Otherwise it will
announce departure and cause hosts to lose default gateway.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [service dependent] describe non-warm-reboot dependency outside systemctl
When a dependency is described with systemctl, it kicks in all the time,
including under warm reboot/restart scenarios. This is not what we always
want. Components that are capable of warm reboot/start need to
describe the dependency in service files.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [service] teamd service should not require swss service
Adding require swss will cause teamd to be killed by systemctl when swss
stops. This is not what we want in warm reboot.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* refactoring code
* rename functions to match other functions in the file
* backport new platform api to 201811, reboot cause part
* install new platform api on host
* 1. remove chassis's dependency on sonic_platform_daemon.
2. add some mellanox-specific hardware reboot causes.
3. fix typo in files/image_config/process-reboot-cause/process-reboot-cause.
* 1. add dependency of sonic_platform for base image
2. handle the case of reboot cause file not found
* adjust log message.
- Make sure that migrated DB contents are persisted for the next boot.
- Make sure that the DB is saved after warm reboot.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* fix fast reboot compatibility
We should handle both cases for backward compatibility with 201803:
- fast-reboot
- SONIC_BOOT_TYPE=fast-reboot
* handle review comments
* add a comment that getBootType code snippet is shared between two files
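Handling both forms can be sketched as below. This is a hedged illustration, not the shared getBootType snippet itself: it assumes the two markers appear as tokens on the kernel command line (for example, the contents of /proc/cmdline), and the function name and the "fast"/"cold" return values are hypothetical.

```python
def get_boot_type(cmdline):
    """Classify the boot type from a kernel command line string.

    Assumption: the markers are whitespace-separated tokens in cmdline.
    """
    for token in cmdline.split():
        # Bare form, kept for backward compatibility with 201803.
        if token == "fast-reboot":
            return "fast"
        # Key=value form.
        if token == "SONIC_BOOT_TYPE=fast-reboot":
            return "fast"
    return "cold"
```

Matching whole tokens rather than substrings keeps an unrelated argument that merely contains "fast-reboot" from being misread.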
* [submodule] update sonic-linux-kernel (#2985)
* Fix many version strings
* Update minor version
* Update arista-drivers submodule (#9)
* Rebuild SDK on new kernel (#10)
SWSS clears DB tables. If teamd is not started after swss, there is a
race condition in which swss might clear vital teamd information.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
After warm reboot is done, we need to disable the warm reboot flag and
tear down anything that was set up for warm reboot and persisted across it.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Lossy traffic does not need to be mapped to different ingress PGs. They can all share the same ingress PG.
Signed-off-by: Wenda Ni <wenni@microsoft.com>