Put a flag for fast-reboot to the db using EXPIRE feature. Using this flag in other part of SONiC to start in Fast-reboot mode. If we reload a config, the state in the db will be removed.
Add the same mechanism I developed for the SwSS service in #2845 to the syncd service. However, in order to cause the SwSS service to also exit and restart in this situation, I developed a docker-wait-any program which the SwSS service uses to wait for either the swss or syncd containers to exit.
Issue Overview
shutdown flow
For any shutdown flow, which means all dockers are stopped in order, pmon docker stops after syncd docker has stopped, causing pmon docker fail to release sx_core resources and leaving sx_core in a bad state. The related logs are like the following:
INFO syncd.sh[23597]: modprobe: FATAL: Module sx_core is in use.
INFO syncd.sh[23597]: Unloading sx_core[FAILED]
INFO syncd.sh[23597]: rmmod: ERROR: Module sx_core is in use
config reload & service swss.restart
In the flows like "config reload" and "service swss restart", the failure cause further consequences:
sx_core initialization error with error message like "sx_core: create EMAD sdq 0 failed. err: -16"
syncd fails to execute the create switch api with error message "syncd_main: Runtime error: :- processEvent: failed to execute api: create, key: SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000, status: SAI_STATUS_FAILURE"
swss fails to call SAI API "SAI_SWITCH_ATTR_INIT_SWITCH", which causes orchagent to restart. This will introduce an extra 1 or 2 minutes for the system to be available, failing related test cases.
reboot, warm-reboot & fast-reboot
In the reboot flows including "reboot", "fast-reboot" and "warm-reboot" this failure doesn't have further negative effects since the system has already rebooted. In addition, "warm-reboot" requires the system to be shutdown as soon as possible to meet the GR time restriction of both BGP and LACP. "fast-reboot" also requires to meet the GR time restriction of BGP which is longer than LACP. In this sense, any unnecessary steps should be avoided. It's better to keep those flows untouched.
summary
To summarize, we have to come up with a way to ensure:
shutdown pmon docker ahead of syncd for "config reload" or "service swss restart" flow;
don't shutdown pmon docker ahead of syncd for "fast-reboot" or "warm-reboot" flow in order to save time.
for "reboot" flow, either order is acceptable.
Solution
To solve the issue, pmon shoud be stopped ahead of syncd stopped for all flows except for the warm-reboot.
- How I did it
To stop pmon ahead of syncd stopped. This is done in /usr/local/bin/syncd.sh::stop() and for all shutdown sequence.
Now pmon stops ahead of syncd so there must be a way in which pmon can start after syncd started. Another point that should be taken consideration is that pmon starting should be deferred so that services which have the logic of graceful restart in fast-reboot and warm-reboot have sufficient CPU cycles to meet their deadline.
This is done by add "syncd.service" as "After" to pmon.service and startin /usr/local/bin/syncd.sh::wait()
To start pmon automatically after syncd started.
* [cron.d] Create cron job to periodically clean-up core files
* Create script to scan /var/core and clean-up older core files
* Create cron job to run clean-up script
Signed-off-by: Danny Allen <daall@microsoft.com>
* Update interval for running cron job
* Respond to feedback
* Change syslog id
radv should be left alone during warm restart of swss. Otherwise it will
announce departure and cause hosts to lose default gateway.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [service dependent] describe non-warm-reboot dependency outside systemctl
When dependency was described with systemctl, it will kick in all the time,
including under warm reboot/restart scenarios. This is not what we always
want. For components that are capable of warm reboot/start, they need to
describe dependency in service files.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [service] teamd service should not require swss service
Adding require swss will cause teamd to be killed by systemctl when swss
stops. This is not what we want in warm reboot.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* refactoring code
* rename functions to match other functions in the file
* fix fast reboot compatibility
We should handle both cases for backward-compatible with 201803:
- fast-reboot
- SONIC_BOOT_TYPE=fast-reboot
* handle review comments
* add a comment that getBootType code snippet is shared between two files
* [service] Restart SwSS Docker container if orchagent exits unexpectedly
* Configure systemd to stop restarting swss if it attempts to restart more than 3 times in 20 minutes
* Move supervisor-proc-exit-listener script
* [docker-dhcp-relay] Enhance wait_for_intf.sh.j2 to utilize STATEDB
* Ensure dependent services stop/start/restart with SwSS
* Change 'StartLimitInterval' to 'StartLimitIntervalSec', as Stretch installs systemd 232 (>= v230)
* Also update journald.conf options
* Remove 'PartOf' option from unit files
* Add '$(SUPERVISOR_PROC_EXIT_LISTENER_SCRIPT)' to new shared docker-orchagent makefile
* Make supervisor-proc-exit-listener script read from 'critical_processes' file inside container
* Update critical_processes file for swss container
start() is called by service startPre method, which is blocking. Starting
syncd service here is causing deadlock.
attach() is called by service start method, which is non-blocking.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [services] Ensure swss and syncd services start before dependent services
* Add 'attach' functions to scripts which get installed to /usr/local/bin so that services only reference the one script each
* Add 'After=swss.service' to syncd.service
need to flush asic db in swss.sh instead of syncd.sh
orchagent might already started in swss.sh and put commands
into asic db before asic db is flushed in syncd.sh. This
causes race condition such as INIT_VIEW not passing to syncd.
Signed-off-by: Guohan Lu <gulv@microsoft.com>
* Perform stop/start of Mellanox driver tools for all types of reboot
* Don't set Mellanox FAST_BOOT option for "cold" reboot
* Don't send "syncd_request_shutdown" event for "cold" reboot on Mellanox platforms
Signed-off-by: Volodymyr Samotiy <volodymyrs@mellanox.com>
Clear WARM_RESTART table could cause component level warm restart to
fail due to missing WARM_RESTART state.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
- cold shutdown is used by regular service stop and/or fast reboot
- warm shutdown is used by warm restart and/or warm reboot
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
This script shall not flush all the entries in the state database
when it starts up, since there are entries maintained and written
by other processes outside this docker.
The issue we noticed was that the portchannel states are cleaned
up after teamsyncd writes the entries into the database, which
causes the IPs failed to be configured because intfmgrd considers
the portchannels are not ready yet.
Signed-off-by: Shu0T1an ChenG <shuche@microsoft.com>
Update the hw-mgmt to latest release V.2.0.0060.
Update the related files according to the latest hw-mgmt.
Signed-off-by: Kevin Wang <kevinw@mellanox.com>
* Adapt to the new WARM_RESTART_TABLE table schema: change from restart_count to restore_count
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
* Update variable and function name to match restore_count name change
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
* Update swss submodule for warm restart schema change
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
* [syncd] warn shutdown syncd process when warm boot is enabled
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [warmboot] mount folder to hold warmboot temporary files
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* Fix a typo
* [swss.sh] refactor ssh service script code
- Move checks and waits to helper functions.
- Remove early returns from code stream
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [swss.sh] Add debug log for service state changes
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [syncd] Separate out syncd service from swss service
Still make them start/stop/restart synchronously so existing scripts
continue working.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* Remove extra 'After' in swss service and remove syncd docker warm boot code
Syncd warm boot needs more thinking, we can put it back once the work
flow has been defined and ready for coding/testing.
* [syncd] syncd start/stop/restart shouldn't affect swss state
Semi-detach syncd service state change from swss:
- swss state change still chase syncd service to follow except warm boot
- syncd state change will only affect itself.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* add missing '{'
* Fix potential blackholing/looping traffic and refresh ipv6 neighbor to avoid CPU hit
In case ipv6 global addresses were configured on L3 interfaces and used for peering,
and routing protocol was using link-local addresses on the same interfaces as prefered nexthops,
the link-local addresses could be aged out after a while due to no activities towards the link-local
addresses themselves. And when we receive new routes with the link-local nexthops, SONiC won't insert
them to the HW, and thus cause looping or blackholing traffic.
Global ipv6 addresses on L3 interfaces between switches are refreshed by BGP keeplive and other messages.
On server facing side, traffic may hit fowarding plane only, and no refresh for the ipv6 neighbor entries regularly.
This could age-out the linux kernel ipv6 neighbor entries, and HW neighbor table entries could be removed,
and thus traffic going to those neighbors would hit CPU, and cause traffic drop and temperary CPU high load.
Also, if link-local addresses were not learned, we may not get them at all later.
It is intended to fix all above issues.
Changes:
Add ndisc6 package in swss docker and use it for ipv6 ndp ping to update the neighbors' state on Vlan interfaces
Change the default ipv6 neighbor reachable timer to 30mins
Add periodical ipv6 multicast ping to ff02::11 to get/refresh link-local neighbor info.
* Fix review comments:
Add PORTCHANNEL_INTERFACE interface for ipv6 multicast ping
format issue
* Combine regular L3 interface and portchannel interface for looping
* Add ndisc6 package to vs docker