**- Why I did it**
On teamd docker restart, the swss and syncd needs to be restarted as there are dependent resources present.
**- How I did it**
Add the teamd as a dependent service for swss
Updated the docker-wait script to handle service and dependent services separately.
Handle the case of warm-restart for the dependent service
**- How to verify it**
Verified the following scenario's with the following testbed
VM1 ----------------------------[DUT 6100] -----------------------VM2, ping traffic continuous between VMs
1. Stop teamd docker alone
> swss, syncd dockers seen going away
> The LAG reference count error messages seen for a while till swss docker stops.
> Dockers back up.
2. Enable WR mode for teamd. Stop teamd docker alone
> swss, syncd dockers not removed.
> The LAG reference count error messages not seen
> Repeated stop teamd docker test - same result, no effect on swss/syncd.
3. Stop swss docker.
> swss, teamd, syncd goes off - dockers comes back correctly, interfaces up
4. Enable WR mode for swss . Stop swss docker
> swss goes off not affecting syncd/teamd dockers.
5. Config reload
> no reference counter error seen, dockers comes back correctly, with interfaces up
6. Warm reboot, observations below
> swss docker goes off first
> teamd + syncd goes off to the end of WR process.
> dockers comes back up fine.
> ping traffic between VM's was NOT HIT
7. Fast reboot, observations below
> teamd goes off first ( **confirmed swss don't exit here** )
> swss goes off next
> syncd goes away at the end of the FR process
> dockers comes back up fine.
> there is a traffic HIT as per fast-reboot
8. Verified in multi-asic platform, the tests above other than WR/FB scenarios
* Multi DB with namespace support, Introducing the database_global.json file
for supporting accessing DB's in other namespaces for service running in
linux host
* Updates based on comments
* Adding the j2 templates for database_config and database_global files.
* Updating to retrieve the redis DIR's to be mounted from database_global.json file.
* Additional check to see if asic.conf file exists before sourcing it.
* Updates based on PR comments discussion.
* Review comments update
* Updates to the argument "-n" for namespace used in both context of parsing minigraph and multi DB access.
* Update with the attribute "persistence_for_warm_boot" that was added to database_config.json file earlier.
* Removing the database_config.json file to avioid confusion in future.
We use the database_config.json.j2 file to generate database_config.json files dynamically.
* Update the comments for sudo usage in docker_image_ctrl.j2
* Update with the new logic in PING PONG tests using sonic-db-cli. With this we wait till the
PONG response is received when redis server is up.
* Similar changes in swss and syncd scripts for the PING tests with sonic-db-cli
* Updated with a missing , in the database_config.json.j2 file, Do pip install of j2cli in docker-base-buster.
* [MultiDB] (except ./src and ./dockers dirs): replace redis-cli with sonic-db-cli and use new DBConnector
* update comment for a potential bug
* update comment
* add TODO maker as review reqirement
If we need to stop swss during fast-reboot procedure on the boot up path,
it means that something went wrong, like syncd/orchagent crashed already,
we are stopping and restarting swss/syncd to re-initialize. In this case,
we should proceed as if it is a cold reboot.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Add the same mechanism I developed for the SwSS service in #2845 to the syncd service. However, in order to cause the SwSS service to also exit and restart in this situation, I developed a docker-wait-any program which the SwSS service uses to wait for either the swss or syncd containers to exit.
radv should be left alone during warm restart of swss. Otherwise it will
announce departure and cause hosts to lose default gateway.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [service dependent] describe non-warm-reboot dependency outside systemctl
When dependency was described with systemctl, it will kick in all the time,
including under warm reboot/restart scenarios. This is not what we always
want. For components that are capable of warm reboot/start, they need to
describe dependency in service files.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [service] teamd service should not require swss service
Adding require swss will cause teamd to be killed by systemctl when swss
stops. This is not what we want in warm reboot.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* refactoring code
* rename functions to match other functions in the file
start() is called by service startPre method, which is blocking. Starting
syncd service here is causing deadlock.
attach() is called by service start method, which is non-blocking.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [services] Ensure swss and syncd services start before dependent services
* Add 'attach' functions to scripts which get installed to /usr/local/bin so that services only reference the one script each
* Add 'After=swss.service' to syncd.service
need to flush asic db in swss.sh instead of syncd.sh
orchagent might already started in swss.sh and put commands
into asic db before asic db is flushed in syncd.sh. This
causes race condition such as INIT_VIEW not passing to syncd.
Signed-off-by: Guohan Lu <gulv@microsoft.com>
Clear WARM_RESTART table could cause component level warm restart to
fail due to missing WARM_RESTART state.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
This script shall not flush all the entries in the state database
when it starts up, since there are entries maintained and written
by other processes outside this docker.
The issue we noticed was that the portchannel states are cleaned
up after teamsyncd writes the entries into the database, which
causes the IPs failed to be configured because intfmgrd considers
the portchannels are not ready yet.
Signed-off-by: Shu0T1an ChenG <shuche@microsoft.com>
* Adapt to the new WARM_RESTART_TABLE table schema: change from restart_count to restore_count
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
* Update variable and function name to match restore_count name change
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
* Update swss submodule for warm restart schema change
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
* [swss.sh] refactor ssh service script code
- Move checks and waits to helper functions.
- Remove early returns from code stream
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [swss.sh] Add debug log for service state changes
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [syncd] Separate out syncd service from swss service
Still make them start/stop/restart synchronously so existing scripts
continue working.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* Remove extra 'After' in swss service and remove syncd docker warm boot code
Syncd warm boot needs more thinking, we can put it back once the work
flow has been defined and ready for coding/testing.
* [syncd] syncd start/stop/restart shouldn't affect swss state
Semi-detach syncd service state change from swss:
- swss state change still chase syncd service to follow except warm boot
- syncd state change will only affect itself.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* add missing '{'