Why I did it
Orchagent sometimes take additional time to execute Tunnel tasks. This cause write_standby script to error out and mux state machines are not initialized. It results in show mux status missing some ports in output.
Mar 13 20:36:52.337051 m64-tor-0-yy41 INFO systemd[1]: Starting MUX Cable Container...
Mar 13 20:37:52.480322 m64-tor-0-yy41 ERR write_standby: Timed out waiting for tunnel MuxTunnel0, mux state will not be written
Mar 13 20:37:58.983412 m64-tor-0-yy41 NOTICE swss#orchagent: :- doTask: Tunnel(s) added to ASIC_DB.
How I did it
Increase timeout from 60s to 90s
How to verify it
Verified that mux state machine is initialized and show mux status has all needed ports in it.
Why I did it
At service start up time, there are chances that the networking service is being restarted by interface-config service. When that happens, write_standby could fail to make DB connections due to loopback interface is being reconfigured.
How I did it
Force the db connector to use unix socket to avoid loopback reconfig timing window.
How to verify it
Run config reload test 20+ times and no issue encountered.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* use unix socket instead
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [mux] skip mux operations during warm shutdown
- Enhance write_standby.py script to skip actions during warm shutdown.
- Expand the support to BGP service.
- MuX support was added by a previous PR.
- don't skip action during warm recovery
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Why I did it
The initial value has to be present for the state machines to work. In active-standby dual-tor scenario, or any hardware mux scenario, the value will be updtaed eventually with a delay.
However, in active-active dual-tor scenario, there is no other mechanism to initialize the value and get state machines started.
So this script will have to write something at start up time.
For active-active dualtor, 'active' is a more preferred initial value, the state machine will switch the state to standby soon if
link prober found link not in good state.
How I did it
Update the script to always provide initial values.
How to verify it
Tested on active-active dual-tor testbed.
Signed-off-by: Ying Xie ying.xie@microsoft.com
Avoid write_standby in warm restart context.
sign-off: Jing Zhang zhangjing@microsoft.com
Why I did it
In warm restart context, we should avoid mux state change.
How I did it
Check warm restart flag before applying changes to app db.
How to verify it
Ran write_standby in table missing, key missing, field missing scenarios.
Did a warm restart, app db changes were skipped. Saw this in syslog:
WARNING write_standby: Taking no action due to ongoing warmrestart.
[write_standby]: Ignore non-auto interfaces
* In the event that `write_standby.py` is used to automatically switchover interfaces when linkmgrd or bgp crashes, ignore any interfaces that are not configured to auto-switch
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>