44427a2f6b
This PR depends on https://github.com/sonic-net/sonic-swss/pull/2737, which must be merged first. **What I did** Added an orchagent watchdog that monitors orchagent and raises an alert when it gets stuck. **Why I did it** Currently the SONiC monit system only checks whether the orchagent process exists. If the orchagent process is stuck and stops processing, monit cannot detect or report it. **How I verified it** All existing UTs pass. Added a new UT, https://github.com/sonic-net/sonic-mgmt/pull/8306, to check that the watchdog works correctly. Manual test: after pausing orchagent with 'kill -STOP <pid>', confirmed that the following alert appears in the log: Apr 28 23:36:41.504923 vlab-01 ERR swss#supervisor-proc-watchdog-listener: Process 'orchagent' is stuck in namespace 'host' (1.0 minutes). **Details if related** Heartbeat message PR: https://github.com/sonic-net/sonic-swss/pull/2737 UT PR: https://github.com/sonic-net/sonic-mgmt/pull/8306 |
||
---|---|---|
.. | ||
Aboot | ||
apt | ||
build/versions | ||
build_scripts | ||
build_templates | ||
dhcp | ||
docker | ||
image_config | ||
initramfs-tools | ||
scripts | ||
sshd |
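
The watchdog idea in the commit description above (a process sends periodic heartbeats, and a listener reports it as stuck once no heartbeat has arrived for a timeout) can be illustrated with a minimal Python sketch. This is not the code from the PR: the class `HeartbeatWatchdog`, the helper `format_alert`, and the 60-second default timeout are hypothetical names and values chosen for illustration. Only the alert wording is modeled on the log line quoted in the description.

```python
import time


class HeartbeatWatchdog:
    """Hypothetical sketch: tracks the last heartbeat seen per process."""

    def __init__(self, timeout_seconds=60.0, clock=time.monotonic):
        # timeout_seconds is an assumed value, not taken from the PR.
        self.timeout_seconds = timeout_seconds
        # The clock is injectable so the logic can be exercised without sleeping.
        self.clock = clock
        self.last_heartbeat = {}

    def heartbeat(self, process_name):
        """Record that a heartbeat from process_name arrived just now."""
        self.last_heartbeat[process_name] = self.clock()

    def stuck_processes(self):
        """Return {process_name: seconds_since_last_heartbeat} for every
        process whose last heartbeat is at least timeout_seconds old."""
        now = self.clock()
        return {
            name: now - seen
            for name, seen in self.last_heartbeat.items()
            if now - seen >= self.timeout_seconds
        }


def format_alert(process_name, namespace, elapsed_seconds):
    """Build an alert string shaped like the log line in the description."""
    return "Process '{}' is stuck in namespace '{}' ({:.1f} minutes).".format(
        process_name, namespace, elapsed_seconds / 60.0
    )
```

A caller would invoke `heartbeat("orchagent")` whenever a heartbeat message arrives and periodically log `format_alert(...)` for each entry returned by `stuck_processes()`. A process paused with `kill -STOP <pid>` stops sending heartbeats, so it would show up once the timeout elapses.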