This repository has been archived on 2025-03-20. You can view files and clone it, but cannot push or open issues or pull requests.
sonic-buildimage/files
Hua Liu 05f1a5a31e
Add watchdog mechanism to swss service and generate alert when swss have issue. (#15429)

**Work item tracking**
Microsoft ADO (number only): 16578912

**What I did**
Added an orchagent watchdog that monitors orchagent and raises an alert when it gets stuck.

**Why I did it**
Currently, the SONiC Monit system only checks whether the orchagent process exists. If orchagent gets stuck and stops processing, Monit cannot detect or report it.
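The gap described here, a process that exists but makes no progress, is what a heartbeat check is meant to close. The following is a minimal sketch of that idea, not the code from this PR: the class name `HeartbeatWatchdog`, its methods, and the 60-second timeout are all hypothetical and chosen only for illustration.

```python
import time
from typing import Dict, List, Optional, Tuple


class HeartbeatWatchdog:
    """Tracks the last heartbeat per process and flags silent ones as stuck."""

    def __init__(self, timeout_s: float = 60.0) -> None:
        # timeout_s is an assumed threshold, not a value taken from SONiC.
        self.timeout_s = timeout_s
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, name: str, now: Optional[float] = None) -> None:
        """Record that `name` reported progress at time `now`."""
        self._last_seen[name] = time.monotonic() if now is None else now

    def stuck(self, now: Optional[float] = None) -> List[Tuple[str, float]]:
        """Return (name, seconds_silent) for every process past the timeout."""
        now = time.monotonic() if now is None else now
        return [
            (name, now - seen)
            for name, seen in sorted(self._last_seen.items())
            if now - seen >= self.timeout_s
        ]


watchdog = HeartbeatWatchdog(timeout_s=60.0)
watchdog.heartbeat("orchagent", now=0.0)
print(watchdog.stuck(now=30.0))  # [] : the heartbeat is recent enough
print(watchdog.stuck(now=90.0))  # [('orchagent', 90.0)] : silent past the timeout
```

Passing `now` explicitly keeps the sketch deterministic; a real listener would rely on a monotonic clock and on heartbeat messages emitted by the monitored process.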

**How I verified it**
All UTs pass.

Manually ran process_monitoring/test_critical_process_monitoring.py; it passes.

Added a new UT (https://github.com/sonic-net/sonic-mgmt/pull/8306) to check that the watchdog works correctly.

Manual test: after pausing orchagent with 'kill -STOP <pid>', the following alert appears in the log:

Apr 28 23:36:41.504923 vlab-01 ERR swss#supervisor-proc-watchdog-listener: Process 'orchagent' is stuck in namespace 'host' (1.0 minutes).
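To illustrate why the manual test pauses the process rather than killing it, here is a small sketch (POSIX only) showing that a process stopped with SIGSTOP still counts as running, so a check for process existence alone would not notice it. The sleeping child is a stand-in for a real daemon, and the function name is hypothetical.

```python
import signal
import subprocess
import sys


def paused_process_still_exists() -> bool:
    """Pause a child with SIGSTOP and report whether it still counts as running."""
    # A stand-in child that just sleeps; it plays the role of the paused daemon.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        child.send_signal(signal.SIGSTOP)  # same effect as `kill -STOP <pid>`
        # poll() returns None while the process has not exited, even if stopped.
        return child.poll() is None
    finally:
        child.kill()  # SIGKILL also terminates a stopped process
        child.wait()


if __name__ == "__main__":
    print(paused_process_still_exists())
```

The paused child never exits on its own, which is the situation an existence-only check cannot distinguish from a healthy process.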

**Details if related**
Heartbeat message PR: https://github.com/sonic-net/sonic-swss/pull/2737
UT PR: https://github.com/sonic-net/sonic-mgmt/pull/8306
2023-06-12 17:53:54 -07:00
| Name | Last commit | Date |
| --- | --- | --- |
| Aboot | allow docker_inram to kernel cmd list (#15374) | 2023-06-10 14:19:44 +08:00 |
| apt | [Build] set apt Acquire::Retries to 3 for bullseye (#12758) | 2022-11-21 08:05:16 +08:00 |
| build/versions | [ci/build]: Upgrade SONiC package versions (#15431) | 2023-06-12 22:27:29 +08:00 |
| build_scripts | During build time mask only those feature/services that are disabled excplicitly (#13283) | 2023-01-07 02:36:37 +00:00 |
| build_templates | enable ethernet backplane port support in port config for packet mode T2 devices (#14533) | 2023-06-12 14:02:22 -07:00 |
| dhcp | ZTP infrastructure changes to support DHCP discovery provisioning data (#3298) | 2019-12-10 08:16:56 -08:00 |
| docker | [dockerd] Force usage of cgo DNS resolver (#13649) | 2023-02-14 08:57:19 +02:00 |
| image_config | Fix for fast/cold-boot: call db_migrator only after old config is loaded (#14933) | 2023-05-30 10:16:21 -07:00 |
| initramfs-tools | [arm64][Nokia-7215-A1]Add support for Nokia-7215-A1 platform (#13795) | 2023-05-18 14:24:05 -07:00 |
| scripts | Add watchdog mechanism to swss service and generate alert when swss have issue. (#15429) | 2023-06-12 17:53:54 -07:00 |
| sshd | Remove SSH host keys after installing the custom version of sshd (#10633) | 2022-04-25 10:38:52 -07:00 |