Commit Graph

3 Commits

Author SHA1 Message Date
pavel-shirshov
1eb3dfe541
[docker-teamd]: Introducing tlm_teamd: telemetry for teamd (#4824)
**- What I did**
1. Updated submodule sonic-swss to bring tlm_teamd to the buildimage.
2. Updated supervisord for the teamd
3. Updated critical process list (not sure that tlm_teamd is critical for now)

**- How to verify it**
Build an image and run. Check that tlm_teamd is running and STATE_DB has information in the LAG_INTERFACE, and :LAG_MEMBER_INTERFACE
```
admin@sonic:~$ redis-cli -n 6 hgetall 'LAG_TABLE|PortChannel16'
 1) "state"
 2) "ok"
 3) "team_device.ifinfo.dev_addr"
 4) "4c:76:25:f5:48:80"
 5) "setup.kernel_team_mode_name"
 6) "loadbalance"
 7) "team_device.ifinfo.ifindex"
 8) "6"
 9) "runner.fast_rate"
10) "false"
11) "runner.active"
12) "true"
13) "setup.pid"
14) "35"
15) "runner.fallback"
16) "false"
```

```
admin@sonic:~$ redis-cli -n 6 hgetall 'LAG_MEMBER_TABLE|PortChannel16|Ethernet16'
 1) "runner.selected"
 2) "true"
 3) "runner.aggregator.selected"
 4) "true"
 5) "runner.aggregator.id"
 6) "26"
 7) "runner.actor_lacpdu_info.state"
 8) "61"
 9) "runner.state"
10) "current"
11) "runner.actor_lacpdu_info.system"
12) "4c:76:25:f5:48:80"
13) "runner.partner_lacpdu_info.state"
14) "61"
15) "link.up"
16) "true"
17) "ifinfo.dev_addr"
18) "4c:76:25:f5:48:80"
19) "ifinfo.ifindex"
20) "26"
21) "link_watches.list.link_watch_0.up"
22) "true"
23) "runner.actor_lacpdu_info.port"
24) "17"
25) "runner.partner_lacpdu_info.port"
26) "1"
27) "runner.partner_lacpdu_info.system"
28) "52:54:00:ff:34:1b"
```
2020-06-27 01:22:23 -07:00
yozhao101
4fa81b4f8d
[dockers] Update critical_processes file syntax (#4831)
**- Why I did it**
Initially, the critical_processes file contains either the name of critical process or the name of group.
For example, the critical_processes file in the dhcp_relay container contains a single group name
`isc-dhcp-relay`. When testing the autorestart feature of each container, we need get all the critical
processes and test whether a  container can be restarted correctly if one of its critical processes is
killed. However, it will be difficult to differentiate whether the names in the critical_processes file are
the critical processes or group names. At the same time, changing the syntax in this file will separate the individual process from the groups and also makes it clear to the user.

Right now the critical_processes file contains two different kind of entries. One is "program:xxx" which indicates a critical process. Another is "group:xxx" which indicates a group of critical processes
managed by supervisord using the name "xxx". At the same time, I also updated the logic to
parse the file critical_processes in supervisor-proc-event-listener script.

**- How to verify it**
We can first enable the autorestart feature of a specified container for example `dhcp_relay` by running the comman `sudo config container feature autorestart dhcp_relay enabled` on DUT. Then we can select a critical process from the command `docker top dhcp_relay` and use the command `sudo kill -SIGKILL <pid>` to kill that critical process. Final step is to check whether the container is restarted correctly or not.
2020-06-25 21:18:21 -07:00
yozhao101
4c31ef3cd2 [Services] Restart Teamd service upon unexpected critical process exit. (#3703)
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2019-11-04 17:45:41 -08:00