Commit Graph

10 Commits

Author SHA1 Message Date
yozhao101
13cec4c486
[Monit] Unmonitor the processes in containers which are disabled. (#5153)
We want to let Monit to unmonitor the processes in containers which are disabled in `FEATURE` table such that
Monit will not generate false alerting messages into the syslog.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2020-09-25 00:28:28 -07:00
Joe LeVeque
5b3b4804ad
[dockers][supervisor] Increase event buffer size for dependent-startup (#5247)
When stopping the swss, pmon or bgp containers, log messages like the following can be seen:

```
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,061 ERRO pool dependent-startup event buffer overflowed, discarding event 34
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,063 ERRO pool dependent-startup event buffer overflowed, discarding event 35
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,064 ERRO pool dependent-startup event buffer overflowed, discarding event 36
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,066 ERRO pool dependent-startup event buffer overflowed, discarding event 37
```

This is due to the number of programs in the container managed by supervisor, all generating events at the same time. The default event queue buffer size in supervisor is 10. This patch increases that value in all containers in order to eliminate these errors. As more programs are added to the containers, we may need to further adjust these values. I increased all buffer sizes to 25 except for containers with more programs or templated supervisor.conf files which allow for a variable number of programs. In these cases I increased the buffer size to 50. One final exception is the swss container, where the buffer fills up to ~50, so I increased this buffer to 100.

Resolves https://github.com/Azure/sonic-buildimage/issues/5241
2020-09-08 23:36:38 -07:00
joyas-joseph
7a6fca2f98 [docker-sflow]: upgrade docker-sflow on buster (#4904) 2020-07-12 18:08:52 +00:00
yozhao101
4fa81b4f8d
[dockers] Update critical_processes file syntax (#4831)
**- Why I did it**
Initially, the critical_processes file contains either the name of critical process or the name of group.
For example, the critical_processes file in the dhcp_relay container contains a single group name
`isc-dhcp-relay`. When testing the autorestart feature of each container, we need get all the critical
processes and test whether a  container can be restarted correctly if one of its critical processes is
killed. However, it will be difficult to differentiate whether the names in the critical_processes file are
the critical processes or group names. At the same time, changing the syntax in this file will separate the individual process from the groups and also makes it clear to the user.

Right now the critical_processes file contains two different kind of entries. One is "program:xxx" which indicates a critical process. Another is "group:xxx" which indicates a group of critical processes
managed by supervisord using the name "xxx". At the same time, I also updated the logic to
parse the file critical_processes in supervisor-proc-event-listener script.

**- How to verify it**
We can first enable the autorestart feature of a specified container for example `dhcp_relay` by running the comman `sudo config container feature autorestart dhcp_relay enabled` on DUT. Then we can select a critical process from the command `docker top dhcp_relay` and use the command `sudo kill -SIGKILL <pid>` to kill that critical process. Final step is to check whether the container is restarted correctly or not.
2020-06-25 21:18:21 -07:00
padmanarayana
5cacc2004c
Add a port index mapper service for sFlow (#4794)
* Add a port index mapper service for sFlow
2020-06-23 22:23:08 -07:00
Guohan Lu
1636be4b68 [docker-sflow]: use service dependency in supervisord to start services 2020-05-22 11:01:28 -07:00
yozhao101
91e5fb5602
[Service] Enable/disable container auto-restart based on configuration. (#4073) 2020-02-07 12:34:07 -08:00
yozhao101
b7e48b422f [Services] Allow monit system tool to monitor the critical processes status running in various SONiC containers. (#3940)
* Add a monit config file for teamd container.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* Add a copy mechanism to put the monit config file in teamd container
into base image.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* Add a monit config file for snmp container.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* Add a copy mechanism to put the monit config file of snmp container into
the base image.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* Add a monit config file for dhcp_relay container in the dir
base_image_files.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* Add a copy mechanism to put the monit config file of dhcp_relay
container into base image under /etc/monit/conf.d.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* Add a monit config file for router advertiser container.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* Add a copy mechanism to put the monit config file of router advertiser
contianer into base image.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-Pmon] Add a monit config file for pmon container.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-Pmon] Add a copy mechanism to put the monit config file into the
base image.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-lldp] Add a monit config file for lldp container.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-lldp] Add a copy mechanism to put the monit config file into the
base image.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-bgp] Add a monit config file for BGP container.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-bgp] Add a copy mechanism to put monit config file into the base
image.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-swss] Add a monit config file for the swss container.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-swss] Add a copy mechanism to put monit config file into the
base image.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Add a monit config file for syncd container on barefoot
platform.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Add a copy mechanism to put the monit config file into
the base image on barefoot.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Add a monit config file for syncd container on broadcom.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Add a copy mechanism to put the monit config file into
the base image on broadcom.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Add a monit config file for syncd container on cavium.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Add a copy mechanism to put the monit config file into
the base image.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-centec] Add a monit config file for syncd container on centen
platform.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Add a copy mechanism to put the monit config file into
the base image.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Add a monit config file for syncd container on centen
platform.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Add a copy mechanism to put the monit config file into
the base image.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Add a monit config file for syncd container on marvell.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Add a copy mechanism to put the monit conifg file into
the base image.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Add a monit config file for syncd container on
marvell-arm64.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Add a copy mechanism to put the monit config file into
the base image on marvell-arm64.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Add a monit config file for syncd container on
marvell-armhf.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Add a copy mechanism to put the monit config file into
the base image.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Add a monit config file for syncd container on mellanox.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Add a copy mechanism to put the monit config file into
the base image.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Add a monit config file for syncd container on nephos.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Add a copy mechanism to put the monit config file into
the base image.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-sflow] Add a monit config file for sflow container.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-sflow] Add a copy mechanism to put the monit conifg file into
the base image.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-telemetry] Add a monit config file for telemetry container.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-telemetry] Add a copy mechanism to put the monit config file
into the base image.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-database] Add a monit config file for database container.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-database] Add a copy mechanism to put the monit config file into
the base image.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-Dhcprelay] Change a typo.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-Dhcprelay] Change the process name in monit config file to
dhcrelay.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] There is no desserve process in syncd container on
barefoot.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] There is no process desserve in syncd container on
cavium.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] There is no process named desserve in syncd on centec.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] There is no process named desserve in syncd on marvell.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Should not delete the process desserve in syncd container
on marvell.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Delete the process dsserve in syncd on marvell.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Delete the process dsserve in syncd container on
marvell-arm64.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Delete the process dsserve in syncd container on
marvell-armhf.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Delete the process dsserve in syncd container on
mellanox.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-Radv] Change the process name to radvd.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-telemetry] Correct a typo in monit_telemetry.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-teamd] Delete the monit config file for teamd.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-teamd] Delete the mechanism to copy the monit config file into
base image.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-dhcprelay] Delete the monit config file for dhcp_relay
container.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-dhcprelay] Delete the mechanism to copy the monit config file
into the base image.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-radv] Delete the monit config file foe radv container.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-radv] Delete the mechanism to copy the monit config file into
the base image.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-bgp] change the monit config file for BGP container such that
monit only generates alert if the process is not running for 5 minutes.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-snmp] Change the monit config file for snmp container such that
monit only generates alret if the process is not running for 5 minutes.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-pmon] Change the monit config file for pmon container such that
monit only generates alert if the processes are not running for 5
minutes.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-lldp] Change the monit config file for lldp container such that
monit only generates alerts if some processes are not running for 5
minutes.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-pmon] Delete the monit config file for pmon container since some
of processes are not running depended on the type of box.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-pmon] Delete the copy mechanism to copy the monit config file
into the base image.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-lldp] Change the matching name for the process lldpd.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-swss] Change the monit config file for swss container such that
monit only generates alerts if the processes are not running for 5
minutes.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Change the monit config file for syncd container on
barefoot such that monit only generates alerts if the process is not
running for 5 minutes.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Correct a typo in monit config file.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Change the monit config file for syncd container on
broadcom such that monit only generates alerts if the processes are not
running for 5 minutes.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Change the monit config file for syncd container on
cavium such that monit only generates alerts if the process is not
running for 5 minutes.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Change the monit config file for syncd container such
that monit only generates alerts if the process is not running for 5
minutes.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Change the monit config file for syncd container on
marvell such that monit only generates alerts if the process is not
running for 5 minutes.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Change the monit config file for syncd container on
marvell-arm64 such that monit only generates alerts if the process is
not running for 5 minutes.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Change the monit config file for syncd container on
marvell-armhf such that monit will generate alert if the process is not
running for 5 minutes.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Change the monit config file for syncd container on
mellanox such that monit only generates alerts if the process is not
running for 5 minutes.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-sycnd] Change the monit config file for syncd container such
that monit only generates alerts if the processes are not running for 5
minutes.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-sflow] Change the monit config file for sflow container such
that monit only generates alerts if the process is not running for 5
minutes.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-telemetry] Change the monit config file for telemetry container
such that monit only generates alerts if the processes are not running
for 5 minutes.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-database] Change the monit config file for database container
such that monit only generates alerts if the process is not running for
5 minutes.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-database] Use 4 spaces to replace 2 spaces in monit config file.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-bgp] Use 4 spcess to replace 2 spaces in monit config file.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-lldp] Use 4 spaces to replace 2 spaces in monit config file.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-swss] Use 4 spaces to replace 2 space in monit config file.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-sflow] Use 4 spaces to replace 2 spaces in monit config file.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-snmp] Use 4 spaces to replace 2 spaces in monit config file.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-telemetry] Use 4 spaces to replace 2 spaces in monit config
file.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Use 4 spaces to replace 2 spaces in the monit config file
on barefoot.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Use 4 spaces to replace 2 spaces in the monit config file
on broadcom.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Use 4 spaces to replace 2 spaces in the monit config file
on cavium.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Use 4 spaces to replace 2 spaces in the monit config file
on centec.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Use 4 spaces to replace 2 spaces in the monit config file
on marvell.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Use 4 spaces to replace 2 spaces in the monit config file
on mellanox.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-syncd] Use 4 spaces to repalce 2 spaces in the monit config file
on nephos.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [Docker-bgp] Remove the trailing extra spaces in monit config file.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2020-01-10 16:21:02 -08:00
yozhao101
67fc68513e [Services] Restart Sflow service upon unexpected critical process exit. (#3751)
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2019-11-25 13:02:00 -08:00
padmanarayana
75104bb35d [sflow]: Build infrastructure changes to support sflow docker and utilities (#3251)
Introduce a new "sflow" container (if ENABLE_SFLOW is set). The new docker will include:
hsflowd : host-sflow based daemon is the sFlow agent
psample : Built from libpsample repository. Useful in debugging sampled packets/groups.
sflowtool : Locally dump sflow samples (e.g. with a in-unit collector)

In case of SONiC-VS, enable psample & act_sample kernel modules.

VS' syncd needs iproute2=4.20.0-2~bpo9+1 & libcap2-bin=1:2.25-1 to support tc-sample

tc-syncd is provided as a convenience tool for debugging (e.g. tc-syncd filter show ...)
2019-09-14 20:27:09 -07:00