Commit Graph

12 Commits

Author SHA1 Message Date
Joe LeVeque
b70c6f72b2 [dockers][supervisor] Increase event buffer size for dependent-startup (#5247)
When stopping the swss, pmon or bgp containers, log messages like the following can be seen:

```
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,061 ERRO pool dependent-startup event buffer overflowed, discarding event 34
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,063 ERRO pool dependent-startup event buffer overflowed, discarding event 35
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,064 ERRO pool dependent-startup event buffer overflowed, discarding event 36
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,066 ERRO pool dependent-startup event buffer overflowed, discarding event 37
```

This is due to the number of programs in the container managed by supervisor, all generating events at the same time. The default event queue buffer size in supervisor is 10. This patch increases that value in all containers in order to eliminate these errors. As more programs are added to the containers, we may need to further adjust these values. I increased all buffer sizes to 25 except for containers with more programs or templated supervisor.conf files which allow for a variable number of programs. In these cases I increased the buffer size to 50. One final exception is the swss container, where the buffer fills up to ~50, so I increased this buffer to 100.

Resolves https://github.com/Azure/sonic-buildimage/issues/5241
2020-09-28 16:12:53 +00:00
Guohan Lu
763673993e [docker-pmon]: use service dependency in supervisord to start services 2020-08-15 22:23:50 -07:00
Junchao-Mellanox
0a70571011
[201911][thermal control] Backport feature from master branch (#4677)
Backport thermal control feature from master branch to 201911 branch by cherry-picking commits and manually resolving conflicts.
2020-06-08 11:20:43 -07:00
Nazarii Hnydyn
c266435d40
Revert "Add thermal control support for SONiC (#3949)" (#4527)
This reverts commit 109a13cc03.

Conflicts:
	dockers/docker-platform-monitor/docker-pmon.supervisord.conf.j2
2020-05-04 21:20:47 +03:00
Sujin Kang
9cbc07996e [pmon]: Fix the continous syseepromd autorestart issue on 201911 (#4478)
- Remove syseepromd from the critical process of pmon docker
- Fix supervisor autorestart configuration of syseepromd
2020-04-30 22:40:46 -07:00
Junchao-Mellanox
109a13cc03 Add thermal control support for SONiC (#3949) 2020-04-30 22:39:17 -07:00
Kebo Liu
e12d2e8bee
[PMON] Extend pmon daemon start control to lm-sensors and fancontrol for 201911 (#4487)
Extend the PMON daemon start control to lm-sensors and fancontrol.

change template docker-pmon.supervisord.conf.j2 and start.sh.j2 to have lm-sensors and fancontrol start scripts and supervisord config file controlled by pmon_daemon_control.json.

the intention is to avoid wrong daemon status in "supervisorctl status" output. For example, on some platform, if there is no fancontrol config file, and it is not ruled out from supervisord conf file and start.sh, we'll see fancontrol in "STOPPED" status from "supervisorctl status" output, which will violate some check in the platform test(check daemon status as expected)
2020-04-28 18:39:10 -07:00
Sujin Kang
c34dcbec4c [fancontrol] Restart process upon unexpected exit, not entire pmon container (#4101)
* fancontrol restart

* Cleanup the default setting for exitcodes

* Remove the unnecessary stopwaitsecs default settin
2020-04-27 06:23:19 +00:00
yozhao101
71225ea4cc [Service] Enable/disable container auto-restart based on configuration. (#4073) 2020-02-13 16:20:21 -08:00
yozhao101
4fa3a1e27e [Services] Restart Platform-monitor service upon unexpected critical process exit. (#3689)
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2019-11-04 17:44:01 -08:00
Kebo Liu
8a08595006 [Pmon] Add new daemon "syseepromd" to pmon docker (#2866) 2019-06-18 11:02:24 -07:00
Kebo Liu
84b46bb0e0 [Pmon] dynamically load pmon daemons (#2654)
* dynamically load pmon daemons
2019-03-22 02:49:35 -07:00