Commit Graph

8 Commits

Author SHA1 Message Date
Joe LeVeque
72b32a96fc
[201911][dockers][supervisor] Increase event buffer size for process exit listener (#7106)
Backport of https://github.com/Azure/sonic-buildimage/pull/7083 to the 201911 branch.

#### Why I did it

To prevent error [messages](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802) like the following from being logged:

```
Mar 17 02:33:48.523153 vlab-01 INFO swss#supervisord 2021-03-17 02:33:48,518 ERRO pool supervisor-proc-exit-listener event buffer overflowed, discarding event 46
```

This is basically an addendum to https://github.com/Azure/sonic-buildimage/pull/5247, which increased the event buffer size for dependent-startup. While supervisor-proc-exit-listener doesn't subscribe to as many events as dependent-startup, there is still a chance some containers (like swss, as in the example above) have enough processes running to cause an overflow of the default buffer size of 10.

This is especially important for preventing erroneous log_analyzer failures in the sonic-mgmt repo regression tests, which have started occasionally causing PR check builds to fail. Example [here](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802).

I set all supervisor-proc-exit-listener event buffer sizes to 1024, and also updated all dependent-startup event buffer sizes to 1024, as well, to keep things simple, unified, and allow headroom so that we will not need to adjust these values frequently, if at all.
2021-03-29 10:07:43 -07:00
Joe LeVeque
b70c6f72b2 [dockers][supervisor] Increase event buffer size for dependent-startup (#5247)
When stopping the swss, pmon or bgp containers, log messages like the following can be seen:

```
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,061 ERRO pool dependent-startup event buffer overflowed, discarding event 34
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,063 ERRO pool dependent-startup event buffer overflowed, discarding event 35
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,064 ERRO pool dependent-startup event buffer overflowed, discarding event 36
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,066 ERRO pool dependent-startup event buffer overflowed, discarding event 37
```

This is due to the number of programs in the container managed by supervisor, all generating events at the same time. The default event queue buffer size in supervisor is 10. This patch increases that value in all containers in order to eliminate these errors. As more programs are added to the containers, we may need to further adjust these values. I increased all buffer sizes to 25 except for containers with more programs or templated supervisor.conf files which allow for a variable number of programs. In these cases I increased the buffer size to 50. One final exception is the swss container, where the buffer fills up to ~50, so I increased this buffer to 100.

Resolves https://github.com/Azure/sonic-buildimage/issues/5241
2020-09-28 16:12:53 +00:00
Guohan Lu
eb41fd5df6 [docker-nat]: use service dependency in supervisord to start servicesx
Signed-off-by: Guohan Lu <lguohan@gmail.com>
2020-08-15 22:20:35 -07:00
Akhilesh Samineni
dd26117bf2 [NAT]: Update the conntrack entries timeout to Max value after warmboot (#4596)
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>

All new NAT conntrack entries are added to kernel with max entry timeout of 432000 and setting the same timeout during system warm reboot also
2020-07-26 11:18:42 -07:00
yozhao101
c2364cf03e
[201911][dockers] Update critical_processes file syntax (#4854)
Backport of https://github.com/Azure/sonic-buildimage/pull/4831 to the 201911 branch
2020-06-26 11:37:05 -07:00
Dong Zhang
3faa4e936e [MultiDB] use sonic-db-cli PING and fix wrong multiDB API in NAT (#4541) 2020-05-09 18:16:48 -07:00
Joe LeVeque
102cb83097 [Services] Restart NAT service upon unexpected critical process exit. (#4208) 2020-03-14 18:03:29 -07:00
Kiran Kumar Kella
a943e6ce45 Changes in sonic-buildimage to support the NAT feature (#3494)
* Changes in sonic-buildimage for the NAT feature
- Docker for NAT
- installing the required tools iptables and conntrack for nat

Signed-off-by: kiran.kella@broadcom.com

* Add redis-tools dependencies in the docker nat compilation

* Addressed review comments

* add natsyncd to warm-boot finalizer list

* addressed review comments

* using swsscommon.DBConnector instead of swsssdk.SonicV2Connector

* Enable NAT application in docker-sonic-vs
2020-02-03 15:30:39 -08:00