Commit Graph

25 Commits

Author SHA1 Message Date
Joe LeVeque
b70c6f72b2 [dockers][supervisor] Increase event buffer size for dependent-startup (#5247)
When stopping the swss, pmon or bgp containers, log messages like the following can be seen:

```
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,061 ERRO pool dependent-startup event buffer overflowed, discarding event 34
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,063 ERRO pool dependent-startup event buffer overflowed, discarding event 35
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,064 ERRO pool dependent-startup event buffer overflowed, discarding event 36
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,066 ERRO pool dependent-startup event buffer overflowed, discarding event 37
```

This is due to the number of programs in the container managed by supervisor, all generating events at the same time. The default event queue buffer size in supervisor is 10. This patch increases that value in all containers in order to eliminate these errors. As more programs are added to the containers, we may need to further adjust these values. I increased all buffer sizes to 25 except for containers with more programs or templated supervisor.conf files which allow for a variable number of programs. In these cases I increased the buffer size to 50. One final exception is the swss container, where the buffer fills up to ~50, so I increased this buffer to 100.

Resolves https://github.com/Azure/sonic-buildimage/issues/5241
2020-09-28 16:12:53 +00:00
lguohan
92270544c5 [docker-orchagent]: start portsyncd before orchagent (#4845)
when portsyncd starts, it first enumerates all front panel ports
and marks them as old interfaces. Then, for new front panel ports
it checks if their indexes exist in previous sets. If yes, it will
treats them as old interfaces and ignore them.

The reason we have this check is because broadcom SAI only removes
front panel ports after sai switch init.

So, if portsyncd starts after orchagent, new interfaces could be
created before portsyncd and treated as old interface.

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2020-08-16 08:25:36 -07:00
Guohan Lu
5be374c746 [docker-orchagent]: use service dependency in supervisord to start services 2020-08-15 22:21:52 -07:00
yozhao101
71225ea4cc [Service] Enable/disable container auto-restart based on configuration. (#4073) 2020-02-13 16:20:21 -08:00
zhenggen-xu
c23aac1581 [swss] Remove "-p port_config.ini" option from the portsyncd (#3671)
* [portsyncd] Remove "-p port_config.ini" option from the portsyncd

Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
2019-10-27 21:15:39 -07:00
Joe LeVeque
6eca27e564 [services] Restart SwSS service upon unexpected critical process exit (#2845)
* [service] Restart SwSS Docker container if orchagent exits unexpectedly

* Configure systemd to stop restarting swss if it attempts to restart more than 3 times in 20 minutes

* Move supervisor-proc-exit-listener script

* [docker-dhcp-relay] Enhance wait_for_intf.sh.j2 to utilize STATEDB

* Ensure dependent services stop/start/restart with SwSS

* Change 'StartLimitInterval' to 'StartLimitIntervalSec', as Stretch installs systemd 232 (>= v230)

* Also update journald.conf options

* Remove 'PartOf' option from unit files

* Add '$(SUPERVISOR_PROC_EXIT_LISTENER_SCRIPT)' to new shared docker-orchagent makefile

* Make supervisor-proc-exit-listener script read from 'critical_processes' file inside container

* Update critical_processes file for swss container
2019-05-01 08:02:38 -07:00
Ze Gan
2e86caaedb [vxlanmgrd]: Add vxlanmgrd start command (#2705)
* Add bridge-utils to orchagent image

- Add vxlanmgrd to supervisorctl in docker -orchagent

Signed-off-by: Ze Gan zegan@microsoft.com

* Update submodule pointer for swss to include Vxlanmgrd changes
2019-04-23 20:38:08 -07:00
Marian Pritsak
178764e3fa
[swss][supervisord.conf] Remove intfsyncd 2019-01-13 16:04:39 +02:00
Prince Sunny
43f6df4654 Add nbrmgr to supervisor control (#2265)
* Add nbrmgr to supervisord conf

* Corrected priority values [Fix typo]

* Submodule update for Neighbor manager daemon

Submodule update sonic-swss-common:

edbfeec - Remove default docker name value of swss. (#250)
9728462 - Corrected configDB name for neigh table (#251)
6decc65 - Add NEIGH_TABLE to configDB for neighbor configuration (#249)
9918ae6 - Add ProducerStateTable temp view implementation and UT (#247)
41408f2 - Update README on dependencies
d9c0ba4 -Update README on the section 'Build with Google Test'
bb7fa5b - [ut]: explicit convert is to bool type (#248)
661b82c - Add gtest instruction in README

Submodule update sonic-swss

705b092 - Support ConfigDB neighbor configuration, introduce nbrmgr daemon (#693)
8522390 - Add vxlan switch attributes to switch orch (#712)
b123fa0 - [schema] update WARM_RESTART_TABLE:process_name schema document (#707)
2d7ab0c - Revert "Align default MTU value as SAI default (#705)" (#710)
836a58c - Align default MTU value as SAI default (#705)
bffa01f - VNET/VXLAN changes (#643)
b750a4b - [watermarkorch] add watermarkorch, extend queue and pg counters with wat\u2026 (#629)
2018-11-28 21:58:59 -08:00
zhenggen-xu
51a76614a3 Restore neighbor table to kernel during system warm-reboot (#2213)
* Restore neighbor table to kernel during system warm-reboot

Added a service: "restore_neighbors" to restore neighbor table into
kernel during system warm reboot. The service is started by supervisord
in swss docker when the docker is started.

In case system warm reboot is enabled, it will try to restore the neighbor
table from appDB into kernel through netlink API calls and update the neighbor
table by sending arp/ns requests to all neighbor entries, then it sets the
stateDB flag for neighsyncd to continue the reconciliation process.

-- Added tcpdump python-scapy debian package into orchagent and vs dockers.
-- Added python module: pyroute2 netifaces into orchagent and vc dockers.
-- Workarounded tcpdump issue in the vs docker

Signed-off-by: Zhenggen Xu <zxu@linkedin.com>

* Move the restore_neighbors.py to sonic-swss submodule
Made changes to makefiles accordingly

Make dockerfile.j2 changes and supervisord config changes

Add python monotonic lib for time access

Signed-off-by: Zhenggen Xu <zxu@linkedin.com>

* Added PYTHON_SWSSCOMMON as swss runtime dependency

Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
2018-11-09 17:06:09 -08:00
Qi Luo
709cd5a9f5
Set swssconfig.sh startsecs=0 for quick exit (#2181)
The default startsecs is 1 second. However, swssconfig.sh will quickly
exit with expected exit code 0 during warm starting. This case should
not be treated as a failure
2018-10-22 23:40:24 -07:00
Marian Pritsak
8a5e6ac47d [docker-orchagent]: Add vrfmgrd to supervisorctl (#2055)
* [docker-orchagent]: Add vrfmgrd to supervisorctl

Signed-off-by: Marian Pritsak <marianp@mellanox.com>

* [sonic-vs]: Add vrfmgrd to supervisorctl

Signed-off-by: Marian Pritsak <marianp@mellanox.com>
2018-09-19 22:18:39 -07:00
Shuotian Cheng
9413fa9a7b
[interfaces]: Move IP/MTU information from interfaces file into database (#1908)
- Move front panel ports and port channels MTU and IP configurations out of
the current /etc/network/interfaces file and store them in the configuration
database.

- The default MTU value for both front panel ports and the port channels is
9100. They are set via the minigraph or 9100 by default.

- Introduce portmgrd which will pick up the MTU configurations from the
configuration database.

- The updated intfmgrd will pick up IP address changes from the configuration
database.

- Update sonic-swss submodule

Signed-off-by: Shu0T1an ChenG <shuche@microsoft.com>
2018-08-20 11:19:16 -07:00
pavel-shirshov
10b4bbcae8 [swss]: Start counter from swss container (#1875)
* sonic-quagga update. Don't spam with 'Vtysh connected from' message

* Enable counters inside swss container. systemd is not flexible enough to follow our business rules
2018-07-26 13:39:08 -07:00
pavel-shirshov
c52fb762dd
Convert arp_update into a 'start-it-once' mode (#1864)
* Run arp_update just once, don't restart it. It will run continuosly with 5 min pauses
2018-07-18 13:04:57 -07:00
Andriy Moroz
58d8302b53 Buffers configuration update on port speed change (#1345)
* Move buffer configuration to ConfigDB

Signed-off-by: Andriy Moroz <c_andriym@mellanox.com>

* Converted Dell and Arista configs

Signed-off-by: Andriy Moroz <c_andriym@mellanox.com>

* Add buffer configs for ACS-MSN2740

Signed-off-by: Andriy Moroz <c_andriym@mellanox.com>

* Updated buffers template

Signed-off-by: Andriy Moroz <c_andriym@mellanox.com>

* Fixed j2 unit test

Signed-off-by: Andriy Moroz <c_andriym@mellanox.com>

* Update buffers config for Force10-S6100

Signed-off-by: Andriy Moroz <c_andriym@mellanox.com>

* Update VS docker to support speed and buffers test

Signed-off-by: Andriy Moroz <c_andriym@mellanox.com>

* Update buffers config generation

- fixed support of sonic-to-sonic install

Signed-off-by: Andriy Moroz <c_andriym@mellanox.com>

* Update submodules pointers for buffers config

Signed-off-by: Andriy Moroz <c_andriym@mellanox.com>
2018-01-29 08:11:05 -08:00
Ying Xie
2b91c9681d Revert "Buffers configuration update on port speed change (#1250)" (#1340)
This reverts commit 814e50fd5e.
2018-01-26 10:13:43 -08:00
Andriy Moroz
814e50fd5e Buffers configuration update on port speed change (#1250)
* Move buffer configuration to ConfigDB

Signed-off-by: Andriy Moroz <c_andriym@mellanox.com>

* Converted Dell and Arista configs

Signed-off-by: Andriy Moroz <c_andriym@mellanox.com>

* Add buffer configs for ACS-MSN2740

Signed-off-by: Andriy Moroz <c_andriym@mellanox.com>

* Updated buffers template

Signed-off-by: Andriy Moroz <c_andriym@mellanox.com>

* Fixed j2 unit test

Signed-off-by: Andriy Moroz <c_andriym@mellanox.com>

* Update buffers config for Force10-S6100

Signed-off-by: Andriy Moroz <c_andriym@mellanox.com>

* Update VS docker to support speed and buffers test

Signed-off-by: Andriy Moroz <c_andriym@mellanox.com>

* Update buffers config generation

- fixed support of sonic-to-sonic install

Signed-off-by: Andriy Moroz <c_andriym@mellanox.com>
2018-01-26 08:09:31 -08:00
JipanYanga
7406d3709b [configdb]: Add support for vlanconfd and intfconfd (#1063)
* Add support for vlanconfd and intfconfd

Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>

* Change name to vlanmgrd and intfmgrd

Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>

* Add missing vlan_members for parse_dpg result

Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>

* Remove cfgmgr debug CLI from image

Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>

* Update swss and swss-common submodules for VLAN trunk support

Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
2017-11-05 22:37:16 -08:00
Qi Luo
554114cfaa Make swssconfig status FATAL when it fails (#1009)
* Make supervisor controlled one-shot program autorestart 0 time, so the status will become FATAL instead of EXITED if failure happens

Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>

* Make swssconfig.sh strictly exit on any failure

Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>

* Tune startretries, tested in supervisor 3.3.2-1

Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>
2017-10-04 01:02:30 -07:00
Shuotian Cheng
d1b12dc0ca [swss]: Sleep 5 min regardless of arp_update return code (#743)
- arp_update return code is not guaranteed to be true/false.
  When there is no VLAN, arp_update will return true.
  When there are VLANs, arp_update will return false because the
  command arping returns 1 due to the option '-w 0'.
- This script should be run every 5 minutes regardless of the return
  code.
2017-06-22 21:26:33 -07:00
Shuotian Cheng
8af03fd0f9 [orchagent]: Add ARP update script to maintain VLAN neighbors (#401)
- Extend ARP reachable time to 30min
- Add arping to docker-swss
- Add arp_update script to routinely probe neighbors

Signed-off-by: Shuotian Cheng <shuche@microsoft.com>
2017-05-15 17:06:19 -07:00
Joe LeVeque
6e45307a49 [docker-orchagent]: Properly manage with supervisord (#589) 2017-05-11 11:18:10 -07:00
Joe LeVeque
d5c13c0a83 [dockers]: Disable autorestart on all supervisor processes inside containers (#580) 2017-05-09 17:37:08 -07:00
Joe LeVeque
8f348399f5 [Dockers]: Manage all Docker containers with Supervisord (#573)
- Consolidate config.sh and start.sh scripts into one script (start.sh)
 - Solve issue #435 - All dockers now run supervisord as their ENTRYPOINT
 - All stdout/stderr output from processes managed by supervisord is now sent to syslog instead of their own files
 - Supervisord log messages are now also sent to syslog
 - Removed unused smartmontools package from docker-platform-monitor
2017-05-08 15:43:31 -07:00