SWSS config script restore ARP/FDB/Routes. Restore neighbor script
uses config DB ARP information to restore ARP entries and so needs
to be started after swssconfig exits.
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
When stopping the swss, pmon or bgp containers, log messages like the following can be seen:
```
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,061 ERRO pool dependent-startup event buffer overflowed, discarding event 34
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,063 ERRO pool dependent-startup event buffer overflowed, discarding event 35
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,064 ERRO pool dependent-startup event buffer overflowed, discarding event 36
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,066 ERRO pool dependent-startup event buffer overflowed, discarding event 37
```
This is due to the number of programs in the container managed by supervisor, all generating events at the same time. The default event queue buffer size in supervisor is 10. This patch increases that value in all containers in order to eliminate these errors. As more programs are added to the containers, we may need to further adjust these values. I increased all buffer sizes to 25 except for containers with more programs or templated supervisor.conf files which allow for a variable number of programs. In these cases I increased the buffer size to 50. One final exception is the swss container, where the buffer fills up to ~50, so I increased this buffer to 100.
Resolves https://github.com/Azure/sonic-buildimage/issues/5241
when portsyncd starts, it first enumerates all front panel ports
and marks them as old interfaces. Then, for new front panel ports
it checks if their indexes exist in previous sets. If yes, it will
treats them as old interfaces and ignore them.
The reason we have this check is because broadcom SAI only removes
front panel ports after sai switch init.
So, if portsyncd starts after orchagent, new interfaces could be
created before portsyncd and treated as old interface.
Signed-off-by: Guohan Lu <lguohan@gmail.com>
* [service] Restart SwSS Docker container if orchagent exits unexpectedly
* Configure systemd to stop restarting swss if it attempts to restart more than 3 times in 20 minutes
* Move supervisor-proc-exit-listener script
* [docker-dhcp-relay] Enhance wait_for_intf.sh.j2 to utilize STATEDB
* Ensure dependent services stop/start/restart with SwSS
* Change 'StartLimitInterval' to 'StartLimitIntervalSec', as Stretch installs systemd 232 (>= v230)
* Also update journald.conf options
* Remove 'PartOf' option from unit files
* Add '$(SUPERVISOR_PROC_EXIT_LISTENER_SCRIPT)' to new shared docker-orchagent makefile
* Make supervisor-proc-exit-listener script read from 'critical_processes' file inside container
* Update critical_processes file for swss container
* Add bridge-utils to orchagent image
- Add vxlanmgrd to supervisorctl in docker -orchagent
Signed-off-by: Ze Gan zegan@microsoft.com
* Update submodule pointer for swss to include Vxlanmgrd changes
* Restore neighbor table to kernel during system warm-reboot
Added a service: "restore_neighbors" to restore neighbor table into
kernel during system warm reboot. The service is started by supervisord
in swss docker when the docker is started.
In case system warm reboot is enabled, it will try to restore the neighbor
table from appDB into kernel through netlink API calls and update the neighbor
table by sending arp/ns requests to all neighbor entries, then it sets the
stateDB flag for neighsyncd to continue the reconciliation process.
-- Added tcpdump python-scapy debian package into orchagent and vs dockers.
-- Added python module: pyroute2 netifaces into orchagent and vc dockers.
-- Workarounded tcpdump issue in the vs docker
Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
* Move the restore_neighbors.py to sonic-swss submodule
Made changes to makefiles accordingly
Make dockerfile.j2 changes and supervisord config changes
Add python monotonic lib for time access
Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
* Added PYTHON_SWSSCOMMON as swss runtime dependency
Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
The default startsecs is 1 second. However, swssconfig.sh will quickly
exit with expected exit code 0 during warm starting. This case should
not be treated as a failure
* [docker-orchagent]: Add vrfmgrd to supervisorctl
Signed-off-by: Marian Pritsak <marianp@mellanox.com>
* [sonic-vs]: Add vrfmgrd to supervisorctl
Signed-off-by: Marian Pritsak <marianp@mellanox.com>
- Move front panel ports and port channels MTU and IP configurations out of
the current /etc/network/interfaces file and store them in the configuration
database.
- The default MTU value for both front panel ports and the port channels is
9100. They are set via the minigraph or 9100 by default.
- Introduce portmgrd which will pick up the MTU configurations from the
configuration database.
- The updated intfmgrd will pick up IP address changes from the configuration
database.
- Update sonic-swss submodule
Signed-off-by: Shu0T1an ChenG <shuche@microsoft.com>
* Add support for vlanconfd and intfconfd
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
* Change name to vlanmgrd and intfmgrd
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
* Add missing vlan_members for parse_dpg result
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
* Remove cfgmgr debug CLI from image
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
* Update swss and swss-common submodules for VLAN trunk support
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
* Make supervisor controlled one-shot program autorestart 0 time, so the status will become FATAL instead of EXITED if failure happens
Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>
* Make swssconfig.sh strictly exit on any failure
Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>
* Tune startretries, tested in supervisor 3.3.2-1
Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>
- arp_update return code is not guaranteed to be true/false.
When there is no VLAN, arp_update will return true.
When there are VLANs, arp_update will return false because the
command arping returns 1 due to the option '-w 0'.
- This script should be run every 5 minutes regardless of the return
code.
- Consolidate config.sh and start.sh scripts into one script (start.sh)
- Solve issue #435 - All dockers now run supervisord as their ENTRYPOINT
- All stdout/stderr output from processes managed by supervisord is now sent to syslog instead of their own files
- Supervisord log messages are now also sent to syslog
- Removed unused smartmontools package from docker-platform-monitor