When stopping the swss, pmon or bgp containers, log messages like the following can be seen:
```
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,061 ERRO pool dependent-startup event buffer overflowed, discarding event 34
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,063 ERRO pool dependent-startup event buffer overflowed, discarding event 35
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,064 ERRO pool dependent-startup event buffer overflowed, discarding event 36
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,066 ERRO pool dependent-startup event buffer overflowed, discarding event 37
```
This is due to the number of programs in the container managed by supervisor, all generating events at the same time. The default event queue buffer size in supervisor is 10. This patch increases that value in all containers in order to eliminate these errors. As more programs are added to the containers, we may need to further adjust these values. I increased all buffer sizes to 25 except for containers with more programs or templated supervisor.conf files which allow for a variable number of programs. In these cases I increased the buffer size to 50. One final exception is the swss container, where the buffer fills up to ~50, so I increased this buffer to 100.
Resolves https://github.com/Azure/sonic-buildimage/issues/5241
Calls to sonic-cfggen is CPU expensive. This PR reduces calls to
sonic-cfggen to one call during startup when starting dhcp-relay
service.
singed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
**- Why I did it**
Initially, the critical_processes file contains either the name of critical process or the name of group.
For example, the critical_processes file in the dhcp_relay container contains a single group name
`isc-dhcp-relay`. When testing the autorestart feature of each container, we need get all the critical
processes and test whether a container can be restarted correctly if one of its critical processes is
killed. However, it will be difficult to differentiate whether the names in the critical_processes file are
the critical processes or group names. At the same time, changing the syntax in this file will separate the individual process from the groups and also makes it clear to the user.
Right now the critical_processes file contains two different kind of entries. One is "program:xxx" which indicates a critical process. Another is "group:xxx" which indicates a group of critical processes
managed by supervisord using the name "xxx". At the same time, I also updated the logic to
parse the file critical_processes in supervisor-proc-event-listener script.
**- How to verify it**
We can first enable the autorestart feature of a specified container for example `dhcp_relay` by running the comman `sudo config container feature autorestart dhcp_relay enabled` on DUT. Then we can select a critical process from the command `docker top dhcp_relay` and use the command `sudo kill -SIGKILL <pid>` to kill that critical process. Final step is to check whether the container is restarted correctly or not.
DHCP relay MONitor (dhcpmon) keeps track of DORA messages. If DHCP Relay
is detected to be not forwarding DORA message, dhcpmon will log such event
to syslog. Under the hood dhcpmon keeps counts of clients DR messages,
forwarded DR messages, DHCP server OA messages, and forwarded OA messages.
dhcpmon will check every 12 sec (configurable) if counts are monotonically
increasing and record snapshot of those counters. dhcpmon will report
discrepancies when detected between current counters and snapshot counters.
pull-request: https://github.com/Azure/sonic-buildimage/pull/3886
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
- create a dockerfile-marcros.j2 file with all common operations
written as j2 macro
- use single dockerfile instruction for COPY and RUN commands
when possible to improve build time
- reorganize dockerfile instructions to make more cache friendly
(in case someday we will remove --no-cache to build docker images)
Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
* [service] Restart SwSS Docker container if orchagent exits unexpectedly
* Configure systemd to stop restarting swss if it attempts to restart more than 3 times in 20 minutes
* Move supervisor-proc-exit-listener script
* [docker-dhcp-relay] Enhance wait_for_intf.sh.j2 to utilize STATEDB
* Ensure dependent services stop/start/restart with SwSS
* Change 'StartLimitInterval' to 'StartLimitIntervalSec', as Stretch installs systemd 232 (>= v230)
* Also update journald.conf options
* Remove 'PartOf' option from unit files
* Add '$(SUPERVISOR_PROC_EXIT_LISTENER_SCRIPT)' to new shared docker-orchagent makefile
* Make supervisor-proc-exit-listener script read from 'critical_processes' file inside container
* Update critical_processes file for swss container
* Base DHCP relay Docker image on Strech base Docker
* Change URL for isc-dhcp source repository
* Upgrade isc-dhcp source branch to 4.3.5-3.1
* Update patch #0001 to apply to isc-dhcp 4.3.5-3.1
* Update patch #0002 to apply to isc-dhcp 4.3.5-3.1
* Update patch #0003 to apply to isc-dhcp 4.3.5-3.1
* Update patch #0004 to apply to isc-dhcp 4.3.5-3.1
* Remove security patches, as they are now applied as part of 4.3.5-3.1 source
* Reorder patches to apply bug fix first, then features
* Extend makefile to build debug Docker image
* Update commit that series file applies against
Previously use / to separate container name and program name.
However, in rsyslogd:
Precisely, the programname is terminated by either (whichever occurs first):
end of tag
nonprintable character
‘:’
‘[‘
‘/’
The above definition has been taken from the FreeBSD syslogd sources.
Signed-off-by: Guohan Lu <gulv@microsoft.com>
* [DHCP Relay]: Support new <DhcpRelays> minigraph tag; support multiple VLANs
* Don't start dhcrelay in quiet mode so as to get startup output in syslog
* Update sonic-cfggen tests to support new '<DhcpRelays>' tag
* <DhcpRelays> tag is only present for VLANs which require a DHCP relay agent -- only parse if present
* Don't attempt to configure a DHCP relay agent for VLANs without specified DHCP servers
* Modify to work with Taoyu's minigraph/DB changes (#942)
* Reduce number of DHCP servers in sonic-cfggen unit tests from 4 to 2
* Remove isc-dhcp-relay sample output file from sonic-cfggen test, as we no longer generate that file
* Update Option 82 isc-dhcp-relay patch to load all interface name-alias maps into memory once at start instead of calling sonic-cfggen on each packet we relay
* Remove executable permission from Jinja2 template
* Set max hop count to 1 so that DHCP relay will only relay packets with a hop count of zero
* Replace tabs with spaces
* Modify overlooked sonic-cfggen call, use Config DB instead of minigraph
* Also ensure > 1 VLAN requires a DHCP relay agent before outputting to template
* Generate port name-alias map file using sonic-cfggen and parse that in lieu of parsing port_config.ini directly
* No longer drop packets with hop count > 0; Instead, drop packets which already contain agent info
Modify minigraph parser output format so it fit DB schema
Modify configuration templates to fit new schema
Systemd services dependencies are modified so database starts before any configuration consumer
* Add docker-dhcp-relay/Dockerfile to .gitignore
* Add isc-dhcp-relay .deb package to image build process, along with my Option 82 patch
* Install custom isc-dhcp-relay in dhcp_relay docker
* Install isc-dhcp-relay build dependencies in sonic-slave Docker container
* Copy the built .deb package to the destination directory
* Add dependencies for isc-dhcp-relay
* Change Option 82 string to '<hostname>:<portname>'
* Install dependencies of .deb files implicitly in Dockerfile
* Remove unused line
* Remove unnecessary space
- Consolidate config.sh and start.sh scripts into one script (start.sh)
- Solve issue #435 - All dockers now run supervisord as their ENTRYPOINT
- All stdout/stderr output from processes managed by supervisord is now sent to syslog instead of their own files
- Supervisord log messages are now also sent to syslog
- Removed unused smartmontools package from docker-platform-monitor
* [docker-config-engine]: introduce docker sonic config engine
sonic config engine provide the sonic configure engine for all sonic
dockers that rely on the engine to generate runtime configuration.