sonic-buildimage

Author	SHA1	Message	Date
yozhao101	cc9c3f567e	[supervisord] Monitoring the critical processes with supervisord. (#6242 ) - Why I did it Initially, we used Monit to monitor critical processes in each container. If one of the critical processes was not running or crashed for some reason, Monit would periodically write an alerting message into syslog. If we added a new process to a container, the corresponding Monit configuration file also needed to be updated, which was hard to maintain. We now employ the event listener of Supervisord to do this monitoring. Since processes in each container are managed by Supervisord, we can focus only on the monitoring logic. - How I did it We borrowed the event listener of Supervisord to monitor critical processes in containers. The event listener takes the following steps if it is notified that one of the critical processes exited unexpectedly: The event listener first checks whether the auto-restart mechanism is enabled for this container. If the auto-restart mechanism is enabled, the event listener kills the Supervisord process, which should cause the container to exit and subsequently get restarted. If the auto-restart mechanism is not enabled for this container, the event listener enters a loop which first sleeps 1 minute and then checks whether the process is running. If yes, the event listener exits. If no, an alerting message is written into syslog. - How to verify it First, we need to check whether the auto-restart mechanism of a container is enabled by running the command show feature status. If enabled, one critical process should be selected and killed manually, then we need to check whether the container is restarted. Second, we can disable the auto-restart mechanism if it was enabled at step 1 by running the command sudo config feature autorestart <container_name> disabled. Then one critical process should be selected and killed. After that, we will see the alerting message appear in the syslog every 1 minute. - Which release branch to backport (provide reason below if selected) 201811 201911 [x ] 202006	2021-01-28 09:28:27 -08:00
Tamer Ahmed	80ba3a4f6f	[dhcp-relay]: Launch DHCP Relay On L3 Vlan (#6527 ) Recent changes introduced L2 VLANs, which do not have DHCP clients behind them, so DHCP relay is not required. Also, dhcpmon fails to launch on those VLANs as their interfaces lack IP addresses. This PR limits the launch of both DHCP relay and dhcpmon to L3 VLANs only. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2021-01-28 09:21:49 -08:00
Renuka Manavalan	ba02209141	First cut image update for kubernetes support. (#5421 ) * First cut image update for kubernetes support. With this, 1) dockers dhcp_relay, lldp, pmon, radv, snmp, telemetry are enabled for kube management init_cfg.json configure set_owner as kube for these 2) Each docker's start.sh updated to call container_startup.py to register going up As part of this call, it registers the current owner as local/kube and its version The images are built with its version ingrained into image during build 3) Update all docker's bash script to call 'container start/stop/wait' instead of 'docker start/stop/wait'. For all locally managed containers, it calls docker commands, hence no change for locally managed. 4) Introduced a new ctrmgrd service, that helps with transition between owners as kube & local and carry over any labels update from STATE-DB to API server 5) hostcfgd updated to handle owner change 6) Reboot scripts are updated to tag kube running images as local, so upon reboot they run the same image. 7) Added kube_commands.py to handle all updates with Kubernetes API server -- dedicated for k8s interaction only.	2020-12-22 08:01:33 -08:00
trzhang-msft	d4d90a8963	Support for dual tor option in dhcp docker template (#6152 )	2020-12-09 18:10:00 -08:00
lguohan	4d3eb18ca7	[supervisord]: use abspath as supervisord entrypoint (#5995 ) use abspath makes the entrypoint not affected by PATH env. Signed-off-by: Guohan Lu <lguohan@gmail.com>	2020-11-22 21:18:44 -08:00
Joe LeVeque	7bf05f7f4f	[supervisor] Install vanilla package once again, install Python 3 version in Buster container (#5546 ) - Why I did it We were building a custom version of Supervisor because I had added patches to prevent hangs and crashes if the system clock ever rolled backward. Those changes were merged into the upstream Supervisor repo as of version 3.4.0 (http://supervisord.org/changes.html#id9), therefore, we should be able to simply install the vanilla package via pip. This will also allow us to easily move to Python 3, as Python 3 support was added in version 4.0.0. - How I did it - Remove Makefiles and patches for building supervisor package from source - Install Python 3 supervisor package version 4.2.1 in Buster base container - Also install Python 3 version of supervisord-dependent-startup in Buster base container - Debian package installed binary in `/usr/bin/`, but pip package installs in `/usr/local/bin/`, so rather than update all absolute paths, I changed all references to simply call `supervisord` and let the system PATH find the executable to prevent future need for changes just in case we ever need to switch back to build a Debian package, then we won't need to modify these again. - Install Python 2 supervisor package >= 3.4.0 in Stretch and Jessie base containers	2020-11-19 23:41:32 -08:00
Tamer Ahmed	15f5d47338	[dhcpmon] Print Both Snapshot And Current Counters (#5374 ) Printing both snapshot and current counter sets will make it easier to pinpoint which message type(s) is/are not being relayed. This PR prints both counter sets. Also, this PR defines gnu11 as a C standard to compile with in order to avoid making changes when porting to 201811 branch. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-09-15 15:27:36 -07:00
Tamer Ahmed	1bf6fdc6d2	[dhcpmon] Monitor Mgmt Interface For DHCP Packets (#5317 ) When BGP routes are missing, DHCP packets get relayed over mgmt interface. This results in dhcpmon alerting that DHCP packets are not being relayed. This PR includes the mgmt interface as an uplink device, so if a DHCP packet gets relayed over the mgmt interface, the regular dhcpmon alert will not be issued. Instead, dhcpmon will check the mgmt interface counts and issue a separate alert regarding packets travelling through mgmt network. In addition, this PR includes the following enhancements: 1. Add SIGUSR1 handler that prints out current packet counts 2. Increase alert grace window to 3 minutes from currently 2 minutes 3. Time is now computed more accurately 4. Print vlan name before counters signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-09-09 18:37:01 -07:00
Joe LeVeque	5b3b4804ad	[dockers][supervisor] Increase event buffer size for dependent-startup (#5247 ) When stopping the swss, pmon or bgp containers, log messages like the following can be seen: ``` Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,061 ERRO pool dependent-startup event buffer overflowed, discarding event 34 Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,063 ERRO pool dependent-startup event buffer overflowed, discarding event 35 Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,064 ERRO pool dependent-startup event buffer overflowed, discarding event 36 Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,066 ERRO pool dependent-startup event buffer overflowed, discarding event 37 ``` This is due to the number of programs in the container managed by supervisor, all generating events at the same time. The default event queue buffer size in supervisor is 10. This patch increases that value in all containers in order to eliminate these errors. As more programs are added to the containers, we may need to further adjust these values. I increased all buffer sizes to 25 except for containers with more programs or templated supervisor.conf files which allow for a variable number of programs. In these cases I increased the buffer size to 50. One final exception is the swss container, where the buffer fills up to ~50, so I increased this buffer to 100. Resolves https://github.com/Azure/sonic-buildimage/issues/5241	2020-09-08 23:36:38 -07:00
Tamer Ahmed	3a10e9c6fa	[dhcp-relay] Reduce Calls to SONiC Cfggen (#5175 ) Calls to sonic-cfggen are CPU expensive. This PR reduces calls to sonic-cfggen to one call during startup when starting dhcp-relay service. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-08-17 15:47:14 -07:00
yozhao101	4fa81b4f8d	[dockers] Update critical_processes file syntax (#4831 ) - Why I did it Initially, the critical_processes file contained either the name of a critical process or the name of a group. For example, the critical_processes file in the dhcp_relay container contains a single group name `isc-dhcp-relay`. When testing the autorestart feature of each container, we need to get all the critical processes and test whether a container can be restarted correctly if one of its critical processes is killed. However, it is difficult to tell whether the names in the critical_processes file are critical processes or group names. Changing the syntax in this file separates the individual processes from the groups and also makes it clear to the user. Right now the critical_processes file contains two different kinds of entries. One is "program:xxx" which indicates a critical process. Another is "group:xxx" which indicates a group of critical processes managed by supervisord using the name "xxx". At the same time, I also updated the logic to parse the file critical_processes in supervisor-proc-event-listener script. - How to verify it We can first enable the autorestart feature of a specified container for example `dhcp_relay` by running the command `sudo config container feature autorestart dhcp_relay enabled` on DUT. Then we can select a critical process from the command `docker top dhcp_relay` and use the command `sudo kill -SIGKILL <pid>` to kill that critical process. Final step is to check whether the container is restarted correctly or not.	2020-06-25 21:18:21 -07:00
joyas-joseph	b48d274f69	[docker-dhcp-relay]: convert dhcp-relay docker to buster (#4671 ) Upgrade isc-dhcp to 4.4.1-2 (buster version) Update libevent dependency for dhcpmon to 2.1-6 Signed-off-by: Joyas Joseph <joyas_joseph@dell.com>	2020-06-22 15:34:21 -07:00
Guohan Lu	2e42a4ba0f	[docker-dhcp-relay]: use service dependency in supervisord to start services	2020-05-22 11:01:28 -07:00
yozhao101	91e5fb5602	[Service] Enable/disable container auto-restart based on configuration. (#4073 )	2020-02-07 12:34:07 -08:00
Dong Zhang	5057ac3122	[MultiDB] (./dockers dir) : replace redis-cli with sonic-db-cli and use new DBConnector (#3923 ) * [MultiDB] (./dockers dirs): replace redis-cli with sonic-db-cli and use new DBConnector * remove unnecessary quota * update typo	2020-01-22 11:27:21 -08:00
Tamer Ahmed	2658ab8add	[dhcp-relay]: Add DHCP Relay Monitor (#3886 ) DHCP relay MONitor (dhcpmon) keeps track of DORA messages. If DHCP Relay is detected to be not forwarding DORA message, dhcpmon will log such event to syslog. Under the hood dhcpmon keeps counts of clients DR messages, forwarded DR messages, DHCP server OA messages, and forwarded OA messages. dhcpmon will check every 12 sec (configurable) if counts are monotonically increasing and record snapshot of those counters. dhcpmon will report discrepancies when detected between current counters and snapshot counters. pull-request: https://github.com/Azure/sonic-buildimage/pull/3886 signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-01-07 17:48:03 -08:00
Joe LeVeque	d39f10b31f	Revert "[dhcp_relay] Add extra sleep before starting relay agent processes (#3824 )" (#3857 ) This reverts commit `7622a30d98`.	2019-12-07 20:18:49 -08:00
Joe LeVeque	7622a30d98	[dhcp_relay] Add extra sleep before starting relay agent processes (#3824 )	2019-11-26 18:16:57 -08:00
yozhao101	ed79f54569	[Services] Restart DHCP-Relay service upon unexpected critical process exit. (#3667 ) Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2019-11-05 18:32:14 -08:00
Joe LeVeque	0e62280725	[dhcp_relay] Properly wait for routed interfaces to be ready before starting relay agent (#3441 )	2019-09-12 10:57:08 -07:00
wangshengjun	9fdc6bde8c	[dhcp_relay]:filter out the ipv6 address of dhcp server for dhcp rela… (#3397 ) * [dhcp_relay]:filter out the ipv6 address of dhcp server for dhcp relay(v4) config file. Signed-off-by: wangshengjun <wangshengjun@asterfusion.com>	2019-09-06 12:01:08 -07:00
wangshengjun	7b0389d8a3	[dhcp_relay] Only call 'wait_until_iface_ready' once for each interface (#3317 ) Signed-off-by: wangshengjun <wangshengjun@asterfusion.com>	2019-08-09 11:28:15 -07:00
Stepan Blyshchak	81cf33231f	[build]: Improve dockerfile instructions (#3048 ) - create a dockerfile-marcros.j2 file with all common operations written as j2 macro - use single dockerfile instruction for COPY and RUN commands when possible to improve build time - reorganize dockerfile instructions to make more cache friendly (in case someday we will remove --no-cache to build docker images) Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>	2019-06-22 11:26:23 -07:00
Prince Sunny	231d309b69	Generate interface table to have an entry designated to default VRF. (#2848 ) * Generate default VRF table for router interfaces * Updated jinja2 template to have prefix filter	2019-06-10 14:02:55 -07:00
Joe LeVeque	552684fc08	[dhcp_relay] Add support for DHCP client(s) on one VLAN and DHCP server(s) on another (#2946 )	2019-06-03 14:26:45 -07:00
Joe LeVeque	6eca27e564	[services] Restart SwSS service upon unexpected critical process exit (#2845 ) * [service] Restart SwSS Docker container if orchagent exits unexpectedly * Configure systemd to stop restarting swss if it attempts to restart more than 3 times in 20 minutes * Move supervisor-proc-exit-listener script * [docker-dhcp-relay] Enhance wait_for_intf.sh.j2 to utilize STATEDB * Ensure dependent services stop/start/restart with SwSS * Change 'StartLimitInterval' to 'StartLimitIntervalSec', as Stretch installs systemd 232 (>= v230) * Also update journald.conf options * Remove 'PartOf' option from unit files * Add '$(SUPERVISOR_PROC_EXIT_LISTENER_SCRIPT)' to new shared docker-orchagent makefile * Make supervisor-proc-exit-listener script read from 'critical_processes' file inside container * Update critical_processes file for swss container	2019-05-01 08:02:38 -07:00
Joe LeVeque	b186bb2c4c	[dhcp_relay] Base DHCP Relay Docker container on Debian Stretch (#2832 ) * Base DHCP relay Docker image on Stretch base Docker * Change URL for isc-dhcp source repository * Upgrade isc-dhcp source branch to 4.3.5-3.1 * Update patch #0001 to apply to isc-dhcp 4.3.5-3.1 * Update patch #0002 to apply to isc-dhcp 4.3.5-3.1 * Update patch #0003 to apply to isc-dhcp 4.3.5-3.1 * Update patch #0004 to apply to isc-dhcp 4.3.5-3.1 * Remove security patches, as they are now applied as part of 4.3.5-3.1 source * Reorder patches to apply bug fix first, then features * Extend makefile to build debug Docker image * Update commit that series file applies against	2019-04-28 22:51:46 -07:00
lguohan	f3ca7c422f	[rsyslog]: use # to separate container name and program name in syslog message (#1918 ) Previously we used / to separate container name and program name. However, in rsyslogd: "Precisely, the programname is terminated by either (whichever occurs first): end of tag, nonprintable character, ':', '[', '/'. The above definition has been taken from the FreeBSD syslogd sources." Signed-off-by: Guohan Lu <gulv@microsoft.com>	2018-08-12 22:23:58 -07:00
Qi Luo	7ba08e5bf6	Prefix docker container name to syslog syslogtag (program name) (#1810 )	2018-06-25 10:48:42 -07:00
Joe LeVeque	6b8e340558	[dhcp_relay] Only attempt to start 'isc-dhcp-relay' group if it is not empty (#1713 )	2018-05-16 14:32:42 -07:00
Joe LeVeque	1f9107d044	[DHCP relay]: Wait for all interfaces to be assigned IPv4 addresses before starting relay agent(s) (#1173 )	2017-11-20 21:07:28 -08:00
Joe LeVeque	1d16a37d48	[DHCP Relay]: Support Multiple VLANs (Separate DHCP Relay Agents, One Per VLAN) (#999 ) * [DHCP Relay]: Support new <DhcpRelays> minigraph tag; support multiple VLANs * Don't start dhcrelay in quiet mode so as to get startup output in syslog * Update sonic-cfggen tests to support new '<DhcpRelays>' tag * <DhcpRelays> tag is only present for VLANs which require a DHCP relay agent -- only parse if present * Don't attempt to configure a DHCP relay agent for VLANs without specified DHCP servers * Modify to work with Taoyu's minigraph/DB changes (#942) * Reduce number of DHCP servers in sonic-cfggen unit tests from 4 to 2 * Remove isc-dhcp-relay sample output file from sonic-cfggen test, as we no longer generate that file * Update Option 82 isc-dhcp-relay patch to load all interface name-alias maps into memory once at start instead of calling sonic-cfggen on each packet we relay * Remove executable permission from Jinja2 template * Set max hop count to 1 so that DHCP relay will only relay packets with a hop count of zero * Replace tabs with spaces * Modify overlooked sonic-cfggen call, use Config DB instead of minigraph * Also ensure > 1 VLAN requires a DHCP relay agent before outputting to template * Generate port name-alias map file using sonic-cfggen and parse that in lieu of parsing port_config.ini directly * No longer drop packets with hop count > 0; Instead, drop packets which already contain agent info	2017-10-04 23:35:43 -07:00
Taoyu Li	c9cc7aea41	[configdb] Migrate minigraph configurations to DB (#942 ) Modify minigraph parser output format so it fit DB schema Modify configuration templates to fit new schema Systemd services dependencies are modified so database starts before any configuration consumer	2017-09-12 14:13:27 -07:00
Joe LeVeque	f49cac086f	Remove extra trailing newlines at EOF (#804 ) Files now end with a single newline	2017-07-12 20:54:37 -07:00
Joe LeVeque	3798262c1a	[DHCP Relay]: Fix Option 82 string - Remove quotes; add MAC address of receiving port as remote_id (#763 )	2017-06-27 17:59:36 -07:00
Joe LeVeque	017eea8a87	[DHCP Relay]: Add support for custom Option 82 circuit_id of the form '<hostname>:<portname>' (#747 ) * Add docker-dhcp-relay/Dockerfile to .gitignore * Add isc-dhcp-relay .deb package to image build process, along with my Option 82 patch * Install custom isc-dhcp-relay in dhcp_relay docker * Install isc-dhcp-relay build dependencies in sonic-slave Docker container * Copy the built .deb package to the destination directory * Add dependencies for isc-dhcp-relay * Change Option 82 string to '<hostname>:<portname>' * Install dependencies of .deb files implicitly in Dockerfile * Remove unused line * Remove unnecessary space	2017-06-24 12:05:04 -07:00
Joe LeVeque	e0d22acc9e	[DHCP Relay]: Wait for all interfaces to come up before starting DHCP relay (#660 )	2017-06-01 18:38:33 -07:00
Joe LeVeque	b8c11bccf2	[DHCP Relay]: Listen on all front panel, VLAN and PortChannel interfaces with IPv4 addresses (#645 ) * DHCP relay now listens on all front panel, VLAN and PortChannel interfaces with IPv4 addresses * Add sample isc-dhcp-relay output file	2017-05-30 18:29:18 -07:00
Joe LeVeque	d5c13c0a83	[dockers]: Disable autorestart on all supervisor processes inside containers (#580 )	2017-05-09 17:37:08 -07:00
Joe LeVeque	8f348399f5	[Dockers]: Manage all Docker containers with Supervisord (#573 ) - Consolidate config.sh and start.sh scripts into one script (start.sh) - Solve issue #435 - All dockers now run supervisord as their ENTRYPOINT - All stdout/stderr output from processes managed by supervisord is now sent to syslog instead of their own files - Supervisord log messages are now also sent to syslog - Removed unused smartmontools package from docker-platform-monitor	2017-05-08 15:43:31 -07:00
Taoyu Li	fed908fc6b	[config-engine] minigraph.py refactoring (#448 ) * Refactor minigraph.py See description in https://github.com/Azure/sonic-buildimage/pull/448 for detail	2017-03-30 15:25:31 -07:00
pavel-shirshov	814fd87e63	Remove /var/run/rsyslogd.pid before starting rsyslog (#453 )	2017-03-29 18:07:25 -07:00
Joe LeVeque	d6bfa505b3	Wait for VLAN interface to come up before starting DHCP relay (#399 )	2017-03-16 10:40:33 -07:00
Taoyu Li	bd6bf1ff9a	[config] [oneimage & dhcp relay docker] Move ntp, rsyslog, and dhcp server information into minigraph (#374 ) Move DHCP, rsyslog, and NTP server information into minigraph * Fix dhcp relay template according to CR	2017-03-06 12:41:26 -08:00
Taoyu Li	073c28bf15	Move template files to /usr/share/sonic/templates (#305 )	2017-02-18 17:50:29 -08:00
lguohan	b6753e7960	[docker-config-engine]: introduce docker sonic config engine (#274 ) * [docker-config-engine]: introduce docker sonic config engine sonic config engine provide the sonic configure engine for all sonic dockers that rely on the engine to generate runtime configuration.	2017-02-07 18:11:19 -08:00
Joe LeVeque	12fa107645	Revert "Revert "Conform with new Docker build method"" (#264 )	2017-02-06 08:40:57 -08:00
lguohan	68270f36df	[build break]: Revert "Conform with new Docker build method" (#257 )	2017-02-03 20:15:55 -08:00
Joe LeVeque	76cfd672d1	Conform with new Docker build method (#250 )	2017-02-03 14:21:57 -08:00
Joe LeVeque	b85c8dc89e	Fix isc-dhcp-relay template (#246 )	2017-02-02 21:10:07 -08:00

55 Commits
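
Note on commit 4fa81b4f8d: the message describes a critical_processes file whose entries are either "program:xxx" (a single critical process) or "group:xxx" (a supervisord group). The following is a minimal sketch of how such a file could be split into the two kinds of entries. It is not the code from supervisor-proc-event-listener; the function name `parse_critical_processes`, the error handling, and the sample entries are assumptions made for illustration.

```python
def parse_critical_processes(text):
    """Split critical_processes entries into program names and group names.

    Hypothetical helper: only the "program:xxx" / "group:xxx" entry syntax
    comes from the commit message; everything else is an assumption.
    """
    programs, groups = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        kind, sep, name = line.partition(":")
        name = name.strip()
        if sep and name and kind == "program":
            programs.append(name)
        elif sep and name and kind == "group":
            groups.append(name)
        else:
            # Reject anything that is not one of the two documented forms.
            raise ValueError(f"unrecognized entry: {line!r}")
    return programs, groups


# Illustrative content only; real files differ per container.
sample = "program:orchagent\ngroup:isc-dhcp-relay\n"
programs, groups = parse_critical_processes(sample)
print("programs:", programs)
print("groups:", groups)
```

Rejecting unprefixed names, rather than guessing whether they are processes or groups, mirrors the ambiguity the commit set out to remove.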