sonic-buildimage/dockers/docker-fpm-frr/frr/supervisord/supervisord.conf.j2

[supervisord]
logfile_maxbytes=1MB
logfile_backups=2
nodaemon=true
[eventlistener:dependent-startup]
command=python3 -m supervisord_dependent_startup
autostart=true
autorestart=unexpected
startretries=0
exitcodes=0,3
events=PROCESS_STATE
; [dockers][supervisor] Increase event buffer size for dependent-startup (#5247)
; 2020-09-09 01:36:38 -05:00
;
; When stopping the swss, pmon or bgp containers, log messages like the
; following can be seen:
;
;   Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,061 ERRO pool dependent-startup event buffer overflowed, discarding event 34
;   Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,063 ERRO pool dependent-startup event buffer overflowed, discarding event 35
;   Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,064 ERRO pool dependent-startup event buffer overflowed, discarding event 36
;   Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,066 ERRO pool dependent-startup event buffer overflowed, discarding event 37
;
; This is due to the number of programs in the container managed by
; supervisor, all generating events at the same time. The default event queue
; buffer size in supervisor is 10. This patch increases that value in all
; containers in order to eliminate these errors. As more programs are added to
; the containers, we may need to further adjust these values.
;
; I increased all buffer sizes to 25 except for containers with more programs
; or templated supervisor.conf files which allow for a variable number of
; programs. In these cases I increased the buffer size to 50. One final
; exception is the swss container, where the buffer fills up to ~50, so I
; increased this buffer to 100.
;
; Resolves https://github.com/Azure/sonic-buildimage/issues/5241
buffer_size=50
[eventlistener:supervisor-proc-exit-listener]
command=/usr/bin/supervisor-proc-exit-listener --container-name bgp
events=PROCESS_STATE_EXITED
autostart=true
autorestart=unexpected
[program:rsyslogd]
command=/usr/sbin/rsyslogd -n -iNONE
priority=1
autostart=false
autorestart=unexpected
startsecs=0
stdout_logfile=syslog
stderr_logfile=syslog
dependent_startup=true
[program:zebra]
command=/usr/lib/frr/zebra -A 127.0.0.1 -s 90000000 -M fpm -M snmp
priority=4
autostart=false
autorestart=false
startsecs=0
stdout_logfile=syslog
stderr_logfile=syslog
dependent_startup=true
dependent_startup_wait_for=rsyslogd:running
[program:staticd]
command=/usr/lib/frr/staticd -A 127.0.0.1
priority=4
autostart=false
autorestart=false
startsecs=0
stdout_logfile=syslog
stderr_logfile=syslog
dependent_startup=true
; [docker-fpm-frr]: Start bgpd after zebra was started (#5038)
; 2020-07-25 05:48:47 -05:00
;
; Fixes https://github.com/Azure/sonic-buildimage/issues/5026
;
; Explanation: in the log from the issue I found:
;
;   Jul 22 21:13:06.574831 vlab-01 WARNING bgp#bgpd[49]: [EC 33554499] sendmsg_nexthop: zclient_send_message() failed
;
; Analyzing the source code, I found that this error message can be issued only
; when zclient_send_rnh() returns less than 0:
;
;   ret = zclient_send_rnh(zclient, command, p, exact_match, bnc->bgp->vrf_id);
;   /* TBD: handle the failure */
;   if (ret < 0)
;       flog_warn(EC_BGP_ZEBRA_SEND, "sendmsg_nexthop: zclient_send_message() failed");
;
; I checked zclient_send_rnh()
; (https://github.com/Azure/sonic-frr/blob/88351c8f6df5450e9098f773813738f62abb2f5e/lib/zclient.c#L654)
; and found that it returns the exit code it gets from zclient_send_message()
; (https://github.com/Azure/sonic-frr/blob/88351c8f6df5450e9098f773813738f62abb2f5e/lib/zclient.c#L266).
; The latter function can return a nonzero value in two cases:
;
;   1. bgpd has not connected to the zclient socket yet
;      (https://github.com/Azure/sonic-frr/blob/88351c8f6df5450e9098f773813738f62abb2f5e/lib/zclient.c#L269)
;   2. The socket was closed. In this case we would also see an error message
;      in the log (and I can find that message in the log when we reboot SONiC)
;      (https://github.com/Azure/sonic-frr/blob/88351c8f6df5450e9098f773813738f62abb2f5e/lib/zclient.c#L277)
;
; The logs also show that the client connection was established later than the
; failure in bgpd.
;
; Bgpd.log:
;
;   Jul 22 21:13:06.574831 vlab-01 WARNING bgp#bgpd[49]: [EC 33554499] sendmsg_nexthop: zclient_send_message() failed
;
; Zebra.log:
;
;   Jul 22 21:13:12.713249 vlab-01 NOTICE bgp#zebra[48]: client 25 says hello and bids fair to announce only static routes vrf=0
;   Jul 22 21:13:12.820352 vlab-01 NOTICE bgp#zebra[48]: client 30 says hello and bids fair to announce only bgp routes vrf=0
;   Jul 22 21:13:12.820352 vlab-01 NOTICE bgp#zebra[48]: client 33 says hello and bids fair to announce only vnc routes vrf=0
;
; So in our case we should start zebra first, wait until it is started, and
; then start bgpd and the other daemons.
;
; How I did it: I changed the dependency graph to start daemons in the
; following order:
;
;   1. First start zebra.
;   2. Then start staticd and bgpd.
;   3. Then start vtysh -b and bgpeoi after bgpd is started.
dependent_startup_wait_for=zebra:running
{% if DEVICE_METADATA.localhost.frr_mgmt_framework_config is defined and DEVICE_METADATA.localhost.frr_mgmt_framework_config == "true" %}
[program:bfdd]
command=/usr/lib/frr/bfdd -A 127.0.0.1
priority=4
stopsignal=KILL
autostart=false
autorestart=false
startsecs=0
stdout_logfile=syslog
stderr_logfile=syslog
dependent_startup=true
dependent_startup_wait_for=zebra:running
{% endif %}
[program:bgpd]
command=/usr/bin/bgpd.sh -A 127.0.0.1 -M snmp
priority=5
stopsignal=KILL
autostart=false
autorestart=false
startsecs=0
stdout_logfile=syslog
stderr_logfile=syslog
dependent_startup=true
dependent_startup_wait_for=zebra:running
{% if DEVICE_METADATA.localhost.frr_mgmt_framework_config is defined and DEVICE_METADATA.localhost.frr_mgmt_framework_config == "true" %}
[program:ospfd]
command=/usr/lib/frr/ospfd -A 127.0.0.1 -M snmp
priority=5
stopsignal=KILL
autostart=false
autorestart=false
startsecs=0
stdout_logfile=syslog
stderr_logfile=syslog
dependent_startup=true
dependent_startup_wait_for=zebra:running
[program:pimd]
command=/usr/lib/frr/pimd -A 127.0.0.1
priority=5
stopsignal=KILL
autostart=false
autorestart=false
startsecs=0
stdout_logfile=syslog
stderr_logfile=syslog
dependent_startup=true
dependent_startup_wait_for=zebra:running
{% endif %}
[program:fpmsyncd]
command=fpmsyncd
priority=6
autostart=false
autorestart=false
startsecs=0
stdout_logfile=syslog
stderr_logfile=syslog
dependent_startup=true
dependent_startup_wait_for=bgpd:running
{% if DEVICE_METADATA.localhost.frr_mgmt_framework_config is defined and DEVICE_METADATA.localhost.frr_mgmt_framework_config == "true" %}
[program:frrcfgd]
command=/usr/local/bin/frrcfgd
{% else %}
[program:bgpcfgd]
command=/usr/local/bin/bgpcfgd
{% endif %}
priority=6
autostart=false
autorestart=false
startsecs=0
stdout_logfile=syslog
stderr_logfile=syslog
dependent_startup=true
dependent_startup_wait_for=bgpd:running
[program:bgpmon]
command=/usr/local/bin/bgpmon
priority=6
autostart=false
autorestart=true
startsecs=0
stdout_logfile=syslog
stderr_logfile=syslog
dependent_startup=true
dependent_startup_wait_for=bgpd:running
{% if DEVICE_METADATA.localhost.docker_routing_config_mode is defined and DEVICE_METADATA.localhost.docker_routing_config_mode == "unified" %}
[program:vtysh_b]
command=/usr/bin/vtysh -b
priority=6
autostart=false
autorestart=false
startsecs=0
stdout_logfile=syslog
stderr_logfile=syslog
dependent_startup=true
dependent_startup_wait_for=bgpd:running
{% endif %}
{% if WARM_RESTART is defined and WARM_RESTART.bgp is defined and WARM_RESTART.bgp.bgp_eoiu is defined and WARM_RESTART.bgp.bgp_eoiu == "true" %}
[program:bgp_eoiu_marker]
command=/usr/bin/bgp_eoiu_marker.py
priority=7
autostart=false
autorestart=false
startsecs=0
startretries=0
stdout_logfile=syslog
stderr_logfile=syslog
dependent_startup=true
dependent_startup_wait_for=bgpd:running
{% endif %}
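; A minimal sketch, not part of the upstream template, of how one more daemon
; could be attached to the startup chain used throughout this file. The
; program name "exampled" and its binary path are hypothetical; the options
; themselves mirror the existing program sections.
;
; [program:exampled]
; command=/usr/lib/frr/exampled -A 127.0.0.1
; priority=5
; autostart=false
; autorestart=false
; startsecs=0
; stdout_logfile=syslog
; stderr_logfile=syslog
; dependent_startup=true
; dependent_startup_wait_for=zebra:running
;
; As with the other sections, autostart=false combined with
; dependent_startup=true is intended to leave the start decision to the
; dependent-startup event listener, which should launch the program only once
; zebra has reached the running state.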