2017-05-08 17:43:31 -05:00
|
|
|
[supervisord]
|
|
|
|
logfile_maxbytes=1MB
|
|
|
|
logfile_backups=2
|
|
|
|
nodaemon=true
|
|
|
|
|
2020-05-15 19:09:16 -05:00
|
|
|
[eventlistener:dependent-startup]
|
2020-11-20 01:41:32 -06:00
|
|
|
command=python3 -m supervisord_dependent_startup
|
2020-05-15 19:09:16 -05:00
|
|
|
autostart=true
|
|
|
|
autorestart=unexpected
|
|
|
|
startretries=0
|
|
|
|
exitcodes=0,3
|
|
|
|
events=PROCESS_STATE
|
[dockers][supervisor] Increase event buffer size for dependent-startup (#5247)
When stopping the swss, pmon or bgp containers, log messages like the following can be seen:
```
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,061 ERRO pool dependent-startup event buffer overflowed, discarding event 34
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,063 ERRO pool dependent-startup event buffer overflowed, discarding event 35
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,064 ERRO pool dependent-startup event buffer overflowed, discarding event 36
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,066 ERRO pool dependent-startup event buffer overflowed, discarding event 37
```
This is due to the number of programs in the container managed by supervisor, all generating events at the same time. The default event queue buffer size in supervisor is 10. This patch increases that value in all containers in order to eliminate these errors. As more programs are added to the containers, we may need to further adjust these values. I increased all buffer sizes to 25 except for containers with more programs or templated supervisor.conf files which allow for a variable number of programs. In these cases I increased the buffer size to 50. One final exception is the swss container, where the buffer fills up to ~50, so I increased this buffer to 100.
Resolves https://github.com/Azure/sonic-buildimage/issues/5241
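The overflow described above can be illustrated with a toy model. This is a minimal sketch, not supervisor's own code: the function name `simulate_burst` is hypothetical, and it assumes that a full buffer drops its oldest queued event to make room for a new one, and that the listener drains nothing during the burst.

```python
from collections import deque


def simulate_burst(buffer_size, burst_events):
    """Queue `burst_events` events into a bounded buffer with no draining.

    Returns (kept, discarded) as lists of event serial numbers.
    Assumption: when the buffer is full, the oldest event is discarded.
    """
    buffer = deque()
    discarded = []
    for serial in range(burst_events):
        if len(buffer) >= buffer_size:
            # Buffer is full: drop the oldest event to make room.
            discarded.append(buffer.popleft())
        buffer.append(serial)
    return list(buffer), discarded


if __name__ == "__main__":
    # Compare the default size (10) with the larger sizes chosen in this patch.
    for size in (10, 50, 100):
        kept, dropped = simulate_burst(size, burst_events=50)
        print(f"buffer_size={size}: kept {len(kept)}, discarded {len(dropped)}")
```

With a burst of 50 events, a buffer of 10 discards 40 of them in this model, while buffers of 50 or 100 discard none, which is the reasoning behind sizing the swss buffer above its observed peak.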
2020-09-09 01:36:38 -05:00
|
|
|
buffer_size=100
|
2020-05-15 19:09:16 -05:00
|
|
|
|
2019-05-01 10:02:38 -05:00
|
|
|
[eventlistener:supervisor-proc-exit-listener]
|
2020-02-07 14:34:07 -06:00
|
|
|
command=/usr/bin/supervisor-proc-exit-listener --container-name swss
|
[supervisord] Monitoring the critical processes with supervisord. (#6242)
- Why I did it
Initially, we used Monit to monitor critical processes in each container. If one of the critical processes was not running
or had crashed for some reason, Monit would periodically write an alerting message into syslog. Whenever a new process was added
to a container, the corresponding Monit configuration file also had to be updated, which made maintenance harder.
We now use the event listener mechanism of Supervisord to do this monitoring. Since processes in each container are already managed by
Supervisord, we only need to focus on the monitoring logic.
- How I did it
We use a Supervisord event listener to monitor critical processes in containers. The event listener takes
the following steps when it is notified that one of the critical processes exited unexpectedly:
The event listener first checks whether the auto-restart mechanism is enabled for this container. If it is, the event listener kills the Supervisord process, which should cause the container to exit and subsequently be restarted.
If the auto-restart mechanism is not enabled for this container, the event listener enters a loop that first sleeps for 1 minute and then checks whether the process is running. If it is, the event listener exits. If it is not, an alerting message is written into syslog.
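The decision steps above can be sketched as a small Supervisord event listener. This is an illustrative sketch under stated assumptions, not the actual supervisor-proc-exit-listener: the function names, the `critical_processes` set, and the action strings are hypothetical. Only the READY / RESULT handshake and the `key:value` header and payload format follow the documented supervisor event listener protocol.

```python
import sys


def parse_tokens(line):
    """Parse a 'key:value key:value' line (event header or payload) into a dict."""
    return dict(token.split(":", 1) for token in line.split())


def decide_action(payload, critical_processes, auto_restart_enabled):
    """Choose what to do for a PROCESS_STATE_EXITED payload.

    The returned strings are hypothetical labels for the steps in the text.
    """
    if payload.get("processname") not in critical_processes:
        return "ignore"
    if payload.get("expected") == "1":
        # The exit code was one of the configured exitcodes: not a failure.
        return "ignore"
    # Unexpected exit of a critical process.
    return "kill_supervisord" if auto_restart_enabled else "alert_loop"


def main(critical_processes, auto_restart_enabled):
    """Listener loop; stdout is reserved for the protocol, so log to stderr."""
    while True:
        sys.stdout.write("READY\n")
        sys.stdout.flush()
        header_line = sys.stdin.readline()
        if not header_line:
            break  # supervisord closed our stdin
        header = parse_tokens(header_line)
        payload = parse_tokens(sys.stdin.read(int(header["len"])))
        if header.get("eventname") == "PROCESS_STATE_EXITED":
            action = decide_action(payload, critical_processes, auto_restart_enabled)
            # A real listener would kill supervisord or start the alert loop here.
            sys.stderr.write(f"action for {payload.get('processname')}: {action}\n")
        sys.stdout.write("RESULT 2\nOK")
        sys.stdout.flush()
```

Keeping the decision in a pure function (`decide_action`) separates it from the stdin/stdout handshake, so the logic can be checked without running it under supervisord.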
- How to verify it
First, check whether the auto-restart mechanism of a container is enabled by running the command show feature status. If it is enabled, select one critical process and kill it manually, then check whether the container is restarted.
Second, if the auto-restart mechanism was enabled at step 1, disable it by running the command sudo config feature autorestart <container_name> disabled. Then select and kill one critical process. After that, an alerting message should appear in the syslog every minute.
- Which release branch to backport (provide reason below if selected)
201811
201911
[x] 202006
2021-01-21 14:57:49 -06:00
|
|
|
events=PROCESS_STATE_EXITED,PROCESS_STATE_RUNNING
|
2019-05-01 10:02:38 -05:00
|
|
|
autostart=true
|
|
|
|
autorestart=unexpected
|
|
|
|
|
2017-05-08 17:43:31 -05:00
|
|
|
[program:rsyslogd]
|
2020-05-15 19:09:16 -05:00
|
|
|
command=/usr/sbin/rsyslogd -n -iNONE
|
|
|
|
priority=1
|
2017-05-08 17:43:31 -05:00
|
|
|
autostart=false
|
2019-05-01 10:02:38 -05:00
|
|
|
autorestart=unexpected
|
2017-05-08 17:43:31 -05:00
|
|
|
stdout_logfile=syslog
|
|
|
|
stderr_logfile=syslog
|
2020-05-15 19:09:16 -05:00
|
|
|
dependent_startup=true
|
2017-05-08 17:43:31 -05:00
|
|
|
|
Add gearbox phy device files and a new physyncd docker to support VS gearbox phy feature (#4851)
* buildimage: Add gearbox phy device files and a new physyncd docker to support VS gearbox phy feature
* scripts and configuration needed to support a second syncd docker (physyncd)
* physyncd supports gearbox device and phy SAI APIs and runs multiple instances of syncd, one per phy in the device
* support for VS target (sonic-sairedis vslib has been extended to support a virtual BCM81724 gearbox PHY).
HLD is located at https://github.com/Azure/SONiC/blob/b817a12fd89520d3fd26bbc5897487928e7f6de7/doc/gearbox/gearbox_mgr_design.md
**- Why I did it**
This work is part of the gearbox phy joint effort between Microsoft and Broadcom, and is based
on multi-switch support in sonic-sairedis.
**- How I did it**
The overall feature was implemented across several projects. The related pull requests (some in late stages of review at this point) are:
https://github.com/Azure/sonic-utilities/pull/931 - CLI (merged)
https://github.com/Azure/sonic-swss-common/pull/347 - Minor changes (merged)
https://github.com/Azure/sonic-swss/pull/1321 - gearsyncd, config parsers, changes to orchagent to create gearbox phy on supported systems
https://github.com/Azure/sonic-sairedis/pull/624 - physyncd, virtual BCM81724 gearbox phy added to vslib
**- How to verify it**
In a vslib build:
root@sonic:/home/admin# show gearbox interfaces status
  PHY Id  Interface    MAC Lanes        MAC Lane Speed    PHY Lanes        PHY Lane Speed    Line Lanes    Line Lane Speed    Oper    Admin
--------  -----------  ---------------  ----------------  ---------------  ----------------  ------------  -----------------  ------  -------
       1  Ethernet48   121,122,123,124  25G               200,201,202,203  25G               204,205       50G                down    down
       1  Ethernet49   125,126,127,128  25G               206,207,208,209  25G               210,211       50G                down    down
       1  Ethernet50   69,70,71,72      25G               212,213,214,215  25G               216           100G               down    down
In addition, docker ps | grep phy should show a physyncd docker running.
Signed-off-by: syd.logan@broadcom.com
2020-09-25 10:32:44 -05:00
|
|
|
[program:gearsyncd]
|
|
|
|
command=/usr/bin/gearsyncd -p /usr/share/sonic/hwsku/gearbox_config.json
|
|
|
|
priority=3
|
|
|
|
autostart=false
|
|
|
|
autorestart=false
|
|
|
|
startsecs=0
|
|
|
|
startretries=0
|
|
|
|
stdout_logfile=syslog
|
|
|
|
stderr_logfile=syslog
|
|
|
|
dependent_startup=true
|
|
|
|
dependent_startup_wait_for=rsyslogd:running
|
|
|
|
|
2020-06-25 00:48:37 -05:00
|
|
|
[program:portsyncd]
|
|
|
|
command=/usr/bin/portsyncd
|
2017-05-08 17:43:31 -05:00
|
|
|
priority=3
|
|
|
|
autostart=false
|
2017-05-09 19:37:08 -05:00
|
|
|
autorestart=false
|
2017-05-08 17:43:31 -05:00
|
|
|
stdout_logfile=syslog
|
|
|
|
stderr_logfile=syslog
|
2020-05-15 19:09:16 -05:00
|
|
|
dependent_startup=true
|
|
|
|
dependent_startup_wait_for=rsyslogd:running
|
2017-05-08 17:43:31 -05:00
|
|
|
|
2020-06-25 00:48:37 -05:00
|
|
|
[program:orchagent]
|
|
|
|
command=/usr/bin/orchagent.sh
|
|
|
|
priority=4
|
|
|
|
autostart=false
|
|
|
|
autorestart=false
|
|
|
|
stdout_logfile=syslog
|
|
|
|
stderr_logfile=syslog
|
|
|
|
dependent_startup=true
|
|
|
|
dependent_startup_wait_for=portsyncd:running
|
|
|
|
|
2020-09-24 16:57:42 -05:00
|
|
|
[program:swssconfig]
|
|
|
|
command=/usr/bin/swssconfig.sh
|
2020-06-25 00:48:37 -05:00
|
|
|
priority=5
|
2017-05-11 13:18:10 -05:00
|
|
|
autostart=false
|
2020-09-24 16:57:42 -05:00
|
|
|
autorestart=unexpected
|
2020-05-15 19:09:16 -05:00
|
|
|
startretries=0
|
2020-09-24 16:57:42 -05:00
|
|
|
startsecs=0
|
2017-05-11 13:18:10 -05:00
|
|
|
stdout_logfile=syslog
|
|
|
|
stderr_logfile=syslog
|
2020-05-15 19:09:16 -05:00
|
|
|
dependent_startup=true
|
|
|
|
dependent_startup_wait_for=orchagent:running
|
2017-05-11 13:18:10 -05:00
|
|
|
|
2020-09-24 16:57:42 -05:00
|
|
|
[program:restore_neighbors]
|
|
|
|
command=/usr/bin/restore_neighbors.py
|
2019-01-13 08:04:39 -06:00
|
|
|
priority=6
|
2017-05-11 13:18:10 -05:00
|
|
|
autostart=false
|
2020-09-24 16:57:42 -05:00
|
|
|
autorestart=false
|
2018-10-23 01:40:24 -05:00
|
|
|
startsecs=0
|
2020-09-24 16:57:42 -05:00
|
|
|
startretries=0
|
2017-05-11 13:18:10 -05:00
|
|
|
stdout_logfile=syslog
|
|
|
|
stderr_logfile=syslog
|
2020-05-15 19:09:16 -05:00
|
|
|
dependent_startup=true
|
2020-09-24 16:57:42 -05:00
|
|
|
dependent_startup_wait_for=swssconfig:exited
|
2017-05-11 13:18:10 -05:00
|
|
|
|
2020-11-23 11:31:42 -06:00
|
|
|
[program:coppmgrd]
|
|
|
|
command=/usr/bin/coppmgrd
|
|
|
|
priority=6
|
|
|
|
autostart=false
|
2021-02-12 12:59:29 -06:00
|
|
|
autorestart=false
|
2020-11-23 11:31:42 -06:00
|
|
|
startretries=0
|
|
|
|
startsecs=0
|
|
|
|
stdout_logfile=syslog
|
|
|
|
stderr_logfile=syslog
|
|
|
|
dependent_startup=true
|
|
|
|
dependent_startup_wait_for=orchagent:running
|
|
|
|
|
2020-05-15 19:09:16 -05:00
|
|
|
[program:neighsyncd]
|
|
|
|
command=/usr/bin/neighsyncd
|
2020-06-25 00:48:37 -05:00
|
|
|
priority=7
|
2020-05-15 19:09:16 -05:00
|
|
|
autostart=false
|
|
|
|
autorestart=false
|
|
|
|
stdout_logfile=syslog
|
|
|
|
stderr_logfile=syslog
|
|
|
|
dependent_startup=true
|
|
|
|
dependent_startup_wait_for=swssconfig:exited
|
2017-11-06 00:37:16 -06:00
|
|
|
|
|
|
|
[program:vlanmgrd]
|
|
|
|
command=/usr/bin/vlanmgrd
|
2019-01-13 08:04:39 -06:00
|
|
|
priority=8
|
2017-11-06 00:37:16 -06:00
|
|
|
autostart=false
|
|
|
|
autorestart=false
|
|
|
|
stdout_logfile=syslog
|
|
|
|
stderr_logfile=syslog
|
2020-05-15 19:09:16 -05:00
|
|
|
dependent_startup=true
|
|
|
|
dependent_startup_wait_for=swssconfig:exited
|
2017-11-06 00:37:16 -06:00
|
|
|
|
|
|
|
[program:intfmgrd]
|
|
|
|
command=/usr/bin/intfmgrd
|
2019-01-13 08:04:39 -06:00
|
|
|
priority=9
|
2017-11-06 00:37:16 -06:00
|
|
|
autostart=false
|
|
|
|
autorestart=false
|
|
|
|
stdout_logfile=syslog
|
|
|
|
stderr_logfile=syslog
|
2020-05-15 19:09:16 -05:00
|
|
|
dependent_startup=true
|
|
|
|
dependent_startup_wait_for=swssconfig:exited
|
2018-01-29 10:11:05 -06:00
|
|
|
|
2018-08-20 13:19:16 -05:00
|
|
|
[program:portmgrd]
|
|
|
|
command=/usr/bin/portmgrd
|
2019-01-13 08:04:39 -06:00
|
|
|
priority=10
|
2018-08-20 13:19:16 -05:00
|
|
|
autostart=false
|
|
|
|
autorestart=false
|
|
|
|
stdout_logfile=syslog
|
|
|
|
stderr_logfile=syslog
|
2020-05-15 19:09:16 -05:00
|
|
|
dependent_startup=true
|
|
|
|
dependent_startup_wait_for=swssconfig:exited
|
2018-08-20 13:19:16 -05:00
|
|
|
|
2018-01-29 10:11:05 -06:00
|
|
|
[program:buffermgrd]
|
2020-12-13 13:35:39 -06:00
|
|
|
command=/usr/bin/buffermgrd.sh
|
2019-01-13 08:04:39 -06:00
|
|
|
priority=11
|
2018-01-29 10:11:05 -06:00
|
|
|
autostart=false
|
|
|
|
autorestart=false
|
|
|
|
stdout_logfile=syslog
|
|
|
|
stderr_logfile=syslog
|
2020-05-15 19:09:16 -05:00
|
|
|
dependent_startup=true
|
|
|
|
dependent_startup_wait_for=swssconfig:exited
|
2018-09-20 00:18:39 -05:00
|
|
|
|
|
|
|
[program:vrfmgrd]
|
|
|
|
command=/usr/bin/vrfmgrd
|
2019-01-13 08:04:39 -06:00
|
|
|
priority=13
|
2018-09-20 00:18:39 -05:00
|
|
|
autostart=false
|
|
|
|
autorestart=false
|
|
|
|
stdout_logfile=syslog
|
|
|
|
stderr_logfile=syslog
|
2020-05-15 19:09:16 -05:00
|
|
|
dependent_startup=true
|
|
|
|
dependent_startup_wait_for=swssconfig:exited
|
2018-11-09 19:06:09 -06:00
|
|
|
|
2018-11-28 23:58:59 -06:00
|
|
|
[program:nbrmgrd]
|
|
|
|
command=/usr/bin/nbrmgrd
|
2019-01-13 08:04:39 -06:00
|
|
|
priority=15
|
2018-11-28 23:58:59 -06:00
|
|
|
autostart=false
|
|
|
|
autorestart=false
|
|
|
|
stdout_logfile=syslog
|
|
|
|
stderr_logfile=syslog
|
2020-05-15 19:09:16 -05:00
|
|
|
dependent_startup=true
|
|
|
|
dependent_startup_wait_for=swssconfig:exited
|
2018-11-28 23:58:59 -06:00
|
|
|
|
2019-04-23 22:38:08 -05:00
|
|
|
[program:vxlanmgrd]
|
|
|
|
command=/usr/bin/vxlanmgrd
|
|
|
|
priority=16
|
|
|
|
autostart=false
|
|
|
|
autorestart=false
|
|
|
|
stdout_logfile=syslog
|
|
|
|
stderr_logfile=syslog
|
2020-05-15 19:09:16 -05:00
|
|
|
dependent_startup=true
|
|
|
|
dependent_startup_wait_for=swssconfig:exited
|
|
|
|
|
2020-12-26 13:17:18 -06:00
|
|
|
[program:tunnelmgrd]
|
|
|
|
command=/usr/bin/tunnelmgrd
|
|
|
|
priority=17
|
|
|
|
autostart=false
|
|
|
|
autorestart=false
|
|
|
|
stdout_logfile=syslog
|
|
|
|
stderr_logfile=syslog
|
|
|
|
dependent_startup=true
|
|
|
|
dependent_startup_wait_for=swssconfig:exited
|
|
|
|
|
2020-05-15 19:09:16 -05:00
|
|
|
[program:enable_counters]
|
|
|
|
command=/usr/bin/enable_counters.py
|
|
|
|
priority=12
|
|
|
|
autostart=false
|
|
|
|
autorestart=false
|
|
|
|
stdout_logfile=syslog
|
|
|
|
stderr_logfile=syslog
|
|
|
|
dependent_startup=true
|
|
|
|
dependent_startup_wait_for=swssconfig:exited
|
2020-12-24 20:36:01 -06:00
|
|
|
|
|
|
|
[program:fdbsyncd]
|
|
|
|
command=/usr/bin/fdbsyncd
|
|
|
|
priority=17
|
|
|
|
autostart=false
|
|
|
|
autorestart=false
|
|
|
|
stdout_logfile=syslog
|
|
|
|
stderr_logfile=syslog
|
|
|
|
dependent_startup=true
|
|
|
|
dependent_startup_wait_for=swssconfig:exited
|
|
|
|
|