Commit Graph

16 Commits

Author SHA1 Message Date
abdosi
dddf96933c
[monit] Adding patch to enhance syslog error message generation for monit alert action when status is failed. (#5720)
Why/How I did:

Make sure first error syslog is triggered based on FAULT TOLERANCE condition.

Added support of repeat clause with alert action. This is used as trigger
for generation of periodic syslog error messages if error is persistent

Updated the monit conf files with repeat every x cycles for the alert action
2020-10-31 17:29:49 -07:00
Joe LeVeque
6333bb73b0
Explicitly call pip2 rather than pip in locations where both pip2 and pip3 are installed (#5747)
As part of the transition from Python 2 to Python 3, we are installing both pip2 and pip3 in the slave and config-engine containers. This PR replaces calls to `pip` in these containers with an explicit call to `pip2` to ensure the proper version of pip is executed, no matter which version of pip is aliased to `pip`, as we no longer rely on that alias.

Also some other pip-related cleanup
2020-10-30 09:43:14 -07:00
shlomibitton
e66d49a57c
[LLDP] Fix for LLDP advertisements being sent with wrong information. (#5493)
* Fix for LLDP advertisments being sent with wrong information.
Since lldpd is starting before lldpmgr, some advertisment packets might sent with default value, mac address as Port ID.
This fix hold the packets from being sent by the lldpd until all interfaces are well configured by the lldpmgrd.

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>

* Fix comments

* Fix unit-test output caused a failure during build

* Add 'run_cmd' function and use it

* Resume lldpd even if port init timeout reached
2020-10-26 19:38:09 +02:00
Tamer Ahmed
6754635010 [cfggen] Make Jinja2 Template Python 3 Compatible
Jinja2 templates rendered using Python 3 interpreter, are required
to conform with Python 3 new semantics.

singed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2020-09-30 07:07:43 -07:00
yozhao101
13cec4c486
[Monit] Unmonitor the processes in containers which are disabled. (#5153)
We want to let Monit to unmonitor the processes in containers which are disabled in `FEATURE` table such that
Monit will not generate false alerting messages into the syslog.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2020-09-25 00:28:28 -07:00
Joe LeVeque
12c94a7431
[lldpmgrd] Inherit DaemonBase class from sonic-py-common package (#5370)
Eliminate duplicate logging and signal handling code by inheriting from DaemonBase class in sonic-py-common package.
2020-09-15 10:55:55 -07:00
Joe LeVeque
5b3b4804ad
[dockers][supervisor] Increase event buffer size for dependent-startup (#5247)
When stopping the swss, pmon or bgp containers, log messages like the following can be seen:

```
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,061 ERRO pool dependent-startup event buffer overflowed, discarding event 34
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,063 ERRO pool dependent-startup event buffer overflowed, discarding event 35
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,064 ERRO pool dependent-startup event buffer overflowed, discarding event 36
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,066 ERRO pool dependent-startup event buffer overflowed, discarding event 37
```

This is due to the number of programs in the container managed by supervisor, all generating events at the same time. The default event queue buffer size in supervisor is 10. This patch increases that value in all containers in order to eliminate these errors. As more programs are added to the containers, we may need to further adjust these values. I increased all buffer sizes to 25 except for containers with more programs or templated supervisor.conf files which allow for a variable number of programs. In these cases I increased the buffer size to 50. One final exception is the swss container, where the buffer fills up to ~50, so I increased this buffer to 100.

Resolves https://github.com/Azure/sonic-buildimage/issues/5241
2020-09-08 23:36:38 -07:00
yozhao101
1c32933c7d
[docker] Correct the lldp-syncd program name in critical_process file. (#4862)
The program name in critical_processes file must match the program name defined in supervisord.conf file.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2020-06-28 11:08:30 -07:00
yozhao101
4fa81b4f8d
[dockers] Update critical_processes file syntax (#4831)
**- Why I did it**
Initially, the critical_processes file contains either the name of critical process or the name of group.
For example, the critical_processes file in the dhcp_relay container contains a single group name
`isc-dhcp-relay`. When testing the autorestart feature of each container, we need get all the critical
processes and test whether a  container can be restarted correctly if one of its critical processes is
killed. However, it will be difficult to differentiate whether the names in the critical_processes file are
the critical processes or group names. At the same time, changing the syntax in this file will separate the individual process from the groups and also makes it clear to the user.

Right now the critical_processes file contains two different kind of entries. One is "program:xxx" which indicates a critical process. Another is "group:xxx" which indicates a group of critical processes
managed by supervisord using the name "xxx". At the same time, I also updated the logic to
parse the file critical_processes in supervisor-proc-event-listener script.

**- How to verify it**
We can first enable the autorestart feature of a specified container for example `dhcp_relay` by running the comman `sudo config container feature autorestart dhcp_relay enabled` on DUT. Then we can select a critical process from the command `docker top dhcp_relay` and use the command `sudo kill -SIGKILL <pid>` to kill that critical process. Final step is to check whether the container is restarted correctly or not.
2020-06-25 21:18:21 -07:00
Joe LeVeque
9b27efdcc2
[dockers] Rename 'docker-lldp-sv2' to 'docker-lldp' (#4700)
The -sv2 suffix was used to differentiate SNMP Dockers when we transitioned from "SONiCv1" to "SONiCv2", about four years ago. The old Docker materials were removed long ago; there is no need to keep this suffix. Removing it aligns the name with all the other Dockers.
2020-06-09 09:09:56 -07:00
Shuotian Cheng
b6cc73a0ad [dockers]: Remove deprecated docker-lldp and docker-snmp (#1068)
Signed-off-by: Shu0T1an ChenG <shuche@microsoft.com>
2017-10-23 13:20:37 -07:00
Joe LeVeque
71d299bed4 [swsssdk]: Update nomenclature: 'sswsdk' -> 'swsssdk' (#445) 2017-03-30 11:51:05 -07:00
pavel-shirshov
a845740543 [All Dockerfiles]: Prevent apt asking questions on the console (#300)
Add noninteractive setting into every Dockerfile in the repo

Signed-off-by: Pavel Shirshov pavelsh@microsoft.com
2017-02-16 21:48:49 -08:00
thomasbo
135ba232ca SNMP/LLDP Containers: Sonic V2 Support (#41)
* Adding support for V2 in SNMP/LLDP (-sv2 postfix)
* Fixes for V1 containers: logging
* Fixes for V1 LLDP: limit LLDP to Front-panel or MGMT interfaces.
2016-10-28 15:19:29 -07:00
Qi Luo
cc7f15094c Squashed merge master 2016-09-09 17:53:41 -07:00
Qi Luo
e4bd20c18a Squash merge master (11de390) 2016-08-04 10:39:33 -07:00