Commit Graph

12 Commits

Author SHA1 Message Date
ganglv
733a902a70
Revert "[202305] Share image for gnmi and telemetry (#17137)" (#17261)
This reverts commit f2a495f7e5.
2023-11-22 23:51:34 +08:00
ganglv
f2a495f7e5
[202305] Share image for gnmi and telemetry (#17137)
Why I did it
Share docker image to support gnmi container and telemetry container
backport #16863

Work item tracking
Microsoft ADO 25423918:
How I did it
Create telemetry image from gnmi docker image.
Enable gnmi container and disable telemetry container by default.

How to verify it
Run end to end test.
2023-11-15 11:28:21 +08:00
Zain Budhwani
09fe3f467f
Add Structured Events w/ YANG Models (#12270)
Add events for dhcp-relay, bgp, syncd, & kernel.
2022-10-09 20:23:31 -07:00
Zain Budhwani
fd6a1b0ce2
Add events to host and create rsyslog_plugin deb pkg (#12059)
Why I did it

Create rsyslog plugin deb for other containers/host to install
Add events for bgp and host events
2022-09-21 09:20:53 -07:00
Hua Liu
214e394ac0
Remove swsssdk from rules and image. (#11469)
#### Why I did it
To deprecate swsssdk, remove all dependency to it. 

#### How I did it
Remove swsssdk from rules and build image scripts.

#### How to verify it
Pass all UT and E2E test case

#### Which release branch to backport (provide reason below if selected)

<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [ ] 202205

#### Description for the changelog
Remove swsssdk from rules and build image scripts.

#### Link to config_db schema for YANG module changes
<!--
Provide a link to config_db schema for the table for which YANG model
is defined
Link should point to correct section on https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md
-->

#### A picture of a cute animal (not mandatory but encouraged)
2022-08-25 08:35:51 +08:00
anamehra
f404ce60e0
container_checker on supervisor should check containers based on asic presence (#11442)
Why I did it
On a supervisor card in a chassis, syncd/teamd/swss/lldp etc dockers are created for each Switch Fabric card. However, not all chassis would have all the switch fabric cards present. In this case, only dockers for Switch Fabrics present would be created.

The monit 'container_checker' fails in this scenario as it is expecting dockers for all Switch Fabrics (based on NUM_ASIC defined in asic.conf file).
2022-08-22 10:08:29 -07:00
Stepan Blyshchak
2919b4820f
[hostcfgd] record feature state in STATE DB (#9842)
- Why I did it
To implement blocking feature state change.

- How I did it
Record the actual feature state in STATE DB from hostcfg.

- How to verify it
UT + verification by running on the switch and checking STATE DB.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2022-03-14 13:45:27 +02:00
Marty Y. Lok
c40f04f0e2
[chassis][supervisor]monit container-checker failed due to unexpected "database-chassis" docker running #9042 (#9043)
Why I did it
Fixed the monit container_checker fails due to unexpected "database-chassis" docker running on Supervisor card in the VOQ chassis. fixes #9042

How I did it
Added database-chassis to the always running docker list if platform is supervisor card.

How to verify it
Execute the CLI command "sudo monit status container_checker"


Signed-off-by: mlok <marty.lok@nokia.com>
2022-03-03 17:56:08 -08:00
wenyiz2021
2d0b063191
Update container_checker for multi-asic devices when state is 'always_enabled' (#10067)
* Update container_checker for multi-asic devices 

Update container_checker for multi-asic devices to add database containers in always_running_containers. 
Previous change was made for single-asic, and that database containers were not considered as feature when writing to state_db.

* Update container_checker

Update an indent
2022-02-23 18:06:30 -08:00
Renuka Manavalan
7a575b3d00
[container_checker] Use Feature table to get running containers (#7474)
Why I did it
Finding running containers through "docker ps" breaks when kubernetes deploys container, as the names are mangled.

How I did it
The data is is available from FEATURE table, which takes care of kubernetes deployment too.

How to verify it
Deploy a feature via kubernetes and don't expect error from container_check.
2021-05-07 08:42:15 -07:00
yozhao101
2737c9681f
[container_checker] Exclude the 'always_disabled' container from expected running container list (#7217)
Signed-off-by: Yong Zhao yozhao@microsoft.com

Why I did it
Since we introduced a new value always_disabled for the state field in FEATURE table, the expected running container list
should exclude the always_diabled containers. This bug was found by nightly test and posted at here: issue. This PR fixes #7210.

How I did it
I added a logic condition to decide whether the value of state field of a container was always_disabled or not.

How to verify it
I verified this on the device str-dx010-acs-1.

Which release branch to backport (provide reason below if selected)
 201811
 201911
 202006
[ x] 202012
2021-04-02 08:05:46 -07:00
yozhao101
04cd1d61e8
[Monit] Monitoring the running status of containers. (#6251)
**- Why I did it**
This PR aims to monitor the running status of each container. Currently the auto-restart feature was enabled. If a critical process exited unexpected, the container will be restarted. If the container was restarted 3 times during 20 minutes, then it will not run anymore unless we cleared the flag using the command `sudo systemctl reset-failed <container_name>` manually. 

**- How I did it**
We will employ Monit to monitor a script. This script will generate the expected running container list and compare it with the current running containers. If there are containers which were expected to run but were not running, then an alerting message will be written into syslog.

**- How to verify it**
I tested this feature on a lab device `str-a7050-acs-3` which has single ASIC and `str2-n3164-acs-3` which has a Multi-ASIC. First I manually stopped a container by running the command `sudo systemctl stop <container_name>`, then I checked whether there was an alerting message in the syslog.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2021-01-07 19:52:22 -08:00