Commit Graph

69 Commits

Author SHA1 Message Date
xumia
ad21d1ca74 Support dpkg cache for marvell-armhf (#9381)
Why I did it
Support marvell-armhf dpkg cache
2022-04-02 00:14:51 +00:00
xumia
93a22d7df5 [Build]: Fix marvell sai package version parsing issue
Fix marvell sai package version parsing issue (#10009)
2022-02-19 04:23:29 +00:00
dflynn-Nokia
33fce6afd1 [Nokia ixs7215] Platform API fixes (#9025)
* [Nokia ixs7215] Platform API fixes

This commit delivers the following fixes
    - Fix bug preventing access to second PSU eeprom
    - Fix bug preventing updates to front panel PSU status led
    - Fix SFP reset test case failure

* Fix LGTM alert
2021-11-14 15:16:14 -08:00
Stepan Blyshchak
234b5b64e4 [dockers] change RPC, DBG dockers version: put RPG, DBG sign in build metadata part of the version (#8920)
- Why I did it
In case an app.ext requires a dependency syncd^1.0.0, the RPC version of syncd will not satisfy this constraint, since 1.0.0-rpc < 1.0.0. This is not correct to put 'rpc' as a prerelease identifier. Instead put 'rpc' as build metadata in the version: 1.0.0+rpc which satisfies the constraint ^1.0.0.

- How I did it
Changed the way how to version in RPC and DBG images are constructed.

- How to verify it
Install app.ext with syncd^1.0.0 dependency on a switch with RPC syncd docker.
Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>
2021-11-02 22:54:30 -07:00
dflynn-Nokia
25b44c0ca6 [Nokia ixs7215] Support show system-health (#8771)
* [Nokia ixs7215] Support show system-health
* [Nokia ixs7215] Fix LGTM alert
2021-09-26 21:36:26 -07:00
dflynn-Nokia
35312edebc [Nokia ixs7215] Add support for SFP eeprom type_abbrv_name attribute (#8772) 2021-09-26 21:36:14 -07:00
dflynn-Nokia
e3cfc44354 [Nokia ixs7215] Miscellaneous platform API fixes (#8707)
* [Nokia ixs7215] Miscellaneous platform API fixes

This commit delivers the following fixes for the Nokia ixs7215 platform

- Fix bug in a fan API error path
- Add support for setting the fan drawer led
- Add support for getting/setting the front panel PSU status led
- Add support for getting the min/max observed temperature value

* [Nokia ixs7215] code review changes: temperature min/max values
2021-09-14 09:58:20 -07:00
carl-nokia
75ef115e8e [Nokia ixs7215] sfp get_name test case fix (#8507)
Account for sfputil_helper indexing being 0 based

Co-authored-by: Carl Keene <keene@nokia.com>
2021-09-02 15:46:12 -07:00
dflynn-Nokia
7a950cf49c [Nokia ixs7215] Add support for changing the console baud rate (#8595)
This commit adds support for changing the default console baud rate configured
within the U-Boot bootloader. That default baud rate is exposed via the value
of the U-Boot 'baudrate' environment variable. This commit removes logic that
hardcoded the console baud rate to 115200 and instead ensures that the U-Boot
'baudrate' variable is always used when constructing the Linux kernel boot
arguments used when booting Sonic.

A change is also made to rc.local to ensure that the specified baud rate is set
correctly in the serial getty service.
2021-09-02 15:37:54 -07:00
carl-nokia
aef7c85695 [Nokia ixs7215] sfputil support + component tests (#8445)
Deliver sfputil support for sfputil show eeprom and sfputil reset along with some component test case fixes

Co-authored-by: Carl Keene <keene@nokia.com>
2021-08-25 12:17:07 -07:00
dflynn-Nokia
fb072d84cb [Nokia ixs7215] Watchdog timer support (#8377) 2021-08-25 12:13:44 -07:00
dflynn-Nokia
5bf083a388
[Nokia ixs7215] Platform API 2.0 improvements (#7931)
#### Why I did it
Failures observed when running the open community platform test suite (sonic-mgmt)

#### How I did it
Call PSUBase class initializer from derived class
2021-06-21 17:58:26 -07:00
Rajkumar-Marvell
4205fabef8
[Marvell] Updated Marvell armhf sai deb version to 1.8.1 (#7892)
Signed-off-by: Rajkumar Pennadam Ramamoorthy <rpennadamram@marvell.com>
2021-06-16 08:53:30 -07:00
yozhao101
1a3cab43ac
[Monit] Deprecate the feature of monitoring the critical processes by Monit (#7676)
Signed-off-by: Yong Zhao yozhao@microsoft.com

Why I did it
Currently we leveraged the Supervisor to monitor the running status of critical processes in each container and it is more reliable and flexible than doing the monitoring by Monit. So we removed the functionality of monitoring the critical processes by Monit.

How I did it
I removed the script process_checker and corresponding Monit configuration entries of critical processes.

How to verify it
I verified this on the device str-7260cx3-acs-1.
2021-06-04 10:16:53 -07:00
Stepan Blyshchak
cd2c86eab6
[dockers] label SONiC Docker with manifest (#5939)
Signed-off-by: Stepan Blyschak stepanb@nvidia.com

This PR is part of SONiC Application Extension

Depends on #5938

- Why I did it
To provide an infrastructure change in order to support SONiC Application Extension feature.

- How I did it
Label every installable SONiC Docker with a minimal required manifest and auto-generate packages.json file based on
installed SONiC images.

- How to verify it
Build an image, execute the following command:

admin@sonic:~$ docker inspect docker-snmp:1.0.0 | jq '.[0].Config.Labels["com.azure.sonic.manifest"]' -r | jq
Cat /var/lib/sonic-package-manager/packages.json file to verify all dockers are listed there.
2021-04-26 13:51:50 -07:00
Rajkumar-Marvell
c8e06dffa6
[marvell] Move armhf syncd build from stretch to buster. (#7366)
Signed-off-by: Rajkumar Pennadam Ramamoorthy <rpennadamram@marvell.com>
2021-04-19 15:10:52 -07:00
guxianghong
6fe6d7394d
[arm] support compile sonic arm image on arm server (#7285)
- Support compile sonic arm image on arm server. If arm image compiling is executed on arm server instead of using qemu mode on x86 server, compile time can be saved significantly.
- Add kernel argument systemd.unified_cgroup_hierarchy=0 for upgrade systemd to version 247, according to #7228
- rename multiarch docker to sonic-slave-${distro}-march-${arch}

Co-authored-by: Xianghong Gu <xgu@centecnetworks.com>
Co-authored-by: Shi Lei <shil@centecnetworks.com>
2021-04-18 08:17:57 -07:00
arheneus@marvell.com
e38e374077
[marvell]: Marvell prestera kernel driver (#7066)
Build Marvell kernel driver for prestera sai sdk
Builds interrupt and dma kernel driver
Removed the older method pre-compiled kernel module debian package and its makefile
2021-03-29 15:27:01 -07:00
Joe LeVeque
c651a9ade4
[dockers][supervisor] Increase event buffer size for process exit listener; Set all event buffer sizes to 1024 (#7083)
To prevent error [messages](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802) like the following from being logged:

```
Mar 17 02:33:48.523153 vlab-01 INFO swss#supervisord 2021-03-17 02:33:48,518 ERRO pool supervisor-proc-exit-listener event buffer overflowed, discarding event 46
```

This is basically an addendum to https://github.com/Azure/sonic-buildimage/pull/5247, which increased the event buffer size for dependent-startup. While supervisor-proc-exit-listener doesn't subscribe to as many events as dependent-startup, there is still a chance some containers (like swss, as in the example above) have enough processes running to cause an overflow of the default buffer size of 10.

This is especially important for preventing erroneous log_analyzer failures in the sonic-mgmt repo regression tests, which have started occasionally causing PR check builds to fail. Example [here](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802).

I set all supervisor-proc-exit-listener event buffer sizes to 1024, and also updated all dependent-startup event buffer sizes to 1024, as well, to keep things simple, unified, and allow headroom so that we will not need to adjust these values frequently, if at all.
2021-03-27 21:14:24 -07:00
Rajkumar-Marvell
965f4901ec
[Marvell] Updated armhf SAI deb version info. (#6863)
Modified the MRVL SAI debian version format to include debian revision number. This helps in identifying the SAI deb causing any build/runtime issue.

Signed-off-by: Rajkumar Pennadam Ramamoorthy <rpennadamram@marvell.com>
2021-02-25 09:17:45 -08:00
dflynn-Nokia
cbe7493b8e
[Nokia ixs7215] Platform API 2.0 improvements (#6787)
- Improve sonic-mgmt platform test suite pass rate
- Improve coverage of platform unit tests
- Provide platform specific reboot logic as per platform porting guide
- Fix bug due to pcie.yaml file being located in the wrong directory
2021-02-23 09:25:14 -08:00
carl-nokia
2bf1806a34
[Platform][ixs7215]: Platform API test required files with Updates and Improvements (#6738)
- Why I did it
Enable platform API tests to run successfully by providing required test infrastructure files along with supporting changes.

- How I did it
Added platform.json along with supporting changes.
- Addition of pcie.yaml supporting pcied
- Addition of Real fan drawer support vs Virtual
- Removal of python2 wheel with support in place for python3
- supporting changes platform api tests
2021-02-11 13:17:00 -08:00
Tamer Ahmed
149a68b956
[syncd-rpc] Install Libboost Atomic 1.71, Libqtcore And Libqtnetwork (#6689)
When Building syncd-rpc, libthrift has dependency on libboost-atomic1.71.0,
however the debian packager install version 1.67 instead. This PR
preinstalls libboost-atomic v 1.71 to avoid falling back to v 1.67.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2021-02-10 02:26:31 -08:00
lguohan
834347b8f7
[sonic-linux-kernel]: security update to kernel 4.19.152 (#6490)
Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-02-06 21:02:06 -08:00
Guohan Lu
ca0e8cbe0e [docker-ptf]: build docker ptf
- combine docker-ptf-saithrift into docker-ptf docker
- build docker-ptf under platform vs
- remove docker-ptf for other platforms

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-27 08:28:21 -08:00
yozhao101
be3c036794
[supervisord] Monitoring the critical processes with supervisord. (#6242)
- Why I did it
Initially, we used Monit to monitor critical processes in each container. If one of critical processes was not running
or crashed due to some reasons, then Monit will write an alerting message into syslog periodically. If we add a new process
in a container, the corresponding Monti configuration file will also need to update. It is a little hard for maintenance.

Currently we employed event listener of Supervisod to do this monitoring. Since processes in each container are managed by
Supervisord, we can only focus on the logic of monitoring.

- How I did it
We borrowed the event listener of Supervisord to monitor critical processes in containers. The event listener will take
following steps if it was notified one of critical processes exited unexpectedly:

The event listener will first check whether the auto-restart mechanism was enabled for this container or not. If auto-restart mechanism was enabled, event listener will kill the Supervisord process, which should cause the container to exit and subsequently get restarted.

If auto-restart mechanism was not enabled for this contianer, the event listener will enter a loop which will first sleep 1 minute and then check whether the process is running. If yes, the event listener exits. If no, an alerting message will be written into syslog.

- How to verify it
First, we need checked whether the auto-restart mechanism of a container was enabled or not by running the command show feature status. If enabled, one critical process should be selected and killed manually, then we need check whether the container will be restarted or not.

Second, we can disable the auto-restart mechanism if it was enabled at step 1 by running the commnad sudo config feature autorestart <container_name> disabled. Then one critical process should be selected and killed. After that, we will see the alerting message which will appear in the syslog every 1 minute.

- Which release branch to backport (provide reason below if selected)

 201811
 201911
[x ] 202006
2021-01-21 12:57:49 -08:00
dflynn-Nokia
674fac21c1
[Nokia ixs7215] Add SW assist for platform entropy & fix inband mgmt support (#6417)
- Improve random number generation during early Sonic initialization by providing SW updates to Linux entropy value.
- Improve handling of platform In-Band management port

This commit provides the following updates to the Nokia ixs7215 platform

1. The Marvell Armada-38x SOC requires SW assistance to improve the system
   entropy value available early on in the Sonic boot sequence.
2. The Nokia ixs7215 platform does not have a dedicated Out-Of-Band (OOB) mgmt
   port and thus requires additional logic to optionally support configuring
   front panel port 48 as an In-Band mgmt port. This commit provides additional
   logic to manage and maintain the operation of this In-Band mgmt port.
2021-01-12 16:59:42 -08:00
carl-nokia
380edf054d
[Platform][nokia]: python3-smbus package add with python3 and jinja fixes (#6416)
fix platform driver breakage due to python3 upgrade and fix load minigraph errors with config load_minigraph -y

**- How I did it**
added python3-smbus to the pmon docker template since the previous was python2 specific 
fixed additional "ord" python2 specific code 
fixed the jinja templates used by qos reload - the template logic required data to be parsed 

**- How to verify it**
run "show platform XXX" commands and verify output
run "sudo config load_minigraph -y" and verify configuration 
run "show interfaces XXX" and verify output 

Co-authored-by: Carl Keene <keene@nokia.com>
2021-01-12 15:05:06 -08:00
carl-nokia
f99dbff722
[Nokia]: Enable Telemetry for armhf and provide required qos files (#6364)
* [platform][Nokia]: Add buffers and qos files for config qos reload

   - providing required files

* [platform][armhf]: remove hardcoded disable for Telemetry on armhf

Co-authored-by: Carl Keene <keene@nokia.com>
2021-01-06 15:09:13 -08:00
carl-nokia
a1fe203788
[Nokia]: EEPROM platform API Python3 compliance changes (#6318)
- Why I did it
Make EEPROM platform APIs Python3 compliant in Nokia platform.

- How I did it
Handle bytearray type returned by read_eeprom and read_eeprom_bytes methods.

- How to verify it
Boot Nokia ixs7215 and verify PMON docker running and show platform syseeprom

Co-authored-by: Carl Keene <keene@nokia.com>
2020-12-30 07:34:57 -08:00
Sabareesh-Kumar-Anandan
1ebbf66db7
[marvell] update sai version to v1.7.1 (#6263)
Updated sai deb version to v1.7.1 for marvell platforms.

Signed-off-by: Sabareesh Kumar Anandan <sanandan@marvell.com>
2020-12-21 15:29:45 -08:00
Sabareesh-Kumar-Anandan
3cd70b88b7
[marvell][platform] Checking file presence before access (#6208)
In marvell_et644m platform scripts, I have added a check to confirm the file availability before accessing it.

Signed-off-by: Sabareesh Kumar Anandan <sanandan@marvell.com>
2020-12-15 08:43:12 -08:00
carl-nokia
2cd236cece
[nokia]: IXS7215:fix for dynamic changes reading DOM and LED (#6136)
clean up corner condition when SDK reset and SFP's move

Co-authored-by: Carl Keene <keene@nokia.com>
2020-12-11 07:35:24 -08:00
Sabareesh-Kumar-Anandan
fe524c37e7
[platform][marvell] Arm 32-bit Arch support changes (#5749)
- Added Arm 32-bit arch build fixes
- Added marvell armhf platform specific changes

Signed-off-by: Sabareesh Kumar Anandan <sanandan@marvell.com>
2020-12-03 12:38:50 -08:00
Joe LeVeque
80bf8691e8
[Syncd] containers still based on Stretch must still use Python 2 (#6010)
Some syncd containers are still based on Debian Stretch, and thus do not have Python 3 available. For these containers, we must still rely on Python 2 to run supervisord_dependent_startup and supervisor-proc-exit-listener.
2020-11-23 22:35:58 -08:00
lguohan
4d3eb18ca7
[supervisord]: use abspath as supervisord entrypoint (#5995)
use abspath makes the entrypoint not affected by PATH env.

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2020-11-22 21:18:44 -08:00
Joe LeVeque
7bf05f7f4f
[supervisor] Install vanilla package once again, install Python 3 version in Buster container (#5546)
**- Why I did it**

We were building a custom version of Supervisor because I had added patches to prevent hangs and crashes if the system clock ever rolled backward. Those changes were merged into the upstream Supervisor repo as of version 3.4.0 (http://supervisord.org/changes.html#id9), therefore, we should be able to simply install the vanilla package via pip. This will also allow us to easily move to Python 3, as Python 3 support was added in version 4.0.0.

**- How I did it**

- Remove Makefiles and patches for building supervisor package from source
- Install Python 3 supervisor package version 4.2.1 in Buster base container
    - Also install Python 3 version of supervisord-dependent-startup in Buster base container
- Debian package installed binary in `/usr/bin/`, but pip package installs in `/usr/local/bin/`, so rather than update all absolute paths, I changed all references to simply call `supervisord` and let the system PATH find the executable to prevent future need for changes just in case we ever need to switch back to build a Debian package, then we won't need to modify these again.
- Install Python 2 supervisor package >= 3.4.0 in Stretch and Jessie base containers
2020-11-19 23:41:32 -08:00
Sabareesh-Kumar-Anandan
91f6d5b29e
[platform][marvell] Disabling Mgmt Framework for marvell-armhf (#5753)
We are facing some compilation issue while compiling mgmt framework for arm.
So, disabling Mgmt framework for marvell-armhf. We will enable it after fixing the compilation issues

Signed-off-by: Sabareesh Kumar Anandan <sanandan@marvell.com>
2020-11-19 12:36:45 -08:00
carl-nokia
0a9d7a2145
[devices]: Add support for the Nokia-7215 platform (#5827)
Platform: armhf-nokia_ixs7215_52x-r0
HwSKU: Nokia-7215
ASIC: marvell
Port Config: 48x1G + 4x10G

Co-authored-by: dflynn <dennis.flynn@nokia.com>
Co-authored-by: Carl Keene <keene@nokia.com>
2020-11-18 17:00:40 -08:00
abdosi
dddf96933c
[monit] Adding patch to enhance syslog error message generation for monit alert action when status is failed. (#5720)
Why/How I did:

Make sure first error syslog is triggered based on FAULT TOLERANCE condition.

Added support of repeat clause with alert action. This is used as trigger
for generation of periodic syslog error messages if error is persistent

Updated the monit conf files with repeat every x cycles for the alert action
2020-10-31 17:29:49 -07:00
yozhao101
13cec4c486
[Monit] Unmonitor the processes in containers which are disabled. (#5153)
We want to let Monit to unmonitor the processes in containers which are disabled in `FEATURE` table such that
Monit will not generate false alerting messages into the syslog.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2020-09-25 00:28:28 -07:00
Joe LeVeque
5b3b4804ad
[dockers][supervisor] Increase event buffer size for dependent-startup (#5247)
When stopping the swss, pmon or bgp containers, log messages like the following can be seen:

```
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,061 ERRO pool dependent-startup event buffer overflowed, discarding event 34
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,063 ERRO pool dependent-startup event buffer overflowed, discarding event 35
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,064 ERRO pool dependent-startup event buffer overflowed, discarding event 36
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,066 ERRO pool dependent-startup event buffer overflowed, discarding event 37
```

This is due to the number of programs in the container managed by supervisor, all generating events at the same time. The default event queue buffer size in supervisor is 10. This patch increases that value in all containers in order to eliminate these errors. As more programs are added to the containers, we may need to further adjust these values. I increased all buffer sizes to 25 except for containers with more programs or templated supervisor.conf files which allow for a variable number of programs. In these cases I increased the buffer size to 50. One final exception is the swss container, where the buffer fills up to ~50, so I increased this buffer to 100.

Resolves https://github.com/Azure/sonic-buildimage/issues/5241
2020-09-08 23:36:38 -07:00
lguohan
082c26a27d
[build]: combine feature and container feature table (#5081)
1. remove container feature table
2. do not generate feature entry if the feature is not included
   in the image
3. rename ENABLE_* to INCLUDE_* for better clarity
4. rename feature status to feature state
5. [submodule]: update sonic-utilities

* 9700e45 2020-08-03 | [show/config]: combine feature and container feature cli (#1015) (HEAD, origin/master, origin/HEAD) [lguohan]
* c9d3550 2020-08-03 | [tests]: fix drops_group_test failure on second run (#1023) [lguohan]
* dfaae69 2020-08-03 | [lldpshow]: Fix input device is not a TTY error (#1016) [Arun Saravanan Balachandran]
* 216688e 2020-08-02 | [tests]: rename sonic-utilitie-tests to tests (#1022) [lguohan]

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2020-08-05 13:23:12 -07:00
lguohan
e1ac3cfc6a
[build]: wait for conflicts package to be uninstalled (#5039)
when parallel build is enabled, both docker-fpm-frr and docker-syncd-brcm
is built at the same time, docker-fpm-frr requires swss which requires to
install libsaivs-dev. docker-syncd-brcm requires syncd package which requires
to install libsaibcm-dev.

since libsaivs-dev and libsaibcm-dev install the sai header in the same
location, these two packages cannot be installed at the same time. Therefore,
we need to serialize the build between these two packages. Simply uninstall
the conflict package is not enough to solve this issue. The correct solution
is to have one package wait for another package to be uninstalled.

For example, if syncd is built first, then it will install libsaibcm-dev.
Meanwhile, if the swss build job starts and tries to install libsaivs-dev,
it will first try to query if libsaibcm-dev is installed or not. if it is
installed, then it will wait until libsaibcm-dev is uninstalled. After syncd
job is finished, it will uninstall libsaibcm-dev and swss build job will be
unblocked.

To solve this issue, _UNINSTALLS is introduced to uninstall a package that
is no longer needed and to allow blocked job to continue.

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2020-07-27 10:46:20 -07:00
lguohan
58632e6e83 [docker-orchagent]: make build depends only on sairedis package (#4880)
make swss build depends only on libsairedis instead of syncd. This allows to build swss without depending
on vendor sai library.

Currently, libsairedis build also buils syncd which requires vendor SAI lib. This makes difficult to build
swss docker in buster while still keeping syncd docker in stretch, as swss requires libsairedis which also
build syncd and requires vendor to provide SAI for buster. As swss docker does not really contain syncd
binary, so it is not necessary to build syncd for swss docker.

* [submodule]: update sonic-sairedis

* ccbb3bc 2020-06-28 | add option to build without syncd (HEAD, origin/master, origin/HEAD) [Guohan Lu]
* 4247481 2020-06-28 | install saidiscovery into syncd package [Guohan Lu]
* 61b8e8e 2020-06-26 | Revert "sonic-sairedis: Add support to sonic-sairedis for gearbox phys (#624)" (#630) [Danny Allen]
* 85e543c 2020-06-26 | add a README to tests directory to describe how to run 'make check' (#629) [Syd Logan]
* 2772f15 2020-06-26 | sonic-sairedis: Add support to sonic-sairedis for gearbox phys (#624) [Syd Logan]

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2020-07-12 18:08:51 +00:00
yozhao101
4fa81b4f8d
[dockers] Update critical_processes file syntax (#4831)
**- Why I did it**
Initially, the critical_processes file contains either the name of critical process or the name of group.
For example, the critical_processes file in the dhcp_relay container contains a single group name
`isc-dhcp-relay`. When testing the autorestart feature of each container, we need get all the critical
processes and test whether a  container can be restarted correctly if one of its critical processes is
killed. However, it will be difficult to differentiate whether the names in the critical_processes file are
the critical processes or group names. At the same time, changing the syntax in this file will separate the individual process from the groups and also makes it clear to the user.

Right now the critical_processes file contains two different kind of entries. One is "program:xxx" which indicates a critical process. Another is "group:xxx" which indicates a group of critical processes
managed by supervisord using the name "xxx". At the same time, I also updated the logic to
parse the file critical_processes in supervisor-proc-event-listener script.

**- How to verify it**
We can first enable the autorestart feature of a specified container for example `dhcp_relay` by running the comman `sudo config container feature autorestart dhcp_relay enabled` on DUT. Then we can select a critical process from the command `docker top dhcp_relay` and use the command `sudo kill -SIGKILL <pid>` to kill that critical process. Final step is to check whether the container is restarted correctly or not.
2020-06-25 21:18:21 -07:00
yozhao101
b8ad0ed4e4
[Monit] Use the string "/usr/bin/syncd\s" to monitor the syncd process (#4706)
**- Why I did it**
After discussed with Joe, we use the string "/usr/bin/syncd\s" in Monit configuration file to monitor 
syncd process on Broadcom and Mellanox. Due to my careless, I did not find this bug during the 
previous testing. If we use the string "/usr/bin/syncd" in Monit configuration file to monitor the 
syncd process, Monit will not detect whether syncd process is running or not. 

If we ran the command  `sudo monit procmactch “/usr/bin/syncd”` on Broadcom, there will be three 
processes in syncd container which matched this "/usr/bin/syncd": `/bin/bash /usr/bin/syncd.sh
wait`, `/usr/bin/dsserve /usr/bin/syncd –diag -u -p /etc/sai.d/sai.profile` and `/usr/bin/syncd –diag -
u -p /etc/sai.d/said.profile`. Monit will select the processes with the highest uptime (at there 
`/bin/bash /usr/bin/syncd.sh wait`) to match and did not select `/usr/bin/syncd –diag -u -p
/etc/sai.d/said.profile` to match. 

Similarly, On Mellanox Monit will also select the process with the highest uptime (at there 
`/bin/bash /usr/bin/syncd.sh wait`) to match and did not select `/usr/bin/syncd –diag -u -p
/etc/sai.d/said.profile` to match.

That is why Monit is unable to detect whether syncd process is running or not if we use the string “/usr/bin/syncd” in Monit configuration file. If we use the string "/usr/bin/syncd\s" in Monit configuration file, Monit can filter out the process `/bin/bash /usr/bin/syncd.sh wait` and thus can correctly monitor the syncd process.

**- How I did it**

**- How to verify it**

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2020-06-25 17:03:14 -07:00
joyas-joseph
9505bdb910
[docker-syncd-vs]: Convert syncd-vs docker to buster (#4726)
Signed-off-by: Joyas Joseph <joyas_joseph@dell.com>
2020-06-09 09:07:25 -07:00
Guohan Lu
918cf632f4 [docker-syncd-mrvl]: use service dependency in supervisord to start services 2020-05-22 11:01:28 -07:00
Guohan Lu
767bc5c8c0 [build]: add docker-saiserver-* as stretch docker targets
Signed-off-by: Guohan Lu <lguohan@gmail.com>
2020-05-06 10:23:38 +00:00