Commit Graph

832 Commits

Author SHA1 Message Date
Vivek Reddy
e439676455
autorestart inside restapi docker is disabled (#8006)
Fix issue with critical process in the restapi docker restarting immediately after getting killed
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
2021-07-14 11:37:00 -07:00
Guohan Lu
4f2bc1fbed Revert "Add ethtool to docker-platform-monitor (#8017)"
This reverts commit 1618aec370.
2021-07-07 23:36:44 -07:00
Stepan Blyshchak
9dd05bb1f6
[docker-teamd]: Increase teammgrd timeout to allow graceful shutdown. (#7662) (#8045)
NOTE: This is cherry-pick from 1911/2012 to master.

- Why I did it
To fix LAG IP configuration race

- How I did it
Extended timeout for teammgrd

- How to verify it
Add >80 router LAGs. Do config reload

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2021-07-07 14:48:29 +03:00
novikauanton
1da8c145a6
[iccpd][docker] fix initial startup configuration (#7982)
#### Why I did it
The process of config generation (sonic-cfggen) fails, but the services continue to run with invalid config

#### How I did it
* add exit with error on errors in start.sh script (because supervisord relies on start.sh return code).
* fix jinja template. Jinja use common python expressions under the hood and `has_key` method was removed from dict in py3, so use check by `in` operator as it is supported by both py2 and py3.
#### How to verify it
* compile sonic with enabled iccp. 
* add mclag config to CONFIG_DB. 
    ``` 
    'MC_LAG|1' => {
        "local_ip": "10.0.0.2",
        "peer_ip": "10.0.0.3",
        "peer_link": "Ethernet8",
        "mclag_interface": "Ethernet12" 
    }
* unmaks, enable and start swss and iccpd services in sonic.
* log in into the iccpd container and check the config file `/etc/iccpd/iccpd.conf`
* expected config:
    ```
    mclag_id:1
        local_ip:10.0.0.2
        peer_ip:10.0.0.3
        peer_link:Ethernet8
        mclag_interface:Ethernet12
    system_mac:YOUR_SYSTEM_MAC

#### Description for the changelog
Fixed initial iccpd startup configuration.
2021-07-01 00:47:26 -07:00
VenkatCisco
1618aec370
Add ethtool to docker-platform-monitor (#8017)
#### Why I did it
ethtool can be used to query and change settings such as speed, auto- negotiation and checksum offload on many network devices, especially Ethernet devices. 

#### How I did it
add package extension to docker-platform-monitor/Dockerfile.j2
2021-06-30 09:36:47 -07:00
VenkatCisco
c5855eba08
Add libpci3 pkg to docker-platform-monitor (#8016)
#### Why I did it
The libpci library provides portable access to configuration registers of devices connected to the PCI bus.

#### How I did it
update dockers/docker-platform-monitor/Dockerfile.j2
2021-06-30 09:35:16 -07:00
thomas.cappleman@metaswitch.com
101b1fa08b
[build]: Fix sonic-cfggen contextlib err (#7996)
A recent version of contextlib2 (https://pypi.org/project/contextlib2/21.6.0/#history) has broken Python2 compatibility, so the version picked up by netaddr when using Python2 must be specified, or else builds fail

Co-authored-by: Tom Zhu <tom.zhu@metaswitch.com>
2021-06-28 17:15:03 -07:00
arlakshm
ef67ba5f6e
[multi-asic] fix network command for internal loopback (#7878)
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
In the multi asic platforms all the ASIC are advertising the same IPv6 /64 network from Loopback4096.
Therefore, the IPv6 loopback address of backend asic is not learnt on the frontend asic.
Change the bgpd.conf.main.conf.j2 template file to advertise the Loopback4096 ipv6 address as /128
2021-06-24 12:02:01 -07:00
Shi Su
f52ba3b496
Remove quagga-related code (#7898)
Why I did it
Quagga is no longer being used. Remove quagga-related code (e.g., docker-fpm-quagga, sonic-quagga, etc.).

How I did it
Remove quagga-related code.
2021-06-23 09:15:56 -07:00
Qi Luo
658ed4fd37
Revert "Remove quagga related code (#7476)" (#7831)
Reverts Azure/sonic-buildimage#7476
It remove bgpd.conf.j2 and zebra.conf.j2, which is still used by sonic-config-engine unit test.
2021-06-09 18:52:45 -07:00
ngoc-do
710563f83d
[fabric] Disable unnecessary processes in swss and the orchagent-portsyncd dependency for fabric asic (#5569)
* Disable unnecessary processes in swss for fabric asic
Signed-off-by: ngocdo <ngocdo@arista.com>
2021-06-09 10:53:47 -07:00
Andriy Yurkiv
0c2521b936
Set default values only on the first start (#7735) 2021-06-09 18:39:22 +08:00
Shi Su
62a4603eef
Remove quagga related code (#7476)
Why I did it
Quagga is no longer being used. Remove quagga-related code (e.g., docker-fpm-quagga, sonic-quagga, etc.).

How I did it
Remove quagga-related code.
2021-06-07 16:44:54 -07:00
yozhao101
1a3cab43ac
[Monit] Deprecate the feature of monitoring the critical processes by Monit (#7676)
Signed-off-by: Yong Zhao yozhao@microsoft.com

Why I did it
Currently we leveraged the Supervisor to monitor the running status of critical processes in each container and it is more reliable and flexible than doing the monitoring by Monit. So we removed the functionality of monitoring the critical processes by Monit.

How I did it
I removed the script process_checker and corresponding Monit configuration entries of critical processes.

How to verify it
I verified this on the device str-7260cx3-acs-1.
2021-06-04 10:16:53 -07:00
Kwan
1347f29178
[docker-mgmt-framework]: update mgmt framework docker to support sonic-cli cmd (#6148)
- Why I did it

migrate to python3 support
add dependent packages for Klish
allow login as non-root user
- How I did it
update sonic-cli script to start Klish with user name, system name and timeout
update the Dockerfile.j2 to resolve dependent packages
add python3-dev for Klish use

- How to verify it
Incremental buster build with Azure/sonic-mgmt-framework#76 and verify the sonic-cli

- Description for the changelog
Migrate to python3.7 support, update sonic-cli script and resolve package dependencies
2021-06-02 19:38:21 -07:00
ppikh
3ad4f79fea
[sonic-mgmt docker]: Added allure-pytest library to sonic-mgmt docker container (#7665)
* Modified Dockerfile.j2 - added allure-pytest library

Signed-off-by: Petro Pikh <petrop@nvidia.com>
2021-06-02 08:42:30 -07:00
Myron Sosyak
3bf60b3db2
[docker-database] Fix Python3 issue (#7700)
#### Why I did it
To avoid the following error
```
Traceback (most recent call last):
  File "/usr/local/bin/flush_unused_database", line 10, in <module>
    if 'PONG' in output:
TypeError: a bytes-like object is required, not 'str'
```
`communicate` method returns the strings if streams were opened in text mode; otherwise, bytes.
In our case text arg  in Popen is not true and that means that `communicate` return the bytes
#### How I did it
Set `text=True` to get strings instead of bytes
#### How to verify it
run `/usr/local/bin/flush_unused_database` inside database container
2021-05-31 05:36:24 -07:00
bingwang-ms
3bb123930b
Fix lldpmgrd syntax issue (#7742)
Signed-off-by: bingwang <bingwang@microsoft.com>
2021-05-31 16:41:28 +08:00
Alexander Allen
21b9fccd75
[dockers][platform-monitor] Add chassis_db_init to platform monitor tasks (#7596)
I added `chassis_db_init` to the startup tasks for the `docker-platform-monitor` docker so that the script is run on startup of the switch and the chassis info is correctly provisioned to STATE_DB.

Depends on https://github.com/Azure/sonic-platform-daemons/pull/183
2021-05-28 12:01:03 -07:00
yozhao101
37863ac854
[Monit] Restart telemetry container if memory usage is beyond the threshold (#7645)
Signed-off-by: Yong Zhao yozhao@microsoft.com

Why I did it
This PR aims to monitor the memory usage of streaming telemetry container and restart streaming telemetry container if memory usage is larger than the pre-defined threshold.

How I did it
I borrowed the system tool Monit to run a script memory_checker which will periodically check the memory usage of streaming telemetry container. If the memory usage of telemetry container is larger than the pre-defined threshold for 10 times during 20 cycles, then an alerting message will be written into syslog and at the same time Monit will run the script restart_service to restart the streaming telemetry container.

How to verify it
I verified this implementation on device str-7260cx3-acs-1.
2021-05-28 11:13:44 -07:00
Stepan Blyshchak
d7b96dfdf1
[sonic-sdk] add sonic sdk and sonic sdk buildenv (#6712)
- Why I did it

To give SONiC Application Extension developers an environment to run and develop their apps.

- How I did it
Created sonic-sdk and sonic-sdk-buildenv dockers and their dbg versions.

- How to verify it
Build:

$ make -f slave target/sonic-sdk.gz target/sonic-sdk-buildenv.gz
2021-05-28 10:16:02 -07:00
bingwang-ms
e304182116
Fix supervisor-proc-exit-listener startup issue in restapi (#7681)
* Fix supervisor-proc-exit-listener startup issue in restapi

Signed-off-by: bingwang <bingwang@microsoft.com>
2021-05-26 18:28:10 +08:00
LuiSzee
cf83a99f45
[radv] fix bug for radv can't startup if DEVICE_METADATA.localhost.type is NULL (#7651)
Co-authored-by: Shi Lei <shil@centecnetworks.com>
2021-05-25 08:17:44 -07:00
Myron Sosyak
5ab300b626
Fix python version (#7658)
#### Why I did it
To avoid the following logs 
```
Mar 15 15:52:04.599302 igk-dut-04 INFO database#/supervisord: flushdb /bin/bash: /usr/local/bin/flush_unused_database: /usr/bin/python: bad interpreter: No such file or directory
Mar 15 15:52:04.599947 igk-dut-04 INFO database#supervisord 2021-03-15 15:52:04,599 INFO exited: flushdb (exit status 126; not expected)
```

#### How I did it
Fix  shebang
#### How to verify it
Check the logs
2021-05-20 15:47:46 -07:00
xumia
9387350e19
Fix the type issue in rvtysh (#7648)
Why I did it
Change the type issue in the command rvtysh
change PARA/para to PARAM/param
2021-05-20 21:35:23 +08:00
sudhanshukumar22
f783aefd6d
docker-lldp:intermittent DB errors will result in Client termination (#6119)
This PR allows listen to hostname changes and mgmt ip changes.
2021-05-18 09:51:02 -07:00
abdosi
f27aa33e69
[muti-asic] Updated BGP community for Internal routes (#7617)
Following changes are done:

Internal routes are tagged with no-export instead of local-AS
Option to add User Define BGP community on top of no-export
2021-05-16 19:44:06 -07:00
VenkatCisco
db3d353e77
[pmon]: add psmisc to bring fuser that dentifies processes that are using files or sockets (#7509)
fuser support is required since new cisco hardware watchdog plugin uses them to check anyone else use's /dev/watchdogX resource. The actual validation happens in the platform code, but the package is required for pmon container. Currently the /dev/watchdogX is being used by cisco platform-monitor service. Cisco chassis level watchdog plugin uses "fuser" to claim the watchdog release from platform-monitor service.
2021-05-06 22:24:07 -07:00
Junchao-Mellanox
a795bc0b8e
[Mellanox] Support new sensor conf file for MSN4700 A1/A0 (#7535)
#### Why I did it

MSN4700 A1/A0 used different sensor chip but keep the existing platform name *x86_64-mlnx_msn4700-r0*, this is a workaround to replace the sensor conf on MSN4700 A1/A0

#### How I did it

Use a shell script to get the sensor conf path and copy that files to /etc/sensors.d/sensors.conf
2021-05-06 10:13:26 -07:00
trzhang-msft
4f2b54e735
dhcpmon: support dual tor in docker template (#7470) 2021-05-03 10:51:34 -07:00
Lawrence Lee
1b39424520
[docker-orchagent]: Increase ndppd kernel poll interval (#7456)
Why I did it
ndppd by default reads /proc/net/ipv6_route ever 30 seconds. Since T1s advertise so many routes to ToRs, this file is extremely large, and reading it causes ndppd's CPU usage to spike every 30 seconds

How I did it
Increase the delay for reading this file to the maximum possible value (max integer value), which will result in CPU spikes every ~24 days instead of every 30 seconds

How to verify it
Start ndppd with the new config file, confirm that no CPU spikes are seen except at startup

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-04-30 16:30:30 -07:00
Wei Bai
3967c28a76
[docker-sonic-mgmt]: Upgrade Tgen version in SONiC mgmt docker (#7472) 2021-04-29 12:31:46 -07:00
Xin Wang
a7e1f7cbad
[docker-sonic-mgmt]: Install aiohttp package to sonic-mgmt docker (#7429)
The aiohttp package is required by azure.kusto.data which is used by  sonic-mgmt/test_reporting.
This change is to ensure that the dependent package is installed in the sonic-mgmt docker.

Signed-off-by: Xin Wang <xiwang5@microsoft.com>
2021-04-26 23:38:16 -07:00
xumia
56bdd750ab
Support readonly vtysh for sudoers (#7383)
Why I did it
Support readonly version of the command vtysh

How I did it
Check if the command starting with "show", and verify only contains single command in script.
2021-04-25 16:32:02 +08:00
ajbalogh
990b1127a7
[docker-sonic-mgmt] update version of ixnetwork client packages (#7242)
* Why I did it
Upgrade to the latest ixnetwork-restpy and ixnetwork-open-traffic-generator pypi packages

* How I did it
Updated the pip install entries for the packages in the Dockerfile.j2

* How to verify it
pip show ixnetwork-restpy
pip show ixnetwork-open-traffic-generator

Co-authored-by: Neetha John <nejo@microsoft.com>
2021-04-23 10:17:19 -07:00
Ze Gan
f77d719f7c
[docker-fpm-frr]: Add split mode to routing config (#7307)
For the split mode, the config files, like bgpd.conf, zebra.conf and so on, were provided by outside. But the docker_init.sh will overwrite the outside config files if restart bgp service.

How I did it
Add a split mode checking in docker_init.sh, if docker_routing_config_mode is split, don't overwrite the existing routing config files.

How to verify it
Set split mode in config db
{
    "DEVICE_METADATA": {
        "localhost": {
            "hwsku": "Force10-S6000",
            "platform": "x86_64-kvm_x86_64-r0",
            "docker_routing_config_mode": "split"
            ...
        }
    }
}
Replace your bgpd.conf to /etc/sonic/frr/bgpd.conf
Restart bgp service by sudo service bgp restart
The /etc/sonic/frr/bgpd.conf your provided shouldn't be overwritten

Signed-off-by: Ze Gan <ganze718@gmail.com>
2021-04-23 10:16:20 -07:00
guxianghong
6fe6d7394d
[arm] support compile sonic arm image on arm server (#7285)
- Support compile sonic arm image on arm server. If arm image compiling is executed on arm server instead of using qemu mode on x86 server, compile time can be saved significantly.
- Add kernel argument systemd.unified_cgroup_hierarchy=0 for upgrade systemd to version 247, according to #7228
- rename multiarch docker to sonic-slave-${distro}-march-${arch}

Co-authored-by: Xianghong Gu <xgu@centecnetworks.com>
Co-authored-by: Shi Lei <shil@centecnetworks.com>
2021-04-18 08:17:57 -07:00
jmmikkel
43342b33b8
[chassis] Add templates and code to support VoQ chassis iBGP peers (#5622)
This commit has following changes:

* Add templates and code to support VoQ chassis iBGP peers

* Add support to convert a new VoQChassisInternal element in the
   BGPSession element of the minigraph to a new BGP_VOQ_CHASSIS_NEIGHBOR 
   table in CONFIG_DB.
* Add a new set of "voq_chassis" templates to docker-fpm-frr
* Add a new BGP peer manager to bgpcfgd to add neighbors from the
  BGP_VOQ_CHASSIS_NEIGHBOR table using the voq_chassis templates.
* Add a test case for minigraph.py, making sure the VoQChassisInternal
  element creates a BGP_VOQ_CHASSIS_NEIGHBOR entry, but not if its
  value is "false".
* Add a set of test cases for the new voq_chassis templates in
  sonic-bgpcfgd tests.

Note that the templates expect the new
"bgp bestpath peer-type multipath-relax" bgpd configuration to be
available.

Signed-off-by: Joanne Mikkelson <jmmikkel@arista.com>
2021-04-16 11:11:32 -07:00
ANISH-GOTTAPU
e858d6e346
adding snappi to docker (#7292)
For the migration of tests that involves tgen from abstract to snappi, snappi library is needed
2021-04-15 08:24:31 -07:00
judyjoseph
1ad5dbeab6
Fixes for errors seen in staging devices (#7171)
With the latest 201911 image, the following error was seen on staging devices with TSB command ( for both single asic, multi asic ). Though this err message doesn't affect the TSB functionality, it is good to fix.

admin@STG01-0101-0102-01T1:~$ TSB
BGP0 : % Could not find route-map entry TO_TIER0_V4 20
line 1: Failure to communicate[13] to zebra, line: no route-map TO_TIER0_V4 permit 20
% Could not find route-map entry TO_TIER0_V4 30
line 2: Failure to communicate[13] to zebra, line: no route-map TO_TIER0_V4 deny 30

In addition, in this PR I am fixing the message displayed to user when there are no BGP neighbors configured on that BGP instance. In multi-asic device there could be case where there are no BGP neighbors configured on a particular ASIC.
2021-04-08 15:16:43 -07:00
Prince Sunny
20c8dd2691
[IPinIP] Add Loopback2 interface, change dscp mode to uniform (#7234)
Co-authored-by: Ubuntu <prsunny>
2021-04-07 09:58:12 -07:00
Stephen Sun
0b16ca4ae9
[monit] Avoid monit error log by removing "-l" from monit_swss|buffermgrd (#7236)
Avoid the following error messages while dynamic buffer calculation is enabled
```
ERR monit[491]: 'swss|buffermgrd' status failed (1) -- '/usr/bin/buffermgrd -l' is not running in host
```

Change /usr/bin/buffermgrd -l to /usr/bin/buffermgrd. The buffermgrd is started by -l for traditional model or -a for dynamic model. So we need to use the common section of both.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2021-04-06 10:12:23 -07:00
vganesan-nokia
973affce39
[voq/inbandif] Support for inband port as regular port (#6477)
Changes in this PR are to make LLDP to consider Inband port and to avoid regular
port handling on Inband port.
2021-04-01 16:24:57 -07:00
kakkotetsu
e11397df1d
[restapi] fix python version during restapi startup (#7056)
changed from python3 to python in supervisord.conf.
2021-03-30 13:54:37 -07:00
Joe LeVeque
c651a9ade4
[dockers][supervisor] Increase event buffer size for process exit listener; Set all event buffer sizes to 1024 (#7083)
To prevent error [messages](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802) like the following from being logged:

```
Mar 17 02:33:48.523153 vlab-01 INFO swss#supervisord 2021-03-17 02:33:48,518 ERRO pool supervisor-proc-exit-listener event buffer overflowed, discarding event 46
```

This is basically an addendum to https://github.com/Azure/sonic-buildimage/pull/5247, which increased the event buffer size for dependent-startup. While supervisor-proc-exit-listener doesn't subscribe to as many events as dependent-startup, there is still a chance some containers (like swss, as in the example above) have enough processes running to cause an overflow of the default buffer size of 10.

This is especially important for preventing erroneous log_analyzer failures in the sonic-mgmt repo regression tests, which have started occasionally causing PR check builds to fail. Example [here](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802).

I set all supervisor-proc-exit-listener event buffer sizes to 1024, and also updated all dependent-startup event buffer sizes to 1024, as well, to keep things simple, unified, and allow headroom so that we will not need to adjust these values frequently, if at all.
2021-03-27 21:14:24 -07:00
Shi Su
de64c4e34c
[bgp]: Reduce bgp connect retry timer to 10 seconds (#7169)
The default bgp connect retry timer is 120 seconds. A reconnection will happen 120 seconds if the initial connection fails. This PR aims to allow a more frequent retry.
2021-03-27 11:36:56 -07:00
judyjoseph
9d9503e1fe
To decrease the Connect Retry Timer from default value which is 120sec to 10 sec. (#7087)
Why I did it
It was observed that on a multi-asic DUT bootup, the BGP internal sessions between ASIC's was taking more time to get ESTABLISHED than external BGP sessions. The internal sessions was coming up almost exactly 120 secs later.

In multi-asic platform the bgp dockers ( which is per ASIC ) on switch start are bring brought up around the same time and they try to make the bgp sessions with neighbors (in peer ASIC's) which may be not be completely up. This results in BGP connect fail and the retry happens after 120sec which is the default Connect Retry Timer

How I did it
Add the command to set the bgp neighboring session retry timer to 10sec for internal bgp neighbors.
2021-03-17 23:14:38 -07:00
shlomibitton
43d4d45645
Backport ethtool to support QSFP-DD (#5725)
Backport ethtool debian package version 5.9 to support QSFP-DD cable parsing.

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
2021-03-16 09:56:53 -07:00
trzhang-msft
97b371ee08
[docker-dhcp-relay]: add -si support in dhcp docker template (#7053) 2021-03-15 09:21:03 -07:00
Ying Xie
070b020bc3
[sonic-mgmt docker] pin cryptography version to 3.3.2 (#7009)
Why I did it
sonic-mgmt-docker build was failing.

How I did it
pin cryptography version to 3.3.2

How to verify it
build sonic-mgmt docker.
2021-03-10 19:15:11 -08:00