Commit Graph

823 Commits

Author SHA1 Message Date
Qi Luo
658ed4fd37
Revert "Remove quagga related code (#7476)" (#7831)
Reverts Azure/sonic-buildimage#7476
It remove bgpd.conf.j2 and zebra.conf.j2, which is still used by sonic-config-engine unit test.
2021-06-09 18:52:45 -07:00
ngoc-do
710563f83d
[fabric] Disable unnecessary processes in swss and the orchagent-portsyncd dependency for fabric asic (#5569)
* Disable unnecessary processes in swss for fabric asic
Signed-off-by: ngocdo <ngocdo@arista.com>
2021-06-09 10:53:47 -07:00
Andriy Yurkiv
0c2521b936
Set default values only on the first start (#7735) 2021-06-09 18:39:22 +08:00
Shi Su
62a4603eef
Remove quagga related code (#7476)
Why I did it
Quagga is no longer being used. Remove quagga-related code (e.g., docker-fpm-quagga, sonic-quagga, etc.).

How I did it
Remove quagga-related code.
2021-06-07 16:44:54 -07:00
yozhao101
1a3cab43ac
[Monit] Deprecate the feature of monitoring the critical processes by Monit (#7676)
Signed-off-by: Yong Zhao yozhao@microsoft.com

Why I did it
Currently we leveraged the Supervisor to monitor the running status of critical processes in each container and it is more reliable and flexible than doing the monitoring by Monit. So we removed the functionality of monitoring the critical processes by Monit.

How I did it
I removed the script process_checker and corresponding Monit configuration entries of critical processes.

How to verify it
I verified this on the device str-7260cx3-acs-1.
2021-06-04 10:16:53 -07:00
Kwan
1347f29178
[docker-mgmt-framework]: update mgmt framework docker to support sonic-cli cmd (#6148)
- Why I did it

migrate to python3 support
add dependent packages for Klish
allow login as non-root user
- How I did it
update sonic-cli script to start Klish with user name, system name and timeout
update the Dockerfile.j2 to resolve dependent packages
add python3-dev for Klish use

- How to verify it
Incremental buster build with Azure/sonic-mgmt-framework#76 and verify the sonic-cli

- Description for the changelog
Migrate to python3.7 support, update sonic-cli script and resolve package dependencies
2021-06-02 19:38:21 -07:00
ppikh
3ad4f79fea
[sonic-mgmt docker]: Added allure-pytest library to sonic-mgmt docker container (#7665)
* Modified Dockerfile.j2 - added allure-pytest library

Signed-off-by: Petro Pikh <petrop@nvidia.com>
2021-06-02 08:42:30 -07:00
Myron Sosyak
3bf60b3db2
[docker-database] Fix Python3 issue (#7700)
#### Why I did it
To avoid the following error
```
Traceback (most recent call last):
  File "/usr/local/bin/flush_unused_database", line 10, in <module>
    if 'PONG' in output:
TypeError: a bytes-like object is required, not 'str'
```
`communicate` method returns the strings if streams were opened in text mode; otherwise, bytes.
In our case text arg  in Popen is not true and that means that `communicate` return the bytes
#### How I did it
Set `text=True` to get strings instead of bytes
#### How to verify it
run `/usr/local/bin/flush_unused_database` inside database container
2021-05-31 05:36:24 -07:00
bingwang-ms
3bb123930b
Fix lldpmgrd syntax issue (#7742)
Signed-off-by: bingwang <bingwang@microsoft.com>
2021-05-31 16:41:28 +08:00
Alexander Allen
21b9fccd75
[dockers][platform-monitor] Add chassis_db_init to platform monitor tasks (#7596)
I added `chassis_db_init` to the startup tasks for the `docker-platform-monitor` docker so that the script is run on startup of the switch and the chassis info is correctly provisioned to STATE_DB.

Depends on https://github.com/Azure/sonic-platform-daemons/pull/183
2021-05-28 12:01:03 -07:00
yozhao101
37863ac854
[Monit] Restart telemetry container if memory usage is beyond the threshold (#7645)
Signed-off-by: Yong Zhao yozhao@microsoft.com

Why I did it
This PR aims to monitor the memory usage of streaming telemetry container and restart streaming telemetry container if memory usage is larger than the pre-defined threshold.

How I did it
I borrowed the system tool Monit to run a script memory_checker which will periodically check the memory usage of streaming telemetry container. If the memory usage of telemetry container is larger than the pre-defined threshold for 10 times during 20 cycles, then an alerting message will be written into syslog and at the same time Monit will run the script restart_service to restart the streaming telemetry container.

How to verify it
I verified this implementation on device str-7260cx3-acs-1.
2021-05-28 11:13:44 -07:00
Stepan Blyshchak
d7b96dfdf1
[sonic-sdk] add sonic sdk and sonic sdk buildenv (#6712)
- Why I did it

To give SONiC Application Extension developers an environment to run and develop their apps.

- How I did it
Created sonic-sdk and sonic-sdk-buildenv dockers and their dbg versions.

- How to verify it
Build:

$ make -f slave target/sonic-sdk.gz target/sonic-sdk-buildenv.gz
2021-05-28 10:16:02 -07:00
bingwang-ms
e304182116
Fix supervisor-proc-exit-listener startup issue in restapi (#7681)
* Fix supervisor-proc-exit-listener startup issue in restapi

Signed-off-by: bingwang <bingwang@microsoft.com>
2021-05-26 18:28:10 +08:00
LuiSzee
cf83a99f45
[radv] fix bug for radv can't startup if DEVICE_METADATA.localhost.type is NULL (#7651)
Co-authored-by: Shi Lei <shil@centecnetworks.com>
2021-05-25 08:17:44 -07:00
Myron Sosyak
5ab300b626
Fix python version (#7658)
#### Why I did it
To avoid the following logs 
```
Mar 15 15:52:04.599302 igk-dut-04 INFO database#/supervisord: flushdb /bin/bash: /usr/local/bin/flush_unused_database: /usr/bin/python: bad interpreter: No such file or directory
Mar 15 15:52:04.599947 igk-dut-04 INFO database#supervisord 2021-03-15 15:52:04,599 INFO exited: flushdb (exit status 126; not expected)
```

#### How I did it
Fix  shebang
#### How to verify it
Check the logs
2021-05-20 15:47:46 -07:00
xumia
9387350e19
Fix the type issue in rvtysh (#7648)
Why I did it
Change the type issue in the command rvtysh
change PARA/para to PARAM/param
2021-05-20 21:35:23 +08:00
sudhanshukumar22
f783aefd6d
docker-lldp:intermittent DB errors will result in Client termination (#6119)
This PR allows listen to hostname changes and mgmt ip changes.
2021-05-18 09:51:02 -07:00
abdosi
f27aa33e69
[muti-asic] Updated BGP community for Internal routes (#7617)
Following changes are done:

Internal routes are tagged with no-export instead of local-AS
Option to add User Define BGP community on top of no-export
2021-05-16 19:44:06 -07:00
VenkatCisco
db3d353e77
[pmon]: add psmisc to bring fuser that dentifies processes that are using files or sockets (#7509)
fuser support is required since new cisco hardware watchdog plugin uses them to check anyone else use's /dev/watchdogX resource. The actual validation happens in the platform code, but the package is required for pmon container. Currently the /dev/watchdogX is being used by cisco platform-monitor service. Cisco chassis level watchdog plugin uses "fuser" to claim the watchdog release from platform-monitor service.
2021-05-06 22:24:07 -07:00
Junchao-Mellanox
a795bc0b8e
[Mellanox] Support new sensor conf file for MSN4700 A1/A0 (#7535)
#### Why I did it

MSN4700 A1/A0 used different sensor chip but keep the existing platform name *x86_64-mlnx_msn4700-r0*, this is a workaround to replace the sensor conf on MSN4700 A1/A0

#### How I did it

Use a shell script to get the sensor conf path and copy that files to /etc/sensors.d/sensors.conf
2021-05-06 10:13:26 -07:00
trzhang-msft
4f2b54e735
dhcpmon: support dual tor in docker template (#7470) 2021-05-03 10:51:34 -07:00
Lawrence Lee
1b39424520
[docker-orchagent]: Increase ndppd kernel poll interval (#7456)
Why I did it
ndppd by default reads /proc/net/ipv6_route ever 30 seconds. Since T1s advertise so many routes to ToRs, this file is extremely large, and reading it causes ndppd's CPU usage to spike every 30 seconds

How I did it
Increase the delay for reading this file to the maximum possible value (max integer value), which will result in CPU spikes every ~24 days instead of every 30 seconds

How to verify it
Start ndppd with the new config file, confirm that no CPU spikes are seen except at startup

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-04-30 16:30:30 -07:00
Wei Bai
3967c28a76
[docker-sonic-mgmt]: Upgrade Tgen version in SONiC mgmt docker (#7472) 2021-04-29 12:31:46 -07:00
Xin Wang
a7e1f7cbad
[docker-sonic-mgmt]: Install aiohttp package to sonic-mgmt docker (#7429)
The aiohttp package is required by azure.kusto.data which is used by  sonic-mgmt/test_reporting.
This change is to ensure that the dependent package is installed in the sonic-mgmt docker.

Signed-off-by: Xin Wang <xiwang5@microsoft.com>
2021-04-26 23:38:16 -07:00
xumia
56bdd750ab
Support readonly vtysh for sudoers (#7383)
Why I did it
Support readonly version of the command vtysh

How I did it
Check if the command starting with "show", and verify only contains single command in script.
2021-04-25 16:32:02 +08:00
ajbalogh
990b1127a7
[docker-sonic-mgmt] update version of ixnetwork client packages (#7242)
* Why I did it
Upgrade to the latest ixnetwork-restpy and ixnetwork-open-traffic-generator pypi packages

* How I did it
Updated the pip install entries for the packages in the Dockerfile.j2

* How to verify it
pip show ixnetwork-restpy
pip show ixnetwork-open-traffic-generator

Co-authored-by: Neetha John <nejo@microsoft.com>
2021-04-23 10:17:19 -07:00
Ze Gan
f77d719f7c
[docker-fpm-frr]: Add split mode to routing config (#7307)
For the split mode, the config files, like bgpd.conf, zebra.conf and so on, were provided by outside. But the docker_init.sh will overwrite the outside config files if restart bgp service.

How I did it
Add a split mode checking in docker_init.sh, if docker_routing_config_mode is split, don't overwrite the existing routing config files.

How to verify it
Set split mode in config db
{
    "DEVICE_METADATA": {
        "localhost": {
            "hwsku": "Force10-S6000",
            "platform": "x86_64-kvm_x86_64-r0",
            "docker_routing_config_mode": "split"
            ...
        }
    }
}
Replace your bgpd.conf to /etc/sonic/frr/bgpd.conf
Restart bgp service by sudo service bgp restart
The /etc/sonic/frr/bgpd.conf your provided shouldn't be overwritten

Signed-off-by: Ze Gan <ganze718@gmail.com>
2021-04-23 10:16:20 -07:00
guxianghong
6fe6d7394d
[arm] support compile sonic arm image on arm server (#7285)
- Support compile sonic arm image on arm server. If arm image compiling is executed on arm server instead of using qemu mode on x86 server, compile time can be saved significantly.
- Add kernel argument systemd.unified_cgroup_hierarchy=0 for upgrade systemd to version 247, according to #7228
- rename multiarch docker to sonic-slave-${distro}-march-${arch}

Co-authored-by: Xianghong Gu <xgu@centecnetworks.com>
Co-authored-by: Shi Lei <shil@centecnetworks.com>
2021-04-18 08:17:57 -07:00
jmmikkel
43342b33b8
[chassis] Add templates and code to support VoQ chassis iBGP peers (#5622)
This commit has following changes:

* Add templates and code to support VoQ chassis iBGP peers

* Add support to convert a new VoQChassisInternal element in the
   BGPSession element of the minigraph to a new BGP_VOQ_CHASSIS_NEIGHBOR 
   table in CONFIG_DB.
* Add a new set of "voq_chassis" templates to docker-fpm-frr
* Add a new BGP peer manager to bgpcfgd to add neighbors from the
  BGP_VOQ_CHASSIS_NEIGHBOR table using the voq_chassis templates.
* Add a test case for minigraph.py, making sure the VoQChassisInternal
  element creates a BGP_VOQ_CHASSIS_NEIGHBOR entry, but not if its
  value is "false".
* Add a set of test cases for the new voq_chassis templates in
  sonic-bgpcfgd tests.

Note that the templates expect the new
"bgp bestpath peer-type multipath-relax" bgpd configuration to be
available.

Signed-off-by: Joanne Mikkelson <jmmikkel@arista.com>
2021-04-16 11:11:32 -07:00
ANISH-GOTTAPU
e858d6e346
adding snappi to docker (#7292)
For the migration of tests that involves tgen from abstract to snappi, snappi library is needed
2021-04-15 08:24:31 -07:00
judyjoseph
1ad5dbeab6
Fixes for errors seen in staging devices (#7171)
With the latest 201911 image, the following error was seen on staging devices with TSB command ( for both single asic, multi asic ). Though this err message doesn't affect the TSB functionality, it is good to fix.

admin@STG01-0101-0102-01T1:~$ TSB
BGP0 : % Could not find route-map entry TO_TIER0_V4 20
line 1: Failure to communicate[13] to zebra, line: no route-map TO_TIER0_V4 permit 20
% Could not find route-map entry TO_TIER0_V4 30
line 2: Failure to communicate[13] to zebra, line: no route-map TO_TIER0_V4 deny 30

In addition, in this PR I am fixing the message displayed to user when there are no BGP neighbors configured on that BGP instance. In multi-asic device there could be case where there are no BGP neighbors configured on a particular ASIC.
2021-04-08 15:16:43 -07:00
Prince Sunny
20c8dd2691
[IPinIP] Add Loopback2 interface, change dscp mode to uniform (#7234)
Co-authored-by: Ubuntu <prsunny>
2021-04-07 09:58:12 -07:00
Stephen Sun
0b16ca4ae9
[monit] Avoid monit error log by removing "-l" from monit_swss|buffermgrd (#7236)
Avoid the following error messages while dynamic buffer calculation is enabled
```
ERR monit[491]: 'swss|buffermgrd' status failed (1) -- '/usr/bin/buffermgrd -l' is not running in host
```

Change /usr/bin/buffermgrd -l to /usr/bin/buffermgrd. The buffermgrd is started by -l for traditional model or -a for dynamic model. So we need to use the common section of both.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2021-04-06 10:12:23 -07:00
vganesan-nokia
973affce39
[voq/inbandif] Support for inband port as regular port (#6477)
Changes in this PR are to make LLDP to consider Inband port and to avoid regular
port handling on Inband port.
2021-04-01 16:24:57 -07:00
kakkotetsu
e11397df1d
[restapi] fix python version during restapi startup (#7056)
changed from python3 to python in supervisord.conf.
2021-03-30 13:54:37 -07:00
Joe LeVeque
c651a9ade4
[dockers][supervisor] Increase event buffer size for process exit listener; Set all event buffer sizes to 1024 (#7083)
To prevent error [messages](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802) like the following from being logged:

```
Mar 17 02:33:48.523153 vlab-01 INFO swss#supervisord 2021-03-17 02:33:48,518 ERRO pool supervisor-proc-exit-listener event buffer overflowed, discarding event 46
```

This is basically an addendum to https://github.com/Azure/sonic-buildimage/pull/5247, which increased the event buffer size for dependent-startup. While supervisor-proc-exit-listener doesn't subscribe to as many events as dependent-startup, there is still a chance some containers (like swss, as in the example above) have enough processes running to cause an overflow of the default buffer size of 10.

This is especially important for preventing erroneous log_analyzer failures in the sonic-mgmt repo regression tests, which have started occasionally causing PR check builds to fail. Example [here](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802).

I set all supervisor-proc-exit-listener event buffer sizes to 1024, and also updated all dependent-startup event buffer sizes to 1024, as well, to keep things simple, unified, and allow headroom so that we will not need to adjust these values frequently, if at all.
2021-03-27 21:14:24 -07:00
Shi Su
de64c4e34c
[bgp]: Reduce bgp connect retry timer to 10 seconds (#7169)
The default bgp connect retry timer is 120 seconds. A reconnection will happen 120 seconds if the initial connection fails. This PR aims to allow a more frequent retry.
2021-03-27 11:36:56 -07:00
judyjoseph
9d9503e1fe
To decrease the Connect Retry Timer from default value which is 120sec to 10 sec. (#7087)
Why I did it
It was observed that on a multi-asic DUT bootup, the BGP internal sessions between ASIC's was taking more time to get ESTABLISHED than external BGP sessions. The internal sessions was coming up almost exactly 120 secs later.

In multi-asic platform the bgp dockers ( which is per ASIC ) on switch start are bring brought up around the same time and they try to make the bgp sessions with neighbors (in peer ASIC's) which may be not be completely up. This results in BGP connect fail and the retry happens after 120sec which is the default Connect Retry Timer

How I did it
Add the command to set the bgp neighboring session retry timer to 10sec for internal bgp neighbors.
2021-03-17 23:14:38 -07:00
shlomibitton
43d4d45645
Backport ethtool to support QSFP-DD (#5725)
Backport ethtool debian package version 5.9 to support QSFP-DD cable parsing.

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
2021-03-16 09:56:53 -07:00
trzhang-msft
97b371ee08
[docker-dhcp-relay]: add -si support in dhcp docker template (#7053) 2021-03-15 09:21:03 -07:00
Ying Xie
070b020bc3
[sonic-mgmt docker] pin cryptography version to 3.3.2 (#7009)
Why I did it
sonic-mgmt-docker build was failing.

How I did it
pin cryptography version to 3.3.2

How to verify it
build sonic-mgmt docker.
2021-03-10 19:15:11 -08:00
Ze Gan
5221e68b99
[docker-ptf]: Add teamd dependency to ptf (#6994)
Signed-off-by: Ze Gan <ganze718@gmail.com>
2021-03-10 09:11:23 -08:00
Qi Luo
38d973b834
[build]: Fix get-pip 2.7 url according to upstream announcement (#6999)
ref: https://bootstrap.pypa.io/2.7/get-pip.py

The URL you are using to fetch this script has changed, and this one will no
longer work. Please use get-pip.py from the following URL instead:

    https://bootstrap.pypa.io/pip/2.7/get-pip.py
2021-03-09 18:15:16 -08:00
Tamer Ahmed
bb03e5bb37
Start DHCP Relay When Helpers IPs Are Available (#6961)
#### Why I did it

It is possible to have DHCP relay configuration with no servers/
helpers which result in DHCP container to crash. This PR fixes this
issue by not starting DHCP relay for vlans with no DHCP helpers.

resolves: #6931 
closes: #6931 
#### How I did it
Do not add program group for dhcp relay with not dhcp helpers

#### How to verify it
Unit test
2021-03-04 20:43:08 -08:00
abdosi
30b6668b7d
Changes in FRR temapltes for multi-asic (#6901)
1. Made the command next-hop-self force only applicable on back-end asic bgp. This is done so that BGPL iBGP session running on backend can send e-BGP learn nexthop. Back end asic FRR is able to recursively resolve the eBGP nexthop in its routing table since it knows about all the connected routes advertise from front end asic.

2. Made all front-end asic bgp use global loopback ip (Loopback0) as router id and back end asic bgp use Loopbacl4096 as ruter-id and originator id for Route-Reflector. This is done so that routes learnt by external peer do not see Loopback4096 as router id in show ip bgp <route-prerfix> output.

3. To handle above change need to pass Loopback4096 from BGP manager for jinja2 template generation. This was missing and this change/fix is needed for this also https://github.com/Azure/sonic-buildimage/blob/master/dockers/docker-fpm-frr/frr/bgpd/templates/dynamic/instance.conf.j2#L27

4. Enhancement to add mult_asic specific bgpd template generation unit test cases.
2021-02-26 17:05:15 -08:00
abdosi
a520cecb44
[multi-asic] BBR support on internal-peers for multi-asic platfroms. (#6848)
Enable BBR config allowas-in 1 for internal peers

Why I did:
To advertise BBR routes learnt via e-BGP peer in one asic/namespace to another iBGP asic/namespace via Route Reflector.
2021-02-25 23:15:02 -08:00
Ze Gan
4068944202
[MACsec]: Set MACsec feature to be auto-start (#6678)
1. Add supervisord as the entrypoint of docker-macsec
2. Add wpa_supplicant conf into docker-macsec
3. Set the macsecmgrd as the critical_process
4. Configure supervisor to monitor macsecmgrd
5. Set macsec in the features list
6. Add config variable `INCLUDE_MACSEC`
7. Add macsec.service

**- How to verify it**

Change the `/etc/sonic/config_db.json` as follow
```
{
    "PORT": {
        "Ethernet0": {
            ...
            "macsec": "test"
         }
    }
    ...
    "MACSEC_PROFILE": {
        "test": {
            "priority": 64,
            "cipher_suite": "GCM-AES-128",
            "primary_cak": "0123456789ABCDEF0123456789ABCDEF",
            "primary_ckn": "6162636465666768696A6B6C6D6E6F707172737475767778797A303132333435",
            "policy": "security"
        }
    }
}
```
To execute `sudo config reload -y`, We should find the following new items were inserted in app_db of redis
```
127.0.0.1:6379> keys *MAC*
1) "MACSEC_EGRESS_SC_TABLE:Ethernet0:72152375678227538"
2) "MACSEC_PORT_TABLE:Ethernet0"
127.0.0.1:6379> hgetall "MACSEC_EGRESS_SC_TABLE:Ethernet0:72152375678227538"
1) "ssci"
2) ""
3) "encoding_an"
4) "0"
127.0.0.1:6379> hgetall "MACSEC_PORT_TABLE:Ethernet0"
 1) "enable"
 2) "false"
 3) "cipher_suite"
 4) "GCM-AES-128"
 5) "enable_protect"
 6) "true"
 7) "enable_encrypt"
 8) "true"
 9) "enable_replay_protect"
10) "false"
11) "replay_window"
12) "0"
```

Signed-off-by: Ze Gan <ganze718@gmail.com>
2021-02-23 13:22:45 -08:00
Qi Luo
ce3b2cbfc5
[radv] Disable radv for specific deployment_id (#6830) 2021-02-20 11:01:12 -08:00
pra-moh
2e42ecb5e7
[StreamingTelemetry] add noTLS support for debug purpose (#6704)
adding noTLS mode for debugging purpose
Removing config-set for port 8080. It fails to start telemetry if docker restarts in case on noTLS mode because it expects log_level config to be present as well.
2021-02-17 17:23:00 -08:00
Andriy Yurkiv
bf83b6ca59
Enable SAI_INGRESS_PRIORITY_GROUP_STAT_DROPPED_PACKETS counter by default (#6444)
Signed-off-by: Andriy Yurkiv <ayurkiv@nvidia.com>
2021-02-17 10:04:48 -08:00