Commit Graph

934 Commits

Author SHA1 Message Date
Lawrence Lee
7bd0a2ad11
[swss]: Listen for undeliverable tunnel packets (#9348)
- Create a script in the orchagent docker container which listens for these encapsulated packets which are trapped to CPU (indicating that they cannot be routed/no neighbor info exists for the inner packet). When such a packet is received, the script will issue a ping command to the packet's inner destination IP to start the neighbor learning process.
- This script is also resilient to portchannel status changes (i.e. interface going up or down). An interface going down does not affect traffic sniffing on interfaces which are still up. When an interface comes back up, we restart the sniffer to start capturing traffic on that interface again.
2021-12-14 14:45:23 -08:00
Shi Su
f2774b635d
Add openbfdd to ptf docker (#9488)
Why I did it
To enable test support for BFD-related features, the PTF docker needs to have the proper support for BFD. This PR aims to add BFD support in ptf docker.

How I did it
Clone and build OpenBFDD for PTF docker.

How to verify it
Build locally and verify BFD is supported.
2021-12-14 11:46:48 -08:00
abdosi
6c0da4bcf0
[bgp] Enable BGP Graceful Restart based on device role (#9486)
What I did:
Updated Jinja Template to enable BGP Graceful Restart based on device role. By default it will be enable only if the device role type is TorRouter.

Why I did:-
By default FRR is configured in Graceful Helper mode. Graceful Restart is needed on T0/TorRouter only since the device can go for warm-reboot. For T1/LeafRouter it need to be in Helper mode only
2021-12-13 10:14:50 -08:00
novikauanton
969cea07aa
add platform to iccpd's env (#8945) 2021-12-08 09:21:44 -08:00
Brian O'Connor
46bcda359c
[PINS] Build P4RT container for PINS (#9083)
- Add INCLUDE_PINS to config to enable/disable container
- Add Docker files and supporting resources
- Add sonic-pins submodule and associated make files

Submission containing materials of a third party:
    Copyright Google LLC; Licensed under Apache 2.0

#### Why I did it

Adds P4RT container to SONiC for PINS

The P4RT app is covered by this HLD:
https://github.com/pins/SONiC/blob/master/doc/pins/p4rt_app_hld.md

#### How I did it

Followed the pattern and templates used for other SONiC applications

#### How to verify it

Build SONiC with INCLUDE_P4RT set to "y".
Verify that the resulting build has a container called "p4rt" running.
You can verify that the service is up by running the following command on the SONiC switch:
```bash
sudo netstat -lpnt | grep p4rt
```
You should see the service listening on TCP port 9559.

#### Which release branch to backport (provide reason below if selected)

None

#### Description for the changelog

Build P4RT container for PINS
2021-12-07 11:11:25 -08:00
abdosi
f501311f11
Updated BGP Template for Chassis/Multi-asic (#9291)
Updated BGP Template for the case:
    
   1. For Packet Chassis do not advertise Loopback4096 address into BGP as there is Static Route for same. 
       Having this route in BGP causes two level of recursion in Zebra and cause assert in Zebra 
       when there are many nexthop involved
 
   2. Advertise only P2P Connected IP's into BGP (External Peers). For Packet chassis we have backend IP Interface subnet and if 
        they get advertised into BGP then it also causes recursion
2021-12-06 09:36:24 -08:00
kellyyeh
d11207d4f4
[radv] Run radv on MgmtToRRouter (#9424)
* Allow radv to run on mgmt tor and EPMS
2021-12-03 09:45:06 -08:00
kellyyeh
f2ee94d201
[dhcp_relay] Update DHCPv6 counter on relayed messages (#9283) 2021-11-30 20:15:30 -08:00
vganesan-nokia
78de10713c
[voq-chassis][bgpcfg] VOQ_BGP_CHASSIS_NEIGHBORS timers default (#8455)
The BGP_VOQ_CHASSIS_NEIGHBOR keepalive and holdtime timers are
configured similar to general neighbors. Changes are done to configure
BGP_VOQ_CHASSIS_NEIGHBOR timers similar to BGP_INTENAL_NEIGBOR since voq
chassis bgp neighbors are similar to bgp internal neighbors in
multi-asic. As it is done for bgp internal neighbors, the keepalive and
holdtime timers are set to 3 and 10 seconds respectively. Also similar
to bgp internal neighbors, connection retry timer is also configured for
voq chassis bgp neighbors.

Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>
2021-11-30 12:10:27 -08:00
Saikrishna Arcot
fdd8236864
[docker-mgmt-framework]: Don't overwrite /etc/passwd and /etc/group with symlinks (#9375)
Fixes #9376

Because /etc/passwd and /etc/group have been overwritten with symlinks
to /host_etc/passwd and /host_etc/group, the debug container build
fails. This is because the debug container is built without /etc being
mounted at /host_etc in the container (which does happen at runtime).
Because of that, /etc/passwd and /etc/group don't exist, which causes
some package installation errors when openssh-client tries to create a
group.

This is a partial revert of 1347f29178.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2021-11-24 23:51:59 -08:00
arlakshm
5830852832
remove staticd.conf.j2 (#9182)
Why I did it
resolves #8979 and #9055

How I did it
Remove the file static.conf.j2,which adds the default route on eth0 from bgp docker

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
2021-11-24 15:32:16 -08:00
Junchao-Mellanox
554b04f312
Add trap flow counter support (#8940)
*Add trap flow counter support
2021-11-24 15:26:52 -08:00
Brian O'Connor
002827f08e
[PINS] Add APPL_STATE_DB and response path log (#9082)
- Add APPL_STATE_DB to database_config.json
- Clear APPL_STATE_DB during SwSS container restarts
- Add response path log file to logrotate config: responsepublisher.rec

Co-authored-by: PINS Working Group <sonic-pins-subgroup@googlegroups.com>
2021-11-24 10:31:06 -08:00
Stephen Sun
b3ccef9c08
[Reclaim buffer] Common infrastructure update for reclaiming buffer (#9133)
- Why I did it
This is to update the common sonic-buildimage infra for reclaiming buffer.

- How I did it
Render zero_profiles.j2 to zero_profiles.json for vendors that support reclaiming buffer
The zero profiles will be referenced in PR [Reclaim buffer] Reclaim unused buffers by applying zero buffer profiles #8768 on Mellanox platforms and there will be test cases to verify the behavior there.
Rendering is done here for passing azure pipeline.
Load zero_profiles.json when the dynamic buffer manager starts
Generate inactive port list to reclaim buffer

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2021-11-24 15:00:23 +02:00
Ze Gan
79b8ff52b0
[sonic-mgmt]: Upgrade scapy (#8554)
* Upgrade scapy

Signed-off-by: Ze Gan <ganze718@gmail.com>

* Add scapy version

Signed-off-by: Ze Gan <ganze718@gmail.com>
2021-11-24 10:07:50 +08:00
Stepan Blyshchak
a2c2d67098
[ACL] enable ACL FC when genereting config from minigraph but disable by default (#8908)
* [ACL] enable ACL FC when genereting config from minigraph but disable by default
Why I did it
To support ACL counters on Flex Counter Infrastructure.

How I did it
Enable ACL FC in init_cfg and minigraph. Disable when genereting configuration from preset.

How to verify it
Together with depends PRs. Run ACL/Everflow test suite.

Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>
2021-11-11 09:07:54 +08:00
tjchadaga
8544147a70
Fix for additional intf flap during fast-reboot (#9166) 2021-11-08 15:21:11 -08:00
Lawrence Lee
7c0507b6db
[swss]: Start ndppd after vlanmgrd (#9155)
Why I did it
During swss container startup, if ndppd starts up before/with vlanmgrd, ndppd will be pinned at nearly 100% CPU usage.

How I did it
Only start ndppd after vlanmgrd is running. Also, call ndppd directly instead of through bash for improved logging and to prevent orphaned processes.

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-11-03 11:03:01 -07:00
Sudharsan Dhamal Gopalarathnam
fcff3f3d09
VxLAN Tunnel Counters and Rates implementation (#8369)
* Enable flex counters for Vxlan tunnel
2021-11-01 10:42:21 -07:00
abdosi
919b3e5cdf
[chassis-packet] Fixed BGP Internal Peer template (#9106)
What I did:

Fix the typo in Internal Peer Group template for Packet-based Chassis.
Address Review comments of PR: [chassis-packet] minigraph parsing and BGP template changes #8966
- Static Route Parsing for Host
- Formatting of chassis port_config.ini
2021-10-29 11:02:38 -07:00
Junhua Zhai
7de673cb5b
[gearbox] Use separator ':' for GB_ASIC_DB, GB_COUNTERS_DB and GB_FLEX_COUNTER_DB (#9100)
Keep GB_ASIC_DB, etc consistent with the ones in sonic-swss-common/common/database_config.json
2021-10-28 10:27:52 -07:00
Marty Y. Lok
b91190d82d
[Nokia] Add protobuf and grpc C++ and python lib to support Nokia IXR7250E platform (#8366)
#### Why I did it
Nokia IXR7250E platform requires grpcio, grpcio-tools python library, and libprotobuf-dev, libgrpc++ library  

#### How I did it
Modified the build_debian.sh install libprotobuf-dev and libgrpc++ to support nokia ndk
Modified the sonic_debian_extension.j2 to install the grpcio and grpcio-tools in the host
Modified the docker-platform-monitor/Dockerfile.js to install grpcio and grpcio-tools for the pmon container.

#### How to verify it
Image running success.
2021-10-26 18:09:32 -07:00
Kebo Liu
9c4a7c2fed
[PMON] Skip chassis_db_init task on Mellanox simx platform (#9017)
Why I did it
"chassis_db_init" task of PMON should be skipped on Mellanox simx platform, since the hardware info which this task is trying to access is not available on simx platforms, It will introduce some error log.

How I did it
Add the capability for "chassis_db_init" in the template for it can be skipped by adding configuration in "pmon_daemon_control.json".
add "skip_chassis_db_init" configuration for simx platforms.
use symbol link for "pmon_daemon_control.json" since all the simx platforms share the same configuration
How to verify it
Build an image and install it on simx platform to check whether "chassis_db_init" task is skipped.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2021-10-24 09:10:41 -07:00
Saikrishna Arcot
c1d5e0682f
docker-dhcp-relay: Fix waiting for interfaces to get set up (#9034)
Fix the check used to wait for interfaces to come up. The group name in
the supervisor config files has changed from isc-dhcp-relay to
dhcp-relay.

Also, in the wait script, wait 10 additional seconds after the vlans,
port channels, and any interfaces are up. This is because dhcrelay
listens on all interfaces (in addition to port channels and vlans), and
to ensure that it stays in a clean state during runtime, wait some extra
time to make sure that those interfaces are created as well.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2021-10-21 18:45:00 -07:00
shlomibitton
546340bf7b
[dhcp_relay] Fix import for dhcp_counters on clear_dhcp6relay_counter.py (#8991)
#### Why I did it

**Import issue will cause:**
root@sonic:/# sudo sonic-clear arp
failed to import plugin clear.plugins.dhcprelay: No module named 'show_dhcp_relay'

#### How I did it

Fix the import.

#### How to verify it

run sudo sonic-clear arp
2021-10-19 03:10:36 -07:00
abdosi
3bb248bd67
[chassis-packet] minigraph parsing and BGP template changes (#8966)
1. Changes for Generation LC-Graph for packet-based chassis.
2. Added Support Ipv6 Peering on Loopback4096 for voq also
3. Updated asic topology yml files to be offset of slot
4. Made slot_num to take string slot<number> instead of number
5. Consolidated template_dpg_voq_asic.j2 into dpg_asic.j2
6. Remove Loopback4096 from asic topology and parse as dut invertory for
   multi-asic
7. Updated topo_facts parsing for asic topology_
8. Internal BGP Session rename from <VoqChassisInternal> to <ChassisInternal> and take switch_type as value.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2021-10-18 18:44:24 -07:00
Lawrence Lee
fad5ec47b4 [mux]: Call write_standby from host only
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-10-15 09:59:59 -07:00
Lawrence Lee
5232647b33 [mux]: Make write_standby available on host
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>

[write_standby]: Cleanup and fix build

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-10-15 09:59:59 -07:00
Lawrence Lee
14403c61d2 [mux]: Initialize all mux ports as standby
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-10-15 09:59:59 -07:00
Tamer Ahmed
c9c2826520 Merged PR 3845699: [linkmgrd]: Introduce MUX cable linkmgrd
Linkmgrd monitors link status, mux status, and link state. Has
the link becomes unhealthy, linkmgrd will trigger mux switchover
on a standby ToR ensuring uninterrupted service to servers/blades.
This PR is initial implementation of linkmgrd.

Also, docker-mux container hold packages related to maintaining and managing
mux cable. It currently runs linkmgrd binary that monitor and switches
the mux if needed.
This PR also introduces mux-container and starts linkmgrd as startup when
build is configured with INCLUDE_MUX=y

Edit: linkmgrd PR will follow.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>

Related work items: #2315, #3146150
2021-10-15 09:59:59 -07:00
kellyyeh
df6361f50c
Change radv interval to 3min (#8882) 2021-10-01 15:00:16 -07:00
Ye Jianquan
38500fa92e
Add gdb and pyrasite to ptf image (#8816) 2021-09-24 17:10:48 +08:00
kellyyeh
62a1f5eb19
Add CLI Support for IPv6 Helpers and DHCPv6 Relay Counters (#8593) 2021-09-23 22:01:26 -07:00
DINESH KUMAR SELLAPPAN
31a647a72d
[docker-sonic-mgmt]: Snappi version to 0.5.11 (#8790) 2021-09-23 02:12:12 -07:00
kellyyeh
bc06c6fcb5
Incorporate DHCPv6 Relay Agent into dhcp-relay docker (#8321) 2021-09-22 16:05:03 -07:00
shlomibitton
112fda7877
[Flex Counters] Reset flex counters delay flag on config DB when enable_counters script is called (#8500)
#### Why I did it
Reset flex counters delay flag on config DB when enable_counters script is called to allow enablement of flex counters in orchagent.

#### How I did it
Push to config DB 'false' value for delay indication when enable_counters script is called before enabling the counters.

#### How to verify it
Observe counters are created when enable_counters script is called.
2021-09-01 21:17:36 -07:00
richardyu
479f61404b
Add thrift in the docker-sonic-mgmt (#8623)
Co-authored-by: richardyu-ms <richard.yu@microsoft.com>
2021-08-31 19:20:06 -07:00
shlomibitton
56533ceb9e
[dhcp_relay] Adapt config/show CLI commands to support DHCPv6 relay (#8211)
#### Why I did it
- Adapt config/show CLI commands to support DHCPv6 relay
- Support multiple dhcp servers assignment in one command
- Fix IP validation
- Adapt UT and add new UT cases

#### How I did it
- Modify config/show dhcp relay files
- Modify config/show UT files

#### How to verify it
This PR has a dependency on PR https://github.com/Azure/sonic-utilities/pull/1717
Build an image with the dependent PR and this PR
Use config/show DHCPv6 relay commands.
2021-08-25 00:48:39 -07:00
Christian Svensson
f7de685be2
[mgmt-framework]: Fix typo in mgmt_vars.j2 (#8475)
Signed-off-by: Christian Svensson <blue@cmd.nu>
2021-08-24 10:54:13 -07:00
Kostiantyn Yarovyi
6530f93881 [Pcied] run by python 3
Why I did it
Pcied running by python 2.

How I did it
dropped python2 support and add python3 support for pcied in file docker-pmon.supervisord.conf.j2

How to verify it
docker exec pmon supervisorctl status
2021-08-23 03:30:12 +00:00
Myron Sosyak
4d03526311
[docker-ptf] Upgrade to buster (#8254)
Co-authored-by: Your Name <you@example.com>
2021-08-18 10:42:03 -07:00
xumia
a4405f09ed
Support to build armhf/arm64 platforms on arm based system (#7731)
Why I did it
Support to build armhf/arm64 platforms on arm based system without qemu simulator.
When building the armhf/arm64 on arm based system, it is not necessary to use qemu simulator.

How I did it
Build armhf on armhf system, or build arm64 on arm64 system, by default, qemu simulator will not be used.
When building armhf on arm64, and you have enabled armhf docker, then it will build images without simulator automatically. It is based how the docker service is run.

Docker base image change:
For amd64, change from debian:to amd64/debian:
For arm64, change from multiarch/debian-debootstrap:arm64- to arm64v8/debian:
For armhf, change from multiarch/debian-debootstrap:armhf- to arm32v7/debian:
See https://github.com/docker-library/official-images#architectures-other-than-amd64
The mapping relations:
arm32v6 --- armel
arm32v7 --- armhf
arm64v8 --- arm64

Docker image armhf deprecated info: https://hub.docker.com/r/armhf/debian, using arm32v7 instead.
2021-08-12 22:24:37 +08:00
richardyu
9417fe9303
PTF adds unittest-xml-reporting (#8417)
Co-authored-by: richardyu-ms <richard.yu@microsoft.com>
2021-08-11 20:55:21 -07:00
Blueve
aa01315f60
[ARM] Fix issue whre the ping6 tool is missing from orchagent docker (#8345)
Signed-off-by: Jing Kan jika@microsoft.com
2021-08-05 22:00:50 +08:00
Sujin Kang
447f0c64da
[pmon]: Enable Autorestart of the daemons in PMON for unexpected exit cases (#8326)
Remove the daemon list from the critical_process which prevent the PMON
from restarting when the individual daemon crashes.
2021-08-04 09:57:54 -07:00
VenkatCisco
0803f7bf34
[pmon]: add python3-jsonschema pmon (#8018)
jsonschema is an implementation of JSON Schema for Python .

Signed-off-by: Venkat Garigipati <venkatg@cisco.com>
2021-08-03 18:08:09 -07:00
vganesan-nokia
f9231723f9
[multiasic][voq][bgpconf] Fix for the issue of same BGP router id in all asics (#8049)
For multiasic, the back end asics use ip addresss of Loopback4096 for BGP router id. In VOQ multi-asic chassis there are no back end asics. All the asics are front end and the iBGP connections are established via Ethernet-IB of asics. Since these asics are not designated as BackEnd, the ip address of interface Loopback0 is used as BGP router id. Since the ip address of Loopback0 is same for all the asics in the line card, same router id is used for voq iBGP configurations and hence the iBGP connections are not established. Changes are done to fix this
2021-07-26 12:54:52 -07:00
Shi Su
8a48be9b74
Reduce route selection deferral timer for bgp graceful restart (#7533)
Why I did it
There are scenarios that End-of-RIB comes from a part of the peers arrives after reconciliation. In such scenarios, if the route selection deferral timer has the default value of 360 seconds, FRR would not set up routes and all routes would be removed after reconciliation. This PR reduces the route selection deferral timer so that at least routes to parts of the peers get restored at the point of reconciliation.

Fix #7488

How I did it
Reduce route selection deferral timer for bgp graceful restart to 15 seconds.
2021-07-26 10:16:19 -07:00
賓少鈺
aa59bfeab7
[PDE]: introduce the SONiC Platform Development Env (#7510)
The PDE silicon test harness and platform test harness can be found in
src/sonic-platform-pdk-pde
2021-07-24 16:24:43 -07:00
slutati1536
de43c6a163
Added retry to sonic-mgmt docker container (#7997)
Why I did it
the motivation for this PR is to add retry_call to several test cases in the community, for example, the following cases:

test_show_platform_fanstatus_mocked
test_show_platform_temperature_mocked
are executing a command once and comparing the output to the expected mock data,
sometimes differences between the mock and the actual are causing the tests to fail.

retry will make these tests more stable.
retry will also be more efficient than sleep which will cause the tests to run longer because sometimes it is not necessary to sleep all that time, retry will only run a function only until it passed.

How I did it
added retry to the docker file

How to verify it
I run the tests with retry on the docker after installing the retry package

Signed-off-by: Sharon Lutati <slutati@nvidia.com>
2021-07-20 09:28:10 -07:00
shlomibitton
604becdd5c
[dhcp_relay] DHCP relay support for IPv6 (#7772)
Why I did it
Currently SONiC use the 'isc-dhcp-relay' package to allow DHCP relay functionality on IPv4 networks only.
This will allow the IPv6 functionality along the IPv4 type.

How I did it
Edit supervisord template to start DHCPv6 instances when configured to do so on Config DB.
Align cfg unit test to the new change.
Add DHCPv6 relay minigraph parsing support and a suitable t0 topology xml file for UT.

How to verify it
Configure DHCPv6 agents as described on the feature HLD: Azure/SONiC#765
Test it with real client/server with IPv6 or use the dedicated automatic test: Azure/sonic-mgmt#3565
Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>

* Split docker-dhcp-relay.supervisord.conf.j2 template into several files for easier code maintenance
2021-07-16 07:31:05 -07:00
Stepan Blyshchak
b3b6938fda
[dhcp-relay] make DHCP relay an extension (#6531)
- Why I did it
Make DHCP relay docker an extension. DHCP relay now carries dhcp relay commands CLI plugin and has a complete manifest.
It is installed as extension if INCLUDE_DHCP_REALY is set to y.

DEPENDS on #5939

- How I did it
Modify DHCP relay docker makefile and dockerfile. Make changes to sonic_debian_extension.j2 to install sonic packages.
I moved DHCP related CLI tests from sonic-utilities to DHCP relay docker.
This PR introduces a way to write a plugin as part of docker image and run the tests from cli-plugin-tests directory under docker directory.
The test result is available in target/docker-dhcp-relay.gz.log:

[ REASON ] :      target/docker-dhcp-relay.gz does not exist   NON-EXISTENT PREREQUISITES: docker-start target/docker-config-engine-buster.gz-load target/python-wheels/sonic_utilities-1.2-py3-none-any.whl-in
stall target/debs/buster/python3-swsscommon_1.0.0_amd64.deb-install
[ FLAGS  FILE    ] : []
[ FLAGS  DEPENDS ] : []
[ FLAGS  DIFF    ] : []
============================= test session starts ==============================
platform linux -- Python 3.7.3, pytest-3.10.1, py-1.7.0, pluggy-0.8.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /sonic/dockers/docker-dhcp-relay/cli-plugin-tests, inifile:
plugins: cov-2.6.0
collecting ... collected 10 items

test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_plugin_registration PASSED [ 10%]
test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_config_vlan_add_dhcp_relay_with_nonexist_vlanid PASSED [ 20%]
test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_config_vlan_add_dhcp_relay_with_invalid_vlanid PASSED [ 30%]
test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_config_vlan_add_dhcp_relay_with_invalid_ip PASSED [ 40%]
test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_config_vlan_add_dhcp_relay_with_exist_ip PASSED [ 50%]
test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_config_vlan_add_del_dhcp_relay_dest PASSED [ 60%]
test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_config_vlan_remove_nonexist_dhcp_relay_dest PASSED [ 70%]
test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_config_vlan_remove_dhcp_relay_dest_with_nonexist_vlanid PASSED [ 80%]
test_show_dhcp_relay.py::TestVlanDhcpRelay::test_plugin_registration PASSED [ 90%]
test_show_dhcp_relay.py::TestVlanDhcpRelay::test_dhcp_relay_column_output PASSED [100%]

=============================== warnings summary ===============================
/usr/local/lib/python3.7/dist-packages/tabulate.py:7
  /usr/local/lib/python3.7/dist-packages/tabulate.py:7: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
    from collections import namedtuple, Iterable

-- Docs: https://docs.pytest.org/en/latest/warnings.html
==================== 10 passed, 1 warnings in 0.35 seconds =====================
2021-07-15 10:35:56 -07:00
Vivek Reddy
e439676455
autorestart inside restapi docker is disabled (#8006)
Fix issue with critical process in the restapi docker restarting immediately after getting killed
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
2021-07-14 11:37:00 -07:00
Guohan Lu
4f2bc1fbed Revert "Add ethtool to docker-platform-monitor (#8017)"
This reverts commit 1618aec370.
2021-07-07 23:36:44 -07:00
Stepan Blyshchak
9dd05bb1f6
[docker-teamd]: Increase teammgrd timeout to allow graceful shutdown. (#7662) (#8045)
NOTE: This is cherry-pick from 1911/2012 to master.

- Why I did it
To fix LAG IP configuration race

- How I did it
Extended timeout for teammgrd

- How to verify it
Add >80 router LAGs. Do config reload

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2021-07-07 14:48:29 +03:00
novikauanton
1da8c145a6
[iccpd][docker] fix initial startup configuration (#7982)
#### Why I did it
The process of config generation (sonic-cfggen) fails, but the services continue to run with invalid config

#### How I did it
* add exit with error on errors in start.sh script (because supervisord relies on start.sh return code).
* fix jinja template. Jinja use common python expressions under the hood and `has_key` method was removed from dict in py3, so use check by `in` operator as it is supported by both py2 and py3.
#### How to verify it
* compile sonic with enabled iccp. 
* add mclag config to CONFIG_DB. 
    ``` 
    'MC_LAG|1' => {
        "local_ip": "10.0.0.2",
        "peer_ip": "10.0.0.3",
        "peer_link": "Ethernet8",
        "mclag_interface": "Ethernet12" 
    }
* unmaks, enable and start swss and iccpd services in sonic.
* log in into the iccpd container and check the config file `/etc/iccpd/iccpd.conf`
* expected config:
    ```
    mclag_id:1
        local_ip:10.0.0.2
        peer_ip:10.0.0.3
        peer_link:Ethernet8
        mclag_interface:Ethernet12
    system_mac:YOUR_SYSTEM_MAC

#### Description for the changelog
Fixed initial iccpd startup configuration.
2021-07-01 00:47:26 -07:00
VenkatCisco
1618aec370
Add ethtool to docker-platform-monitor (#8017)
#### Why I did it
ethtool can be used to query and change settings such as speed, auto- negotiation and checksum offload on many network devices, especially Ethernet devices. 

#### How I did it
add package extension to docker-platform-monitor/Dockerfile.j2
2021-06-30 09:36:47 -07:00
VenkatCisco
c5855eba08
Add libpci3 pkg to docker-platform-monitor (#8016)
#### Why I did it
The libpci library provides portable access to configuration registers of devices connected to the PCI bus.

#### How I did it
update dockers/docker-platform-monitor/Dockerfile.j2
2021-06-30 09:35:16 -07:00
thomas.cappleman@metaswitch.com
101b1fa08b
[build]: Fix sonic-cfggen contextlib err (#7996)
A recent version of contextlib2 (https://pypi.org/project/contextlib2/21.6.0/#history) has broken Python2 compatibility, so the version picked up by netaddr when using Python2 must be specified, or else builds fail

Co-authored-by: Tom Zhu <tom.zhu@metaswitch.com>
2021-06-28 17:15:03 -07:00
arlakshm
ef67ba5f6e
[multi-asic] fix network command for internal loopback (#7878)
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
In the multi asic platforms all the ASIC are advertising the same IPv6 /64 network from Loopback4096.
Therefore, the IPv6 loopback address of backend asic is not learnt on the frontend asic.
Change the bgpd.conf.main.conf.j2 template file to advertise the Loopback4096 ipv6 address as /128
2021-06-24 12:02:01 -07:00
Shi Su
f52ba3b496
Remove quagga-related code (#7898)
Why I did it
Quagga is no longer being used. Remove quagga-related code (e.g., docker-fpm-quagga, sonic-quagga, etc.).

How I did it
Remove quagga-related code.
2021-06-23 09:15:56 -07:00
Qi Luo
658ed4fd37
Revert "Remove quagga related code (#7476)" (#7831)
Reverts Azure/sonic-buildimage#7476
It remove bgpd.conf.j2 and zebra.conf.j2, which is still used by sonic-config-engine unit test.
2021-06-09 18:52:45 -07:00
ngoc-do
710563f83d
[fabric] Disable unnecessary processes in swss and the orchagent-portsyncd dependency for fabric asic (#5569)
* Disable unnecessary processes in swss for fabric asic
Signed-off-by: ngocdo <ngocdo@arista.com>
2021-06-09 10:53:47 -07:00
Andriy Yurkiv
0c2521b936
Set default values only on the first start (#7735) 2021-06-09 18:39:22 +08:00
Shi Su
62a4603eef
Remove quagga related code (#7476)
Why I did it
Quagga is no longer being used. Remove quagga-related code (e.g., docker-fpm-quagga, sonic-quagga, etc.).

How I did it
Remove quagga-related code.
2021-06-07 16:44:54 -07:00
yozhao101
1a3cab43ac
[Monit] Deprecate the feature of monitoring the critical processes by Monit (#7676)
Signed-off-by: Yong Zhao yozhao@microsoft.com

Why I did it
Currently we leveraged the Supervisor to monitor the running status of critical processes in each container and it is more reliable and flexible than doing the monitoring by Monit. So we removed the functionality of monitoring the critical processes by Monit.

How I did it
I removed the script process_checker and corresponding Monit configuration entries of critical processes.

How to verify it
I verified this on the device str-7260cx3-acs-1.
2021-06-04 10:16:53 -07:00
Kwan
1347f29178
[docker-mgmt-framework]: update mgmt framework docker to support sonic-cli cmd (#6148)
- Why I did it

migrate to python3 support
add dependent packages for Klish
allow login as non-root user
- How I did it
update sonic-cli script to start Klish with user name, system name and timeout
update the Dockerfile.j2 to resolve dependent packages
add python3-dev for Klish use

- How to verify it
Incremental buster build with Azure/sonic-mgmt-framework#76 and verify the sonic-cli

- Description for the changelog
Migrate to python3.7 support, update sonic-cli script and resolve package dependencies
2021-06-02 19:38:21 -07:00
ppikh
3ad4f79fea
[sonic-mgmt docker]: Added allure-pytest library to sonic-mgmt docker container (#7665)
* Modified Dockerfile.j2 - added allure-pytest library

Signed-off-by: Petro Pikh <petrop@nvidia.com>
2021-06-02 08:42:30 -07:00
Myron Sosyak
3bf60b3db2
[docker-database] Fix Python3 issue (#7700)
#### Why I did it
To avoid the following error
```
Traceback (most recent call last):
  File "/usr/local/bin/flush_unused_database", line 10, in <module>
    if 'PONG' in output:
TypeError: a bytes-like object is required, not 'str'
```
`communicate` method returns the strings if streams were opened in text mode; otherwise, bytes.
In our case text arg  in Popen is not true and that means that `communicate` return the bytes
#### How I did it
Set `text=True` to get strings instead of bytes
#### How to verify it
run `/usr/local/bin/flush_unused_database` inside database container
2021-05-31 05:36:24 -07:00
bingwang-ms
3bb123930b
Fix lldpmgrd syntax issue (#7742)
Signed-off-by: bingwang <bingwang@microsoft.com>
2021-05-31 16:41:28 +08:00
Alexander Allen
21b9fccd75
[dockers][platform-monitor] Add chassis_db_init to platform monitor tasks (#7596)
I added `chassis_db_init` to the startup tasks for the `docker-platform-monitor` docker so that the script is run on startup of the switch and the chassis info is correctly provisioned to STATE_DB.

Depends on https://github.com/Azure/sonic-platform-daemons/pull/183
2021-05-28 12:01:03 -07:00
yozhao101
37863ac854
[Monit] Restart telemetry container if memory usage is beyond the threshold (#7645)
Signed-off-by: Yong Zhao yozhao@microsoft.com

Why I did it
This PR aims to monitor the memory usage of streaming telemetry container and restart streaming telemetry container if memory usage is larger than the pre-defined threshold.

How I did it
I borrowed the system tool Monit to run a script memory_checker which will periodically check the memory usage of streaming telemetry container. If the memory usage of telemetry container is larger than the pre-defined threshold for 10 times during 20 cycles, then an alerting message will be written into syslog and at the same time Monit will run the script restart_service to restart the streaming telemetry container.

How to verify it
I verified this implementation on device str-7260cx3-acs-1.
2021-05-28 11:13:44 -07:00
Stepan Blyshchak
d7b96dfdf1
[sonic-sdk] add sonic sdk and sonic sdk buildenv (#6712)
- Why I did it

To give SONiC Application Extension developers an environment to run and develop their apps.

- How I did it
Created sonic-sdk and sonic-sdk-buildenv dockers and their dbg versions.

- How to verify it
Build:

$ make -f slave target/sonic-sdk.gz target/sonic-sdk-buildenv.gz
2021-05-28 10:16:02 -07:00
bingwang-ms
e304182116
Fix supervisor-proc-exit-listener startup issue in restapi (#7681)
* Fix supervisor-proc-exit-listener startup issue in restapi

Signed-off-by: bingwang <bingwang@microsoft.com>
2021-05-26 18:28:10 +08:00
LuiSzee
cf83a99f45
[radv] fix bug for radv can't startup if DEVICE_METADATA.localhost.type is NULL (#7651)
Co-authored-by: Shi Lei <shil@centecnetworks.com>
2021-05-25 08:17:44 -07:00
Myron Sosyak
5ab300b626
Fix python version (#7658)
#### Why I did it
To avoid the following logs 
```
Mar 15 15:52:04.599302 igk-dut-04 INFO database#/supervisord: flushdb /bin/bash: /usr/local/bin/flush_unused_database: /usr/bin/python: bad interpreter: No such file or directory
Mar 15 15:52:04.599947 igk-dut-04 INFO database#supervisord 2021-03-15 15:52:04,599 INFO exited: flushdb (exit status 126; not expected)
```

#### How I did it
Fix  shebang
#### How to verify it
Check the logs
2021-05-20 15:47:46 -07:00
xumia
9387350e19
Fix the type issue in rvtysh (#7648)
Why I did it
Change the type issue in the command rvtysh
change PARA/para to PARAM/param
2021-05-20 21:35:23 +08:00
sudhanshukumar22
f783aefd6d
docker-lldp:intermittent DB errors will result in Client termination (#6119)
This PR allows listen to hostname changes and mgmt ip changes.
2021-05-18 09:51:02 -07:00
abdosi
f27aa33e69
[muti-asic] Updated BGP community for Internal routes (#7617)
Following changes are done:

Internal routes are tagged with no-export instead of local-AS
Option to add User Define BGP community on top of no-export
2021-05-16 19:44:06 -07:00
VenkatCisco
db3d353e77
[pmon]: add psmisc to bring fuser that dentifies processes that are using files or sockets (#7509)
fuser support is required since new cisco hardware watchdog plugin uses them to check anyone else use's /dev/watchdogX resource. The actual validation happens in the platform code, but the package is required for pmon container. Currently the /dev/watchdogX is being used by cisco platform-monitor service. Cisco chassis level watchdog plugin uses "fuser" to claim the watchdog release from platform-monitor service.
2021-05-06 22:24:07 -07:00
Junchao-Mellanox
a795bc0b8e
[Mellanox] Support new sensor conf file for MSN4700 A1/A0 (#7535)
#### Why I did it

MSN4700 A1/A0 used different sensor chip but keep the existing platform name *x86_64-mlnx_msn4700-r0*, this is a workaround to replace the sensor conf on MSN4700 A1/A0

#### How I did it

Use a shell script to get the sensor conf path and copy that files to /etc/sensors.d/sensors.conf
2021-05-06 10:13:26 -07:00
trzhang-msft
4f2b54e735
dhcpmon: support dual tor in docker template (#7470) 2021-05-03 10:51:34 -07:00
Lawrence Lee
1b39424520
[docker-orchagent]: Increase ndppd kernel poll interval (#7456)
Why I did it
ndppd by default reads /proc/net/ipv6_route ever 30 seconds. Since T1s advertise so many routes to ToRs, this file is extremely large, and reading it causes ndppd's CPU usage to spike every 30 seconds

How I did it
Increase the delay for reading this file to the maximum possible value (max integer value), which will result in CPU spikes every ~24 days instead of every 30 seconds

How to verify it
Start ndppd with the new config file, confirm that no CPU spikes are seen except at startup

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-04-30 16:30:30 -07:00
Wei Bai
3967c28a76
[docker-sonic-mgmt]: Upgrade Tgen version in SONiC mgmt docker (#7472) 2021-04-29 12:31:46 -07:00
Xin Wang
a7e1f7cbad
[docker-sonic-mgmt]: Install aiohttp package to sonic-mgmt docker (#7429)
The aiohttp package is required by azure.kusto.data which is used by  sonic-mgmt/test_reporting.
This change is to ensure that the dependent package is installed in the sonic-mgmt docker.

Signed-off-by: Xin Wang <xiwang5@microsoft.com>
2021-04-26 23:38:16 -07:00
xumia
56bdd750ab
Support readonly vtysh for sudoers (#7383)
Why I did it
Support readonly version of the command vtysh

How I did it
Check if the command starting with "show", and verify only contains single command in script.
2021-04-25 16:32:02 +08:00
ajbalogh
990b1127a7
[docker-sonic-mgmt] update version of ixnetwork client packages (#7242)
* Why I did it
Upgrade to the latest ixnetwork-restpy and ixnetwork-open-traffic-generator pypi packages

* How I did it
Updated the pip install entries for the packages in the Dockerfile.j2

* How to verify it
pip show ixnetwork-restpy
pip show ixnetwork-open-traffic-generator

Co-authored-by: Neetha John <nejo@microsoft.com>
2021-04-23 10:17:19 -07:00
Ze Gan
f77d719f7c
[docker-fpm-frr]: Add split mode to routing config (#7307)
For the split mode, the config files, like bgpd.conf, zebra.conf and so on, were provided by outside. But the docker_init.sh will overwrite the outside config files if restart bgp service.

How I did it
Add a split mode checking in docker_init.sh, if docker_routing_config_mode is split, don't overwrite the existing routing config files.

How to verify it
Set split mode in config db
{
    "DEVICE_METADATA": {
        "localhost": {
            "hwsku": "Force10-S6000",
            "platform": "x86_64-kvm_x86_64-r0",
            "docker_routing_config_mode": "split"
            ...
        }
    }
}
Replace your bgpd.conf to /etc/sonic/frr/bgpd.conf
Restart bgp service by sudo service bgp restart
The /etc/sonic/frr/bgpd.conf your provided shouldn't be overwritten

Signed-off-by: Ze Gan <ganze718@gmail.com>
2021-04-23 10:16:20 -07:00
guxianghong
6fe6d7394d
[arm] support compile sonic arm image on arm server (#7285)
- Support compile sonic arm image on arm server. If arm image compiling is executed on arm server instead of using qemu mode on x86 server, compile time can be saved significantly.
- Add kernel argument systemd.unified_cgroup_hierarchy=0 for upgrade systemd to version 247, according to #7228
- rename multiarch docker to sonic-slave-${distro}-march-${arch}

Co-authored-by: Xianghong Gu <xgu@centecnetworks.com>
Co-authored-by: Shi Lei <shil@centecnetworks.com>
2021-04-18 08:17:57 -07:00
jmmikkel
43342b33b8
[chassis] Add templates and code to support VoQ chassis iBGP peers (#5622)
This commit has following changes:

* Add templates and code to support VoQ chassis iBGP peers

* Add support to convert a new VoQChassisInternal element in the
   BGPSession element of the minigraph to a new BGP_VOQ_CHASSIS_NEIGHBOR 
   table in CONFIG_DB.
* Add a new set of "voq_chassis" templates to docker-fpm-frr
* Add a new BGP peer manager to bgpcfgd to add neighbors from the
  BGP_VOQ_CHASSIS_NEIGHBOR table using the voq_chassis templates.
* Add a test case for minigraph.py, making sure the VoQChassisInternal
  element creates a BGP_VOQ_CHASSIS_NEIGHBOR entry, but not if its
  value is "false".
* Add a set of test cases for the new voq_chassis templates in
  sonic-bgpcfgd tests.

Note that the templates expect the new
"bgp bestpath peer-type multipath-relax" bgpd configuration to be
available.

Signed-off-by: Joanne Mikkelson <jmmikkel@arista.com>
2021-04-16 11:11:32 -07:00
ANISH-GOTTAPU
e858d6e346
adding snappi to docker (#7292)
For the migration of tests that involves tgen from abstract to snappi, snappi library is needed
2021-04-15 08:24:31 -07:00
judyjoseph
1ad5dbeab6
Fixes for errors seen in staging devices (#7171)
With the latest 201911 image, the following error was seen on staging devices with TSB command ( for both single asic, multi asic ). Though this err message doesn't affect the TSB functionality, it is good to fix.

admin@STG01-0101-0102-01T1:~$ TSB
BGP0 : % Could not find route-map entry TO_TIER0_V4 20
line 1: Failure to communicate[13] to zebra, line: no route-map TO_TIER0_V4 permit 20
% Could not find route-map entry TO_TIER0_V4 30
line 2: Failure to communicate[13] to zebra, line: no route-map TO_TIER0_V4 deny 30

In addition, in this PR I am fixing the message displayed to user when there are no BGP neighbors configured on that BGP instance. In multi-asic device there could be case where there are no BGP neighbors configured on a particular ASIC.
2021-04-08 15:16:43 -07:00
Prince Sunny
20c8dd2691
[IPinIP] Add Loopback2 interface, change dscp mode to uniform (#7234)
Co-authored-by: Ubuntu <prsunny>
2021-04-07 09:58:12 -07:00
Stephen Sun
0b16ca4ae9
[monit] Avoid monit error log by removing "-l" from monit_swss|buffermgrd (#7236)
Avoid the following error messages while dynamic buffer calculation is enabled
```
ERR monit[491]: 'swss|buffermgrd' status failed (1) -- '/usr/bin/buffermgrd -l' is not running in host
```

Change /usr/bin/buffermgrd -l to /usr/bin/buffermgrd. The buffermgrd is started by -l for traditional model or -a for dynamic model. So we need to use the common section of both.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2021-04-06 10:12:23 -07:00
vganesan-nokia
973affce39
[voq/inbandif] Support for inband port as regular port (#6477)
Changes in this PR are to make LLDP to consider Inband port and to avoid regular
port handling on Inband port.
2021-04-01 16:24:57 -07:00
kakkotetsu
e11397df1d
[restapi] fix python version during restapi startup (#7056)
changed from python3 to python in supervisord.conf.
2021-03-30 13:54:37 -07:00
Joe LeVeque
c651a9ade4
[dockers][supervisor] Increase event buffer size for process exit listener; Set all event buffer sizes to 1024 (#7083)
To prevent error [messages](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802) like the following from being logged:

```
Mar 17 02:33:48.523153 vlab-01 INFO swss#supervisord 2021-03-17 02:33:48,518 ERRO pool supervisor-proc-exit-listener event buffer overflowed, discarding event 46
```

This is basically an addendum to https://github.com/Azure/sonic-buildimage/pull/5247, which increased the event buffer size for dependent-startup. While supervisor-proc-exit-listener doesn't subscribe to as many events as dependent-startup, there is still a chance some containers (like swss, as in the example above) have enough processes running to cause an overflow of the default buffer size of 10.

This is especially important for preventing erroneous log_analyzer failures in the sonic-mgmt repo regression tests, which have started occasionally causing PR check builds to fail. Example [here](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802).

I set all supervisor-proc-exit-listener event buffer sizes to 1024, and also updated all dependent-startup event buffer sizes to 1024, as well, to keep things simple, unified, and allow headroom so that we will not need to adjust these values frequently, if at all.
2021-03-27 21:14:24 -07:00
Shi Su
de64c4e34c
[bgp]: Reduce bgp connect retry timer to 10 seconds (#7169)
The default bgp connect retry timer is 120 seconds. A reconnection will happen 120 seconds if the initial connection fails. This PR aims to allow a more frequent retry.
2021-03-27 11:36:56 -07:00
judyjoseph
9d9503e1fe
To decrease the Connect Retry Timer from default value which is 120sec to 10 sec. (#7087)
Why I did it
It was observed that on a multi-asic DUT bootup, the BGP internal sessions between ASIC's was taking more time to get ESTABLISHED than external BGP sessions. The internal sessions was coming up almost exactly 120 secs later.

In multi-asic platform the bgp dockers ( which is per ASIC ) on switch start are bring brought up around the same time and they try to make the bgp sessions with neighbors (in peer ASIC's) which may be not be completely up. This results in BGP connect fail and the retry happens after 120sec which is the default Connect Retry Timer

How I did it
Add the command to set the bgp neighboring session retry timer to 10sec for internal bgp neighbors.
2021-03-17 23:14:38 -07:00
shlomibitton
43d4d45645
Backport ethtool to support QSFP-DD (#5725)
Backport ethtool debian package version 5.9 to support QSFP-DD cable parsing.

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
2021-03-16 09:56:53 -07:00