Why I did it
dplane_fpm_nl is a new FPM implementation in FRR. The old plugin fpm will not have any new features implemented. Usage of the new plugin gives us ability to use BGP suppression feature and next hop groups in the future.
How I did it
Switch to dplane_fpm_nl zebra plugin from old fpm plugin which is not supported anymore
Remove stale patches for old fpm plugin and add similar patches for dplane_fpm_nl
How to verify it
Build and run on the switch.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Why I did it
Currently the config cli of dhcpv4 is may cause confusion and config of dhcpv6 is missing.
How I did it
Add dhcp_relay config cli and test cases.
config dhcp_relay ipv4 helper (add | del) <vlan_id> <helper_ip_list>
config dhcp_relay ipv6 destination (add | del) <vlan_id> <destination_ip_list>
Updated docs for it in sonic-utilities: https://github.com/sonic-net/sonic-utilities/pull/2598/files
How to verify it
Build docker-dhcp-relay.gz with and without INCLUDE_DHCP_RELAY, and check target/docker-dhcp-relay.gz.log
#### Why I did it
Improve naming convention for bgp notification events and change type of leaf for sonic-events-host mem usage from uint64 to decimal64
#### How I did it
Replace "-" with "_"
Replace uint64 with decimal64
#### How to verify it
Run yang model unit tests
#### Description for the changelog
Change YANG model leaf naming convention for bgp notification
*Critical process for database-chassis is redis-chassis but critical_process contains hard-coded
to `redis` program always. Instead using jinja2 template to render critical process list based on database docker type. redis-chassis for database-chassis docker and redis for regular database docker.
if there is no request, you need to use curl to get data from bmc, and each query needs to start a curl process. pmon is a circular query, which will pull up multiple processes in a loop, which consumes a lot. Using request does not need to pull up the process.
Avoid traceback on sonic-clear command
sonic-clear dhcp6relay_counters
Traceback (most recent call last):
File "/usr/local/bin/sonic-clear", line 8, in <module>
sys.exit(cli())
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/clear/plugins/dhcp-relay.py", line 19, in dhcp6relay_clear_counters
counter = DHCPv6_Counter()
NameError: name 'DHCPv6_Counter' is not defined
- How I did it
Corrected the way to import using importlib
- How to verify it
Tested the sonic-clear command and verified no traceback is seen
- Why I did it
Support syslog rate limit configuration feature
- How I did it
Remove unused rsyslog.conf from containers
Modify docker startup script to generate rsyslog.conf from template files
Add metadata/init data for syslog rate limit configuration
- How to verify it
Manual test
New sonic-mgmt regression cases
Why I did it
To ensure, that after a BGP startup, dualtor T0 receives BGP updates before sending out BGP updates.
Please refer to sonic-net/SONiC#1161 for more details.
How I did it
add coalesce-time 10000 to the frr bgp startup config.
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Why I did it
Platform interface doesn't provide all sensors and using it isn't effective
How I did it
Request sensors via http from BMC server and parse the result
How to verify it
Related daemon in pmon populates redis db, run this command to view the contents
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan arlakshm@microsoft.com
Why I did it
Fixes#12575 and #12575
How I did it
In the PR sonic-net/sonic-platform-daemons#311 chassisd updates to CHASSIS_FABRIC_ASIC_INFO with the fabric asic info.
Updating the asic_status.py to read from the correct table.
How to verify it
test on chassis
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
- Why I did it
Upgrade the app-extension developer environments (sonic-sdk & sonic-sdk-bullseye) to bullseye
- How to verify it
Built an app-extension using these images and verified if it is up and running.
Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
Why I did it
There were some changes in apt source code in version 2.1.9.
As a result apt used in bullseye (2.2.4) is intolerant to network issues.
This was fixed in 10631550f1 Already fixed version is used in bookworm (2.5.4)
And not yet affected version is used in buster (1.8.2.3)
How I did it
Set Acquire::Retries to 3 for sonic-slave-bullseye, docker-base-bullseye and final Debian image.
Ref: https://bugs.launchpad.net/ubuntu/+source/apt/+bug/1876035
Signed-off-by: Konstantin Vasin k.vasin@yadro.com
* [SAI PTF] SAI PTF docker support sai-ptf v2
Publish the sai-ptf docker.
Take part of the change from previous PR #11610 (already reverted as some cache issue)
Cause in #11610, added two new target in it, one is sai-ptf another one is syncd-rpc with sai-ptf v2, to make the upgrade with more clear target, use this one take the sai-ptf one.
Test one:
NOSTRETCH=y NOJESSIE=y make configure PLATFORM=vs
NOSTRETCH=y NOJESSIE=y NOBULLSEYE=y SAITHRIFT_V2=y make target/docker-ptf-sai.gz
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
* remove useless change
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
* remove useless parameters
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
* remove useless change
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
* Update azure-pipelines-build.yml
remove a useless option
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
- Why I did it
The values for config_db "docker_routing_config_mode" are:
separated: FRR config generated from ConfigDB, each FRR daemon has its own config file
unified: FRR config generated from ConfigDB, single FRR config file
split: FRR config not generated from ConfigDB, each FRR daemon has its own config file
This commit adds:
split-unified: FRR config not generated from ConfigDB, single FRR config file
- How I did it
In docker_init.sh, when split-unified is used, the FRR configs are not generated
from ConfigDB. What's more, "service integrated-vtysh-config" is configured in vtysh.conf.
- How to verify it
FRR config not overwritten when FRR container starts.
Signed-off-by: Arnaud le Taillanter <a.letaillanter@criteo.com>
bgpd.main.conf.j2: bugfix-9739
* Update bgpd.main.conf.j2 to gracefully handle the bgp configuration cases for when 'bgp_asn' is set to 'None', 'Null', or missing.
How I did it
Include a conditional statement to avoid configuring bgp in FRR when 'bgp_asn' is missing or set to 'None' or 'Null'
How to verify it
Configure 'bgp_asn' as 'None', 'Null' or have it missing from configurations and verify that /etc/frr/bgpd.conf does not have invalid bgp configurations like 'router bgp None'
Description for the changelog
Update bgpd.main.conf.j2 to gracefully handle the bgp configuration cases for when 'bgp_asn' is set to 'None', 'Null', or missing for bugfix 9739.
Signed-off-by: cchoate54@gmail.com
Why I did it
Unify the Debian mirror sources
Make easy to upgrade to the next Debian release, not source url code change required.
Support to customize the Debian mirror sources during the build
Relative issue: #12523
Signed-off-by: maipbui <maibui@microsoft.com>
#### Why I did it
`subprocess` is used with `shell=True`, which is very dangerous for shell injection.
`os` - not secure against maliciously constructed input and dangerous if used to evaluate dynamic content
#### How I did it
remove `shell=True`, use `shell=False`
Replace `os` by `subprocess`
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan arlakshm@microsoft.com
Why I did it
closes#12343
Today in SONiC the notify-keyspace-events is from DbInterface class when application try do any configdb set.
In Chassis the chassis_db may not get any configdb set operations, so there is chance this configuration will never be set.
So the chassis_db updates from one line card will not be propogated to other linecards, which are doing a psubscribe to get these event.
How I did it
update the redis.conf to set notify-keyspace-events AKE so that the notify-keyspace-events are set when the redis instance is started
How to verify it
Test on chassis
Why I did it
Add the missing debian source bullseye-updates/buster-updates
The build failure as below, it is caused by the docker image debian:bullseye used the version 2.31-13+deb11u5, but the version only available in bullseye-update.
- Skip the interface status check if the interface does not exist. In the future, when the interface is created/comes up this check will be triggered again.
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
Why I did it
The Dockerfile of docker-sonic-mgmt became a little bit messy over time. Some packages are also a little bit too old. It would be better to do some cleanup and upgrade some important packages.
How I did it
Updated the dockerfile template for building docker-sonic-mgmt.
How to verify it
Locally built the docker-sonic-mgmt image and used it to run some test scripts.
Description for the changelog:
The build-essential package contains gcc and make. It's unnecessary to install them again.
The python-is-python2 package is included in the python package for Ubuntu 20.04. It's unnecessary to install it again.
Sort the apt and pip packages by alphabetic order.
Cleanup get-pip.py after installation.
Cleanup the python-scapy deb package after installation.
Ensure that the python pip, setuptools and wheel packages are up to date.
Install pytest-ansible from pip instead of from source code.
While installing docker-ce-cli, it's unnecessary to install curl and software-properties-common again.
Merged some pip install steps into one step.
Upgrade ansible from 2.8.12 to 2.9.27 for env-python3.
Upgrade pytest to 7.1.3 for env-python3.
Add ncclient package to evn-python3.
* Add smartmontools to pmon docker
* Set smartmontools to install version 7.2-1 in pmon to match host; clean up smartmontools build files
* Add comments on smartmontools version for both host and pmon
- Why I did it
Fixes#11431
- How I did it
dhcp6relay binds to ipv6 addresses configured on these vlan interfaces
Thus check if they are ready before launching dhcp6relay
- How to verify it
Unit Tests
Tested on a live device
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Why I did it
The python packages azure-kusto-data and azure-kusto-ingest packages for python2 are too old and not really used. The python3 environment has newer version of these packages installed. This change is to deprecate these two packages for python2 in docker-sonic-mgmt image.
How I did it
Removed the lines for installing old version of packages azure-kusto-data and azure-kusto-ingest in python2 in the Dockerfile template.
Signed-off-by: Xin Wang <xiwang5@microsoft.com>
Why I did it
test_sai_qos failed because of the following error:
"stderr_lines": [
"Traceback (most recent call last):",
" File \"/usr/bin/ptf\", line 522, in <module>",
" test_modules = load_test_modules(config)",
" File \"/usr/bin/ptf\", line 413, in load_test_modules",
" mod = imp.load_module(modname, *imp.find_module(modname, [root]))",
" File \"saitests/switch.py\", line 19, in <module>",
" import switch_sai_thrift",
"ImportError: No module named switch_sai_thrift"
],
It's because test_sai_qos runs ptf script which imports switch_sai_thrift, switch_sai_thrift is installed from python-saithrift_0.9.4_amd64.deb.
For master image, the deb file is for python3, but ptf only has virtual python3 environment, that's why we add --system-site-packages to allow virtual env to access system site-packeges.
Add thrift package in docker ptf virtual python3 env, because currently env-python3 doesn't have thrift module which is needed in switch_sai_thrift.
How I did it
Enable --system-site-packages for virtual py3 env in ptf docker and install thrift for test_qos_sai
How to verify it
load and login ptf conatiner
dpkg - i python-saithrift_0.9.4_amd64.deb
source /root/env-python3/bin/activate
python
import switch_sai_thrift.switch_sai_rpc
Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>
With this PR in, you flap BGP and use events_tool to see the published events.
With telemetry PR #111 in and corresponding submodule update done in buildimage, one could run gnmi_cli to capture BGP flap events.
Why I did it
In latest syncd container, it is installed bullseye, can't find command '/usr/bin/python'.
Some scripts such as test_copp still calls /usr/bin/python in syncd.
Submitted the change in #11807 for syncd docker, but it's better to add it in bullseye base docker.
How I did it
Install python-is-python3 package in bullseye base docker to resolve this issue, whatever run python or python3, it will run /usr/bin/python3, will not cause the error of can't find command '/usr/bin/python'
How to verify it
run python in syncd container.
Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>
Why I did it
Migrate FRR to bullseye
How I did it
Makefile and docker config changes to refer to bullseye instead of buster.
How to verify it
Build bullseye frr docker.
Co-authored-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>
At SWSS docker init time, check the device subtype and enable tunnel packet handler only if it is dualtor
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
#### Why I did it
The default stable version of rsyslog on bullseye has a bug about rate limit. It causes rate limit not work. The bug has been fixed on backport version 8.2206.0-1~bpo11+1.
Buster has no such issue.
#### How I did it
Upgrade rsyslog from 8.2110.0 to 8.2206.0-1~bpo11+1
#### How to verify it
Manual test
*The initial commit for the P4RT docker hard coded all the flags which makes it difficult to configure at runtime. Reading them from the CONFIG_DB allows for more flexibility.
ping command is not working inside PMON docker (bullseye)
Use case: chassisd checks for module reachability inside PMON for "show chassis modules midplane-status" CLI, and on Cisco chassis, this uses ping command to check network reachability
#### Why I did it
Fix docker-database flush_unused_database failed issue: https://github.com/Azure/sonic-buildimage/issues/11597
When change flush_unused_database from use swsssdk to use swsscommon, get_instancelist() and get_dblist() name changed but not update.
#### How I did it
Change flush_unused_database code to use swsscommon API:
Change get_instancelist to getInstanceList.
Change get_dblist to getDbList.
#### How to verify it
Pass all E2E test.
Manually check syslog make sure error log not exist and swss, syncd, bgp service started.
Search code in Azure make sure there all similer case are fixed in this PR.
#### Which release branch to backport (provide reason below if selected)
<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->
- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [ ] 202205
#### Description for the changelog
Fix docker-database flush_unused_database failed issue: https://github.com/Azure/sonic-buildimage/issues/11597
When change flush_unused_database from use swsssdk to use swsscommon, get_instancelist() and get_dblist() name changed but not update.
#### Link to config_db schema for YANG module changes
<!--
Provide a link to config_db schema for the table for which YANG model
is defined
Link should point to correct section on https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md
-->
#### A picture of a cute animal (not mandatory but encouraged)
Co-authored-by: liuh-80 <azureuser@liuh-dev-vm-02.5fg3zjdzj2xezlx1yazx5oxkzd.hx.internal.cloudapp.net>
What I did:
Following changes done for packet based chassis:-
1> Run arp_update on LC's to resolve static route nexthops over backend
port-channel interfaces.
2> On Supervisor make sure arp_update exit gracefully
* Ported Marvell armhf build on x86 for debian buster to use cross-compilation instead of qemu emulation
Current armhf Sonic build on amd64 host uses qemu emulation. Due to the
nature of the emulation it takes a very long time, about 22-24 hours to
complete the build. The change I did to reduce the building time by
porting Sonic armhf build on amd64 host for Marvell platform for debian
buster to use cross-compilation on arm64 host for armhf target. The
overall Sonic armhf building time using cross-compilation reduced to
about 6 hours.
Signed-off-by: marvell <marvell@cpss-build3.marvell.com>
* Fixed final Sonic image build with dockers inside
* Update Dockerfile.j2
Fixed qemu-user-static:x86_64-aarch64-5.0.0-2 .
* Update cross-build-arm-python-reqirements.sh
Added support for both armhf and arm64 cross-build platform using $PY_PLAT environment variable.
* Update Makefile
Added TARGET=<cross-target> for armhf/arm64 cross-compilation.
* Reviewer's @qiluo-msft requests done
Signed-off-by: marvell <marvell@cpss-build3.marvell.com>
* Added new radius/pam patch for arm64 support
* Update slave.mk
Added missing back tick.
* Added libgtest-dev: libgmock-dev: to the buster Dockerfile.j2. Fixed arm perl version to be generic
* Added missing armhf/arm64 entries in /etc/apt/sources.list
* fix libc-bin core dump issue from xumia:fix-libc-bin-install-issue commit
* Removed unnecessary 'apt-get update' from sonic-slave-buster/Dockerfile.j2
* Fixed saiarcot895 reviewer's requests
* Fixed README and replaced 'sed/awk' with patches
* Fixed ntp build to use openssl
* Unuse sonic-slave-buster/cross-build-arm-python-reqirements.sh script (put all prebuilt python packages cross-compilation/install inside Dockerfile.j2). Fixed src/snmpd/Makefile to use -j1 in all cases
* Clean armhf cross-compilation build fixes
* Ported cross-compilation armhf build to bullseye
* Additional change for bullseye
* Set CROSS_BUILD_ENVIRON default value n
* Removed python2 references
* Fixes after merge with the upstream
* Deleted unused sonic-slave-buster/cross-build-arm-python-reqirements.sh file
* Fixed 2 @saiarcot895 requests
* Fixed @saiarcot895 reviewer's requests
* Removed use of prebuilt python wheels
* Incorporated saiarcot895 CC/CXX and other simplification/generalization changes
Signed-off-by: marvell <marvell@cpss-build3.marvell.com>
* Fixed saiarcot895 reviewer's additional requests
* src/libyang/patch/debian-packaging-files.patch
* Removed --no-deps option when installing wheels. Removed unnecessary lazy_object_proxy arm python3 package instalation
Co-authored-by: marvell <marvell@cpss-build3.marvell.com>
Co-authored-by: marvell <marvell@cpss-build2.marvell.com>
Signed-off-by: Yong Zhao yozhao@microsoft.com
Why I did it
This PR aims to check in the commit f2b11e4 introduced by the update related to gNMI python client.
How I did it
I changed the Dockfile.j2 such that the update of gNMI client script will be checked in when ptf docker image is built.
How to verify it
A PTF container image was built and then loaded on a testbed. I checked the update of gNMI client script was checked in.
#### Why I did it
Update scripts in sonic-buildimage from py-swsssdk to swsscommon
#### How I did it
Change code to use swsscommon.
#### How to verify it
Pass all E2E test case
#### Which release branch to backport (provide reason below if selected)
<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->
- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [ ] 202205
#### Description for the changelog
Update scripts in sonic-buildimage from py-swsssdk to swsscommon
#### Link to config_db schema for YANG module changes
<!--
Provide a link to config_db schema for the table for which YANG model
is defined
Link should point to correct section on https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md
-->
#### A picture of a cute animal (not mandatory but encouraged)
Why I did it
Keysight provide a new version with some snappi API source code related fix: snappi[ixnetwork,convergence]==0.7.44
How I did it
Upgrade snappi version to 0.7.44
How to verify it
Whether it's installed in sonic-mgmt docker container
- Why I did it
To provide an ability to suppress ASAN false positives and have a clean ASAN report for docker-sonic-vs/mlnx-syncd/orchagent docker
- How I did it
Added the "print_suppressions=0" to ASAN configs.
- How to verify it
add a suppression to some ASAN-enabled component (the suppression should catch some leak)
build with ENABLE_ASAN=y
run a test and see that the ASAN report is empty instead of having the suppression summary
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
* [BGP]Adding configuration knob to allow advertise Loopback ipv6 /128 prefix
By default when IPv6 address is configured with /128 as subnet mask in Loopback0 interface, it will be advertised as prefix with /64 subnet.
To control this behavior a new field 'bgp_adv_lo_prefix_as_128' is introduced in DEVICE_METADATA table which when set to true will advertise prefix with /128 subnet as it is.
What/Why I did:
Issue1: By setting up of ipvlan interface in interface-config.sh we are not tolerant to failures. Reason being interface-config.service is one-shot and do not have restart capability.
Scenario: For example if let's say database service goes in fail state then interface-services also gets failed because of dependency check but later database service gets restart but interface service will remain in stuck state and the ipvlan interface nevers get created.
Solution: Moved all the logic in database service from interface-config service which looks more align logically also since the namespace is created here and all the network setting (sysctl) are happening here.With this if database starts we recreate the interface.
Issue 2: Use of IPVLAN vs MACVLAN
Currently we are using ipvlan mode. However above failure scenario is not handle correctly by ipvlan mode. Once the ipvlan interface is created and ip address assign to it and if we restart interface-config or database (new PR) service Linux Kernel gives error "Error: Address already assigned to an ipvlan device." based on this:https://github.com/torvalds/linux/blob/master/drivers/net/ipvlan/ipvlan_main.c#L978Reason being if we do not do cleanup of ip address assignment (need to be unique for IPVLAN) it remains in Kernel Database and never goes to free pool even though namespace is deleted.
Solution: Considering this hard dependency of unique ip macvlan mode is better for us and since everything is managed by Linux Kernel and no dependency for on user configured IP address.
Issue3: Namespace database Service do not check reachability to Supervisor Redis Chassis Server.
Currently there is no explicit check as we never do Redis PING from namespace to Supervisor Redis Chassis Server. With this check it's possible we will start database and all other docker even though there is no connectivity and will hit the error/failure late in cycle
Solution: Added explicit PING from namespace that will check this reachability.
Issue 4:flushdb give exception when trying to accces Chassis Server DB over Unix Sokcet.
Solution: Handle gracefully via try..except and log the message.
Why I did it
There is a bug that the Port attributes in CONFIG_DB will be cleared if using sudo config macsec port add Ethernet0 or sudo config macsec port del Ethernet0
How I did it
To fetch the port attributes before set/remove MACsec field in port table.
Signed-off-by: Ze Gan <ganze718@gmail.com>
Fixes#9279
- Why I did it
Part of larger effort to move all SONiC systems to bullseye
- How I did it
1. Update container makefiles with correct dependencies
2. Update container Dockerfile with correct base image
3. Update container Dockerfile with correct apt dependencies
4. Update any other makefiles with dependencies to remove python2 support
5. Minor changes to support bullseye / python3
- How to verify it
Run regression on the switch:
1. Verify PTF community tests work
2. Verify syncd runs and all ports come up / pass traffic
3. Verify all platform tests succeed
Why I did it
When lldpmgrd handled events of other tables besides PORT_TABLE, error message was printed to log.
How I did it
Handle event according to its file descriptor instead of looping all registered selectables for each coming event.
How to verify it
I verified same events are being handled by printing events key and operation, before and after the change.
Also, before the change, in init flow after config reload, when lldpmgrd handled events of other tables besides PORT_TABLE, error messages were printed to log, this issue is solved now.
Currently, the build dockers are created as a user dockers(docker-base-stretch-<user>, etc) that are
specific to each user. But the sonic dockers (docker-database, docker-swss, etc) are
created with a fixed docker name and common to all the users.
docker-database:latest
docker-swss:latest
When multiple builds are triggered on the same build server that creates parallel building issue because
all the build jobs are trying to create the same docker with latest tag.
This happens only when sonic dockers are built using native host dockerd for sonic docker image creation.
This patch creates all sonic dockers as user sonic dockers and then, while
saving and loading the user sonic dockers, it rename the user sonic
dockers into correct sonic dockers with tag as latest.
docker-database:latest <== SAVE/LOAD ==> docker-database-<user>:tag
The user sonic docker names are derived from 'DOCKER_USERNAME and DOCKER_USERTAG' make env
variable and using Jinja template, it replaces the FROM docker name with correct user sonic docker name for
loading and saving the docker image.
Why I did it
Migrate ptftests script to python3, in order to do an incremental migration, add python virtual environment firstly, install all required python packages in virtual env as well.
Then migrate ptftests scripts from python2 to python3 one by one avoid impacting non-changed scripts.
Signed-off-by: Zhaohui Sun zhaohuisun@microsoft.com
How I did it
Add python3 virtual environment for docker-ptf.
Add submodule ptf-py3 and install patched ptf 0.9.3 into virtual environment as well, two ptf issues were reported here:
p4lang/ptf#173p4lang/ptf#174
Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>
Why I did it
Recirc port is used to only forward traffic from one asic to another asic. So it's not required to configure LLDP on it.
How I did it
Add interface prefix helper for recirc port. Similar to skip configuring LLDP on inband port, add check in lldpmgrd to skip recirc port by checking interface prefix.
Asic PCI ID (PCI address) is collected by chassisd (inside pmon -
Azure/sonic-platform-daemons#175) and saved in CHASSIS_STATE_DB (in
redis_chassis). CHASSIS_STATE_DB is accessible by swss containers.
At docker-init.sh (script is called after swss container is created and before
anything that could run in swss like orchagent...), we wait until asic PCI ID
of the corresponding asic is populated by chassisd. We then update asic_id in
CONFIG_DB of asic's database.
A system supporting dynamic asic PCI ID identification requires to have a file
(empty) use_pci_id_chassis in its platform dir.
When orchagent runs, it has correct asic PCI ID in its CONFIG_DB.
Together with this PR:
Azure/sonic-platform-daemons#175Azure/sonic-platform-common#185
Signed-off-by: Maxime Lorrillere <mlorrillere@arista.com>
Co-authored-by: Maxime Lorrillere <mlorrillere@arista.com>
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Why I did it
This PR aims to fix the Monit issue which shows Monit can't reset its counter when monitoring memory usage of telemetry container.
Specifically the Monit configuration file related to monitoring memory usage of telemetry container is as following:
check program container_memory_telemetry with path "/usr/bin/memory_checker telemetry 419430400"
if status == 3 for 10 times within 20 cycles then exec "/usr/bin/restart_service telemetry"
If memory usage of telemetry container is larger than 400MB for 10 times within 20 cycles (minutes), then it will be restarted.
Recently we observed, after telemetry container was restarted, its memory usage continuously increased from 400MB to 11GB within 1 hour, but it was not restarted anymore during this 1 hour sliding window.
The reason is Monit can't reset its counter to count again and Monit can reset its counter if and only if the status of monitored service was changed from Status failed to Status ok. However, during this 1 hour sliding window, the status of monitored service was not changed from Status failed to Status ok.
Currently for each service monitored by Monit, there will be an entry showing the monitoring status, monitoring mode etc. For example, the following output from command sudo monit status shows the status of monitored service to monitor memory usage of telemetry:
Program 'container_memory_telemetry'
status Status ok
monitoring status Monitored
monitoring mode active
on reboot start
last exit value 0
last output -
data collected Sat, 19 Mar 2022 19:56:26
Every 1 minute, Monit will run the script to check the memory usage of telemetry and update the counter if memory usage is larger than 400MB. If Monit checked the counter and found memory usage of telemetry is larger than 400MB for 10 times
within 20 minutes, then telemetry container was restarted. Following is an example status of monitored service:
Program 'container_memory_telemetry'
status Status failed
monitoring status Monitored
monitoring mode active
on reboot start
last exit value 0
last output -
data collected Tue, 01 Feb 2022 22:52:55
After telemetry container was restarted. we found memory usage of telemetry increased rapidly from around 100MB to more than 400MB during 1 minute and status of monitored service did not have a chance to be changed from Status failed to Status ok.
How I did it
In order to provide a workaround for this issue, Monit recently introduced another syntax format repeat every <n> cycles related to exec. This new syntax format will enable Monit repeat executing the background script if the error persists for a given number of cycles.
How to verify it
I verified this change on lab device str-s6000-acs-12. Another pytest PR (Azure/sonic-mgmt#5492) is submitted in sonic-mgmt repo for review.
sign-off: Jing Zhang zhangjing@microsoft.com
#### Why I did it
As part of the process moving containers from buster to bullseye.
#### How I did it
1. change base image from buster to bullseye.
2. remove unused addition to orchagent run options
#### How to verify it
Tested building locally.
It upgraded scapy to 2.4.5 in docker-ptf container, after this upgrade, all scripts under ansible/roles/test/files/ptftests will import scapy 2.4.5, some test cases will fail because they are not upgraded accordingly.
Reverts #10507 to avoid breaking regression test.
This reverts commit 92efc01270.
Removed python2 support for sonic-platform-daemons that was causing unit
test errors in sonic_pcied.
* Removed config from docker supervisord jinja templates per VD review comment
* Removed space and python3 per QL comments
Why I did it
Existing dataplane tests cannot be tested under MACsec environment due to the traffic under MACsec link is encrypted. So, I will override the dp_poll of ptf to MACsec dp_poll to decrypt the MACsec packets on injected ports (PR: Azure/sonic-mgmt#5490). MACsec decryption library depends on scapy 2.4.5.
How I did it
Upgrade scapy library to 2.4.5 by pip.
How to verify it
Check the scapy version in docker-ptf by
python -c "import scapy; print(scapy.__version__)"
2.4.5
Signed-off-by: Ze Gan <ganze718@gmail.com>
Why I did it
Running warm-reboot in a loop for 500 times leads to this error on 318-th iteration:
Apr 2 15:56:27.346747 sonic INFO swss#/supervisord: restore_neighbors Traceback (most recent call last):
Apr 2 15:56:27.346747 sonic INFO swss#/supervisord: restore_neighbors File "/usr/bin/restore_neighbors.py", line 24, in <module>
Apr 2 15:56:27.346747 sonic INFO swss#/supervisord: restore_neighbors from scapy.all import conf, in6_getnsma, inet_pton, inet_ntop, in6_getnsmac, get_if_hwaddr, Ether, ARP, IPv6, ICMPv6ND_NS, ICMPv6NDOptSrcLLAddr
Apr 2 15:56:27.346795 sonic INFO swss#/supervisord: restore_neighbors File "/usr/local/lib/python3.7/dist-packages/scapy/all.py", line 25, in <module>
Apr 2 15:56:27.346956 sonic INFO swss#/supervisord: restore_neighbors from scapy.route import *
Apr 2 15:56:27.346995 sonic INFO swss#/supervisord: restore_neighbors File "/usr/local/lib/python3.7/dist-packages/scapy/route.py", line 205, in <module>
Apr 2 15:56:27.347089 sonic INFO swss#/supervisord: restore_neighbors conf.iface = get_working_if()
Apr 2 15:56:27.347129 sonic INFO swss#/supervisord: restore_neighbors File "/usr/local/lib/python3.7/dist-packages/scapy/arch/linux.py", line 128, in get_working_if
Apr 2 15:56:27.347213 sonic INFO swss#/supervisord: restore_neighbors ifflags = struct.unpack("16xH14x", get_if(i, SIOCGIFFLAGS))[0]
Apr 2 15:56:27.347250 sonic INFO swss#/supervisord: restore_neighbors File "/usr/local/lib/python3.7/dist-packages/scapy/arch/common.py", line 31, in get_if
Apr 2 15:56:27.347345 sonic INFO swss#/supervisord: restore_neighbors return ioctl(sck, cmd, struct.pack("16s16x", iff.encode("utf8")))
Apr 2 15:56:27.347365 sonic INFO swss#/supervisord: restore_neighbors OSError: [Errno 19] No such device
The issue was reported to scapy devs secdev/scapy#3369, the fix is secdev/scapy#3371, however there is no released scapy version with this fix right now, thus decided to build scapy v2.4.5 from sources and apply the fix in a form of a patch.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>