#### Why I did it
When CPU is busy, the sonic_ax_impl may not have sufficient speed to handle the notification message sent from REDIS.
Thus, the message will keep stacking in the memory space of sonic_ax_impl.
If the condition continues, the memory usage will keep increasing.
#### How I did it
Add a monit file to check if the SNMP container where sonic_ax_impl resides in use more than 4GB memory.
If yes, restart the sonic_ax_impl process.
#### How to verify it
Run a lot of this command: `while true; do ret=$(redis-cli -n 0 set LLDP_ENTRY_TABLE:test1 test1); sleep 0.1; done;`
And check the memory used by sonic_ax_impl keeps increasing.
After a period, make sure the sonic_ax_impl is restarted when the memory usage reaches the 4GB threshold.
And verify the memory usage of sonic_ax_impl drops down from 4GB.
Change references to use bullseye instead of buster
Why I did it
Almost all daemons in 202211 and master uses bullseye, and sflow was easy to migrate.
How I did it
Replaced the references, built and tested in 202211.
How to verify it
Build with the changes, enable sflow:
admin@sonic:~$ sudo config sflow collector add test 1.2.3.4
admin@sonic:~$ sudo config sflow collector enable
tcpdump on 1.2.3.4 and see that UDP sFlow are being sent.
Signed-off-by: Christian Svensson <blue@cmd.nu>
Change references to use bullseye instead of buster
Why I did it
Almost all daemons in 202211 and master uses bullseye, and NAT seems easy to migrate.
How I did it
Replaced the references, built with 202211 branch.
How to verify it
Not sure, it builds and tests pass as far as I can tell but I don't use the feature myself.
Signed-off-by: Christian Svensson <blue@cmd.nu>
catch system error and log as warning level instead of
error level in case interface was already deleted.
Why I did it
sflow process exited when failed to convert the interface index from interface name
How I did it
Added exception handling code and logged when OSError exception.
How to verify it
Recreated the bug scenario #11437 and ensured that sflow process not exited.
Description for the changelog
catch system error and log as warning level instead of
error level in case interface was already deleted.
Logs
steps :
root@sonic:~# sudo config vlan member del 4094 PortChannel0001
root@sonic:~# sudo config vlan member del 4094 Ethernet2
root@sonic:~# sudo config vlan del 4094
root@sonic:~#
"WARNING sflow#port_index_mapper: no interface with this name" is seen but no crash is reported
syslogs :
Jan 23 09:17:24.420448 sonic NOTICE swss#orchagent: :- removeVlanMember: Remove member Ethernet2 from VLAN Vlan4094 lid:ffe vmid:27000000000a53
Jan 23 09:17:24.420710 sonic NOTICE swss#orchagent: :- flushFdbEntries: flush key: SAI_OBJECT_TYPE_FDB_FLUSH:oid:0x21000000000000, fields: 3
Jan 23 09:17:24.420847 sonic NOTICE swss#orchagent: :- recordFlushFdbEntries: flush key: SAI_OBJECT_TYPE_FDB_FLUSH:oid:0x21000000000000, fields: 3
Jan 23 09:17:24.426082 sonic NOTICE syncd#syncd: :- processFdbFlush: fdb flush succeeded, updating redis database
Jan 23 09:17:24.426242 sonic NOTICE syncd#syncd: :- processFlushEvent: received a flush port fdb event, portVid = oid:0x3a000000000a52, bvId = oid:0x26000000000a51
Jan 23 09:17:24.426374 sonic NOTICE syncd#syncd: :- processFlushEvent: pattern ASIC_STATE:SAI_OBJECT_TYPE_FDB_ENTRY:*oid:0x26000000000a51*, portStr oid:0x3a000000000a52
Jan 23 09:17:24.427104 sonic NOTICE bgp#fpmsyncd: :- onRouteMsg: RouteTable del msg for route with only one nh on eth0/docker0: fe80::/64 :: eth0
Jan 23 09:17:24.427182 sonic NOTICE bgp#fpmsyncd: :- onRouteMsg: RouteTable del msg for route with only one nh on eth0/docker0: fd00::/80 :: docker0
Jan 23 09:17:24.428502 sonic NOTICE swss#orchagent: :- meta_sai_on_fdb_flush_event_consolidated: processing consolidated fdb flush event of type: SAI_FDB_ENTRY_TYPE_DYNAMIC
Jan 23 09:17:24.429058 sonic NOTICE swss#orchagent: :- meta_sai_on_fdb_flush_event_consolidated: fdb flush took 0.000606 sec
Jan 23 09:17:24.431496 sonic NOTICE swss#orchagent: :- setHostIntfsStripTag: Set SAI_HOSTIF_VLAN_TAG_STRIP to host interface: Ethernet2
Jan 23 09:17:24.431675 sonic NOTICE swss#orchagent: :- flushFdbEntries: flush key: SAI_OBJECT_TYPE_FDB_FLUSH:oid:0x21000000000000, fields: 2
Jan 23 09:17:24.431797 sonic NOTICE swss#orchagent: :- recordFlushFdbEntries: flush key: SAI_OBJECT_TYPE_FDB_FLUSH:oid:0x21000000000000, fields: 2
Jan 23 09:17:24.437009 sonic NOTICE swss#orchagent: :- meta_sai_on_fdb_flush_event_consolidated: processing consolidated fdb flush event of type: SAI_FDB_ENTRY_TYPE_DYNAMIC
Jan 23 09:17:24.437532 sonic NOTICE swss#orchagent: :- meta_sai_on_fdb_flush_event_consolidated: fdb flush took 0.000514 sec
Jan 23 09:17:24.437942 sonic NOTICE syncd#syncd: :- processFdbFlush: fdb flush succeeded, updating redis database
Jan 23 09:17:24.438065 sonic NOTICE syncd#syncd: :- processFlushEvent: received a flush port fdb event, portVid = oid:0x3a000000000a52, bvId = oid:0x0
Jan 23 09:17:24.438173 sonic NOTICE syncd#syncd: :- processFlushEvent: pattern ASIC_STATE:SAI_OBJECT_TYPE_FDB_ENTRY:*, portStr oid:0x3a000000000a52
Jan 23 09:17:24.440348 sonic NOTICE swss#orchagent: :- removeBridgePort: Remove bridge port Ethernet2 from default 1Q bridgeJan 23 09:17:29.782554 sonic NOTICE swss#orchagent: :- removeVlan: VLAN Vlan4094 still has 1 FDB entries
Jan 23 09:17:29.791373 sonic WARNING sflow#port_index_mapper: no interface with this name
Signed-off-by: Gokulnath-Raja <Gokulnath_R@dell.com>
Why I did it
Add AZP agent necessary packages to sonic-mgmt-docker
Remove Python 201811 venv
Update some packages in order to meet internal security requirements
How I did it
Update sonic-mgmt-docker file
How to verify it
sonic-mgmt-docker can run: bash, apt update, apt install and ping.
start.sh is under /azp with exec permission.
env-201811 venv is removed.
jinja2 is upgrade to 2.10.1
#### Why I did it
Bug in script that was passing in null as log level value if missing from config_db
#### How I did it
Added more robust conditional statement
#### How to verify it
1) Remove log_level from config db
2) config reload -y
3) telemetry should not crash
- Why I did it
Fixes#14236
When a redis event quickly gets outdated during port breakout, error logs like this are seen
Mar 8 01:43:26.011724 r-leopard-56 INFO ConfigMgmt: Write in DB: {'PORT': {'Ethernet64': {'admin_status': 'down'}, 'Ethernet68': {'admin_status': 'down'}}}
Mar 8 01:43:26.012565 r-leopard-56 INFO ConfigMgmt: Writing in Config DB
Mar 8 01:43:26.013468 r-leopard-56 INFO ConfigMgmt: Write in DB: {'PORT': {'Ethernet64': None, 'Ethernet68': None}, 'INTERFACE': None}
Mar 8 01:43:26.018095 r-leopard-56 NOTICE swss#portmgrd: :- doTask: Configure Ethernet64 admin status to down
Mar 8 01:43:26.018309 r-leopard-56 NOTICE swss#portmgrd: :- doTask: Delete Port: Ethernet64
Mar 8 01:43:26.018641 r-leopard-56 NOTICE lldp#lldpmgrd[32]: :- pops: Miss table key PORT_TABLE:Ethernet64, possibly outdated
Mar 8 01:43:26.018654 r-leopard-56 ERR lldp#lldpmgrd[32]: unknown operation ''
- How I did it
Only log the error when the op is not empty and not one of ("SET" & "DEL" )
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Why I did it
Currently the show and clear cli of dhcp_relayis may cause confusion.
How I did it
Add doc for it: [doc] Add docs for dhcp_relay show/clear cli sonic-utilities#2649
Add dhcp_relay config cli and test cases.
show dhcp_relay ipv4 helper
show dhcp_relay ipv6 destination
show dhcp_relay ipv6 counters
sonic-clear dhcp_relay ipv6 counters
How to verify it
Unit test all passed
It will allow us to have application called "mergecap" - which can merge multiple .pcap files into single .pcapng file and convert it to .pcap file
Signed-off-by: Petro Pikh <petrop@nvidia.com>
#### Why I did it
Remove dialout as critical process as it is no longer used in prod. As part of future work, can remove dialout completely
#### How I did it
Remove from critical process list
Why I did it
DHCPv6 relay config entry is not useful while del dhcpv6 relay config.
How I did it
Remove dhcpv6_relay entry if it is empty and not check entry exist while adding dhcpv6 relay
Why I did it
dplane_fpm_nl is a new FPM implementation in FRR. The old plugin fpm will not have any new features implemented. Usage of the new plugin gives us ability to use BGP suppression feature and next hop groups in the future.
How I did it
Switch to dplane_fpm_nl zebra plugin from old fpm plugin which is not supported anymore
Remove stale patches for old fpm plugin and add similar patches for dplane_fpm_nl
How to verify it
Build and run on the switch.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Why I did it
Currently the config cli of dhcpv4 is may cause confusion and config of dhcpv6 is missing.
How I did it
Add dhcp_relay config cli and test cases.
config dhcp_relay ipv4 helper (add | del) <vlan_id> <helper_ip_list>
config dhcp_relay ipv6 destination (add | del) <vlan_id> <destination_ip_list>
Updated docs for it in sonic-utilities: https://github.com/sonic-net/sonic-utilities/pull/2598/files
How to verify it
Build docker-dhcp-relay.gz with and without INCLUDE_DHCP_RELAY, and check target/docker-dhcp-relay.gz.log
#### Why I did it
Improve naming convention for bgp notification events and change type of leaf for sonic-events-host mem usage from uint64 to decimal64
#### How I did it
Replace "-" with "_"
Replace uint64 with decimal64
#### How to verify it
Run yang model unit tests
#### Description for the changelog
Change YANG model leaf naming convention for bgp notification
*Critical process for database-chassis is redis-chassis but critical_process contains hard-coded
to `redis` program always. Instead using jinja2 template to render critical process list based on database docker type. redis-chassis for database-chassis docker and redis for regular database docker.
if there is no request, you need to use curl to get data from bmc, and each query needs to start a curl process. pmon is a circular query, which will pull up multiple processes in a loop, which consumes a lot. Using request does not need to pull up the process.
Avoid traceback on sonic-clear command
sonic-clear dhcp6relay_counters
Traceback (most recent call last):
File "/usr/local/bin/sonic-clear", line 8, in <module>
sys.exit(cli())
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/clear/plugins/dhcp-relay.py", line 19, in dhcp6relay_clear_counters
counter = DHCPv6_Counter()
NameError: name 'DHCPv6_Counter' is not defined
- How I did it
Corrected the way to import using importlib
- How to verify it
Tested the sonic-clear command and verified no traceback is seen
- Why I did it
Support syslog rate limit configuration feature
- How I did it
Remove unused rsyslog.conf from containers
Modify docker startup script to generate rsyslog.conf from template files
Add metadata/init data for syslog rate limit configuration
- How to verify it
Manual test
New sonic-mgmt regression cases
Why I did it
To ensure, that after a BGP startup, dualtor T0 receives BGP updates before sending out BGP updates.
Please refer to sonic-net/SONiC#1161 for more details.
How I did it
add coalesce-time 10000 to the frr bgp startup config.
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Why I did it
Platform interface doesn't provide all sensors and using it isn't effective
How I did it
Request sensors via http from BMC server and parse the result
How to verify it
Related daemon in pmon populates redis db, run this command to view the contents
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan arlakshm@microsoft.com
Why I did it
Fixes#12575 and #12575
How I did it
In the PR sonic-net/sonic-platform-daemons#311 chassisd updates to CHASSIS_FABRIC_ASIC_INFO with the fabric asic info.
Updating the asic_status.py to read from the correct table.
How to verify it
test on chassis
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
- Why I did it
Upgrade the app-extension developer environments (sonic-sdk & sonic-sdk-bullseye) to bullseye
- How to verify it
Built an app-extension using these images and verified if it is up and running.
Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
Why I did it
There were some changes in apt source code in version 2.1.9.
As a result apt used in bullseye (2.2.4) is intolerant to network issues.
This was fixed in 10631550f1 Already fixed version is used in bookworm (2.5.4)
And not yet affected version is used in buster (1.8.2.3)
How I did it
Set Acquire::Retries to 3 for sonic-slave-bullseye, docker-base-bullseye and final Debian image.
Ref: https://bugs.launchpad.net/ubuntu/+source/apt/+bug/1876035
Signed-off-by: Konstantin Vasin k.vasin@yadro.com
* [SAI PTF] SAI PTF docker support sai-ptf v2
Publish the sai-ptf docker.
Take part of the change from previous PR #11610 (already reverted as some cache issue)
Cause in #11610, added two new target in it, one is sai-ptf another one is syncd-rpc with sai-ptf v2, to make the upgrade with more clear target, use this one take the sai-ptf one.
Test one:
NOSTRETCH=y NOJESSIE=y make configure PLATFORM=vs
NOSTRETCH=y NOJESSIE=y NOBULLSEYE=y SAITHRIFT_V2=y make target/docker-ptf-sai.gz
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
* remove useless change
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
* remove useless parameters
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
* remove useless change
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
* Update azure-pipelines-build.yml
remove a useless option
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
- Why I did it
The values for config_db "docker_routing_config_mode" are:
separated: FRR config generated from ConfigDB, each FRR daemon has its own config file
unified: FRR config generated from ConfigDB, single FRR config file
split: FRR config not generated from ConfigDB, each FRR daemon has its own config file
This commit adds:
split-unified: FRR config not generated from ConfigDB, single FRR config file
- How I did it
In docker_init.sh, when split-unified is used, the FRR configs are not generated
from ConfigDB. What's more, "service integrated-vtysh-config" is configured in vtysh.conf.
- How to verify it
FRR config not overwritten when FRR container starts.
Signed-off-by: Arnaud le Taillanter <a.letaillanter@criteo.com>
bgpd.main.conf.j2: bugfix-9739
* Update bgpd.main.conf.j2 to gracefully handle the bgp configuration cases for when 'bgp_asn' is set to 'None', 'Null', or missing.
How I did it
Include a conditional statement to avoid configuring bgp in FRR when 'bgp_asn' is missing or set to 'None' or 'Null'
How to verify it
Configure 'bgp_asn' as 'None', 'Null' or have it missing from configurations and verify that /etc/frr/bgpd.conf does not have invalid bgp configurations like 'router bgp None'
Description for the changelog
Update bgpd.main.conf.j2 to gracefully handle the bgp configuration cases for when 'bgp_asn' is set to 'None', 'Null', or missing for bugfix 9739.
Signed-off-by: cchoate54@gmail.com
Why I did it
Unify the Debian mirror sources
Make easy to upgrade to the next Debian release, not source url code change required.
Support to customize the Debian mirror sources during the build
Relative issue: #12523
Signed-off-by: maipbui <maibui@microsoft.com>
#### Why I did it
`subprocess` is used with `shell=True`, which is very dangerous for shell injection.
`os` - not secure against maliciously constructed input and dangerous if used to evaluate dynamic content
#### How I did it
remove `shell=True`, use `shell=False`
Replace `os` by `subprocess`
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan arlakshm@microsoft.com
Why I did it
closes#12343
Today in SONiC the notify-keyspace-events is from DbInterface class when application try do any configdb set.
In Chassis the chassis_db may not get any configdb set operations, so there is chance this configuration will never be set.
So the chassis_db updates from one line card will not be propogated to other linecards, which are doing a psubscribe to get these event.
How I did it
update the redis.conf to set notify-keyspace-events AKE so that the notify-keyspace-events are set when the redis instance is started
How to verify it
Test on chassis
Why I did it
Add the missing debian source bullseye-updates/buster-updates
The build failure as below, it is caused by the docker image debian:bullseye used the version 2.31-13+deb11u5, but the version only available in bullseye-update.
- Skip the interface status check if the interface does not exist. In the future, when the interface is created/comes up this check will be triggered again.
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
Why I did it
The Dockerfile of docker-sonic-mgmt became a little bit messy over time. Some packages are also a little bit too old. It would be better to do some cleanup and upgrade some important packages.
How I did it
Updated the dockerfile template for building docker-sonic-mgmt.
How to verify it
Locally built the docker-sonic-mgmt image and used it to run some test scripts.
Description for the changelog:
The build-essential package contains gcc and make. It's unnecessary to install them again.
The python-is-python2 package is included in the python package for Ubuntu 20.04. It's unnecessary to install it again.
Sort the apt and pip packages by alphabetic order.
Cleanup get-pip.py after installation.
Cleanup the python-scapy deb package after installation.
Ensure that the python pip, setuptools and wheel packages are up to date.
Install pytest-ansible from pip instead of from source code.
While installing docker-ce-cli, it's unnecessary to install curl and software-properties-common again.
Merged some pip install steps into one step.
Upgrade ansible from 2.8.12 to 2.9.27 for env-python3.
Upgrade pytest to 7.1.3 for env-python3.
Add ncclient package to evn-python3.
* Add smartmontools to pmon docker
* Set smartmontools to install version 7.2-1 in pmon to match host; clean up smartmontools build files
* Add comments on smartmontools version for both host and pmon
- Why I did it
Fixes#11431
- How I did it
dhcp6relay binds to ipv6 addresses configured on these vlan interfaces
Thus check if they are ready before launching dhcp6relay
- How to verify it
Unit Tests
Tested on a live device
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>