Back port a patch from upstream FRR - FRRouting/frr#14675
Why I did it
The EVPN route is not treated correctly and thus leading to messages:
Oct 30 11:40:00.494083 r-tigris-22 INFO swss#orchagent: :- addRoute: Failed to get next hop 30.0.0.2@Vlan200 for 20.0.0.2/32, resolving neighbor
Oct 30 11:40:00.494083 r-tigris-22 INFO swss#orchagent: :- addRoute: Failed to get next hop 30.0.0.2@Vlan200 for 200.0.0.0/24, resolving neighbor
Oct 30 11:40:00.494083 r-tigris-22 INFO swss#orchagent: :- addRoute: Failed to get next hop ::ffff:30.0.0.2@Vlan200 for 200::/64, resolving neighbor
Oct 30 11:40:00.494083 r-tigris-22 INFO swss#orchagent: :- addRoute: Failed to get next hop ::ffff:30.0.0.2@Vlan200 for 20::/64, resolving neighbor
Oct 30 11:40:00.494083 r-tigris-22 INFO swss#orchagent: :- addRoute: Failed to get next hop ::ffff:30.0.0.2@Vlan200 for 20::2/128, resolving neighbor
This happens because fpmsyncd does not get encap type field in FPM message.
Work item tracking
Microsoft ADO (number only):
How I did it
Backport fix from FRR.
How to verify it
EVPN scenario.
Why I did it
A race condition exists while the TPH is processing a netlink message - if a second netlink message arrives during processing it will be missed since TPH is not listening for other messages.
Another bug was found where TPH was unnecessarily restarting since it was checking admin status instead of operational status of portchannels.
How I did it
Subscribe to APPL_DB for updates on LAG operational state
Track currently sniffed interfaces
How to verify it
Send tunnel packets with destination IP of an unresolved neighbor, verify that ping commands are run
Shut down a portchannel interface, verify that sniffer does not restart
Send tunnel packets, verify ping commands are still run
Bring up portchannel interface, verify that sniffer restarts
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
Orchagent uses PORTCHANNEL term when parsing this field. Change the YANG model to align to orchagent.
- Why I did it
When specifying PORTCHANNEL in ACL_TABLE_TYPE table YAGN model validation does not pass, when using term LAG orchagent does not accept such table type.
Fix it by aligning YANG model to orchagent.
- How I did it
Fix in YANG model.
- How to verify it
Create custom ACL table type.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
#### Why I did it
src/sonic-sairedis
```
* 7acd028 - (HEAD -> master, origin/master, origin/HEAD) [gbsyncd] Add asic db prefix for channel RESTARTQUERY (#1302) (3 hours ago) [Junhua Zhai]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-swss
```
* 51bfb4c1 - (HEAD -> master, origin/master, origin/HEAD) [muxorch] Fixing updateRoute logic (#2952) (3 hours ago) [Nikola Dancejic]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Fix#13561
The existing saidump use https://github.com/sonic-net/sonic-swss-common/blob/master/common/table_dump.lua script which loops the ASIC_DB more than 5 seconds and blocks other processes access.
This solution uses the Redis SAVE command to save the snapshot of DB each time and recover later, instead of looping through each entry in the table.
Related PRs:
sonic-net/sonic-utilities#2972sonic-net/sonic-sairedis#1288sonic-net/sonic-sairedis#1298
How did I do it?
To use the Redis-db SAVE option to save the snapshot of DB each time and recover later, instead of looping through each entry in the table and saving it.
1. Updated dockers/docker-base-bullseye/Dockerfile.j2, install Python library rdbtools into the all the docker-base-bullseye containers.
2. Updated sonic-buildimage/src/sonic-sairedis/saidump/saidump.cpp, add a new option -r, which updates the rdbtools's output-JSON files' format.
3. To add a new script file: syncd/scripts/saidump.sh into the sairedis repo. This shell script does the following steps:
For each ASIC, such as ASIC0,
3.1. Config Redis consistency directory.
redis-cli -h $hostname -p $port CONFIG SET dir $redis_dir > /dev/null
3.2. Save the Redis data.
redis-cli -h $hostname -p $port SAVE > /dev/null
3.3. Run rdb command to convert the dump files into JSON files
rdb --command json $redis_dir/dump.rdb | tee $redis_dir/dump.json > /dev/null
3.4. Run saidump -r to update the JSON files' format as same as the saidump before.
Then we can get the saidump's result in standard output."
saidump -r $redis_dir/dump.json -m 100
3.5. Clear the temporary files.
rm -f $redis_dir/dump.rdb
rm -f $redis_dir/dump.json
4. Update sonic-buildimage/src/sonic-utilities/scripts/generate_dump. To check the asic db size and if it is larger than ROUTE_TAB_LIMIT_DIRECT_ITERATION (with default value 24000) entries, then do with REDIS SAVE, otherwise, to do with old method: looping through each entry of Redis DB.
How to verify it
On T2 setup with more than 96K routes, execute CLI command -- generate_dump
No error should be shown
Download the generate_dump result and verify the saidump file after unpacking it.
#### Why I did it
src/sonic-swss
```
* 2b02c249 - (HEAD -> master, origin/master, origin/HEAD) Send hearbeat during warm reboot freese (#2923) (81 minutes ago) [Hua Liu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Share docker image to support gnmi container and telemetry container
Work item tracking
Microsoft ADO 25423918:
How I did it
Create telemetry image from gnmi docker image.
Enable gnmi container and disable telemetry container by default.
How to verify it
Run end to end test.
Why I did it
Fix#17097
If I set SONIC_VERSION_CONTROL_COMPONENTS=all and MIRROR_SNAPSHOT=y in rules/config file then I get incorrect sources.list files (with latest available snapshots instead of snapshot from files/build/versions/default/versions-mirror).
Work item tracking
Microsoft ADO (number only):
How I did it
Pass directly make variable SONIC_VERSION_CONTROL_COMPONENTS to subshell.
How to verify it
Build and check generated sources.list files.
Why I did it
The current DEVICE_NEIGHBOR_METADATA yang model has two issues that would block GCU operation when it checks if the current config aligns with the YANG model:
Missing cluster field in YANG
Incomplete set of device type. The device type in YANG model doesn't include all the device type.
Work item tracking
Microsoft ADO (number only): 25577813
How I did it
Add cluster field in DEVICE_NEIGHBOR_METADATA YANG model.
Change device type to string.
Fix the UT test accordingly.
How to verify it
Build the image and verify the unit tests passed.
- Why I did it
Removing 8x split DPB mode from platform files since it is not fully supported yet.
- How I did it
Updating platform file.
- How to verify it
Manual testing.
Why I did it
This is part of Python3 migration project.
This pipeline will build sonic-mgmt-docker with both Python2 and Python3.
The main difference between legacy sonic-mgmt-docker and now is:
make LEGACY_SONIC_MGMT_DOCKER=y target/docker-sonic-mgmt.gz
docker tag docker-sonic-mgmt $REGISTRY_SERVER/docker-sonic-mgmt:legacy
Work item tracking
Microsoft ADO (number only): 25254349
How I did it
Add pipeline file.
#### Why I did it
src/sonic-host-services
```
* beb8bbe - (HEAD -> master, origin/master, origin/HEAD) [DualToR][caclmgrd] Fix IPtables rules for multiple vlan interfaces for DualToR config (#82) (3 hours ago) [vdahiya12]
```
#### How I did it
#### How to verify it
#### Description for the changelog
fixes#16011
Why I did it
seeing below warning ,essage:
libyang[1]: Default value "" in the list key "port" is ignored. (/sonic-snmp:sonic-snmp/SNMP_AGENT_ADDRESS_CONFIG/SNMP_AGENT_ADDRESS_LIST)
libyang[1]: Default value "" in the list key "vrf_name" is ignored. (/sonic-snmp:sonic-snmp/SNMP_AGENT_ADDRESS_CONFIG/SNMP_AGENT_ADDRESS_LIST)
name of list is not <model_name>_LIST.
Work item tracking
Microsoft ADO 25646016:
How I did it
Remove default value provided to key in yang model to avoid seeing below error:
libyang[1]: Default value "" in the list key "port" is ignored. (/sonic-snmp:sonic-snmp/SNMP_AGENT_ADDRESS_CONFIG/SNMP_AGENT_ADDRESS_LIST)
libyang[1]: Default value "" in the list key "vrf_name" is ignored. (/sonic-snmp:sonic-snmp/SNMP_AGENT_ADDRESS_CONFIG/SNMP_AGENT_ADDRESS_LIST)
Modify the LIST name to have <model_name>_LIST as this was failing yang validation during unit-tests.
How to verify it
unit-tests passing.
Before fix
admin@vlab-01:~$ sudo sonic-package-manager list
libyang[1]: Default value "" in the list key "port" is ignored. (/sonic-snmp:sonic-snmp/SNMP_AGENT_ADDRESS_CONFIG/SNMP_AGENT_ADDRESS_LIST)
libyang[1]: Default value "" in the list key "vrf_name" is ignored. (/sonic-snmp:sonic-snmp/SNMP_AGENT_ADDRESS_CONFIG/SNMP_AGENT_ADDRESS_LIST)
Name Repository Description Version Status
-------------- --------------------------- ---------------------------- --------- ---------
database docker-database SONiC database package 1.0.0 Built-In
dhcp-relay docker-dhcp-relay N/A 1.0.0 Installed
eventd docker-eventd SONiC eventd package 1.0.0 Built-In
fpm-frr docker-fpm-frr SONiC fpm-frr package 1.0.0 Built-In
gbsyncd docker-gbsyncd-vs SONiC gbsyncd package 1.0.0 Built-In
lldp docker-lldp SONiC lldp package 1.0.0 Built-In
macsec docker-macsec N/A 1.0.0 Installed
mgmt-framework docker-sonic-mgmt-framework SONiC mgmt-framework package 1.0.0 Built-In
mux docker-mux SONiC mux package 1.0.0 Built-In
nat docker-nat SONiC nat package 1.0.0 Built-In
pmon docker-platform-monitor SONiC pmon package 1.0.0 Built-In
radv docker-router-advertiser SONiC radv package 1.0.0 Built-In
sflow docker-sflow SONiC sflow package 1.0.0 Built-In
snmp docker-snmp SONiC snmp package 1.0.0 Built-In
swss docker-orchagent SONiC swss package 1.0.0 Built-In
syncd docker-syncd-vs SONiC syncd package 1.0.0 Built-In
teamd docker-teamd SONiC teamd package 1.0.0 Built-In
telemetry docker-sonic-telemetry SONiC telemetry package 1.0.0 Built-In
After fix:
admin@vlab-01:~$ sudo sonic-package-manager list
Name Repository Description Version Status
-------------- --------------------------- ---------------------------- --------- ---------
database docker-database SONiC database package 1.0.0 Built-In
dhcp-relay docker-dhcp-relay N/A 1.0.0 Installed
eventd docker-eventd SONiC eventd package 1.0.0 Built-In
fpm-frr docker-fpm-frr SONiC fpm-frr package 1.0.0 Built-In
gbsyncd docker-gbsyncd-vs SONiC gbsyncd package 1.0.0 Built-In
lldp docker-lldp SONiC lldp package 1.0.0 Built-In
macsec docker-macsec N/A 1.0.0 Installed
mgmt-framework docker-sonic-mgmt-framework SONiC mgmt-framework package 1.0.0 Built-In
mux docker-mux SONiC mux package 1.0.0 Built-In
nat docker-nat SONiC nat package 1.0.0 Built-In
pmon docker-platform-monitor SONiC pmon package 1.0.0 Built-In
radv docker-router-advertiser SONiC radv package 1.0.0 Built-In
sflow docker-sflow SONiC sflow package 1.0.0 Built-In
snmp docker-snmp SONiC snmp package 1.0.0 Built-In
swss docker-orchagent SONiC swss package 1.0.0 Built-In
syncd docker-syncd-vs SONiC syncd package 1.0.0 Built-In
teamd docker-teamd SONiC teamd package 1.0.0 Built-In
telemetry docker-sonic-telemetry SONiC telemetry package 1.0.0 Built-In
Why I did it
Enable the suppress fib feature by default.
Work item tracking
Microsoft ADO (25564723):
How I did it
In minigraph.py, to add the field suppress-fib-pending, and enable it for leafrouter.
How to verify it
Build / load image and check the config_db by show CLI.
admin@str-7260cx3-acs-2:~$ show suppress-fib-pending
Enabled
Need to modify the tests/bgp/test_bgp_suppress_fib.py in sonic-mgmt repo, to check the config before restore. Otherwise, after this test, it will turn off the suppress-fib-pending.
sonic-net/sonic-mgmt#10612
Why I did it
This is part of Python3 migration project. This PR will add a new makefile flag: LEGACY_SONIC_MGMT_DOCKER
Now by default: LEGACY_SONIC_MGMT_DOCKER = y will build sonic-mgmt-docker with Python2 and Python3
If LEGACY_SONIC_MGMT_DOCKER = n will will sonic-mgmt-docker with Python3 only
Work item tracking
Microsoft ADO (number only): 25254349
How I did it
Add makefile flag: LEGACY_SONIC_MGMT_DOCKER
How to verify it
By default will build sonic-mgmt-docker with Python2 and Python3. No change compared to before.
Set LEGACY_SONIC_MGMT_DOCKER=n will build sonic-mgmt-docker with Python3 only
This is CSP CS00012280996.
The issue to fix is that the checksum was incorrect for all TCP packets leaving the system so that the BGP connection cannot be established. We found the issue on BCM56993, and it is possible to affect all platforms using linux_ngknet.
#### Why I did it
src/sonic-swss-common
```
* a57cf9e - (HEAD -> master, origin/master, origin/HEAD) Add batch support in ZmqProducerStateTable. (#803) (10 hours ago) [mint570]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-linux-kernel
```
* a75a3df - (HEAD -> master, origin/master, origin/HEAD) arm64: Kconfig inclusions to fix PCI hang and MTD detection (#350) (3 hours ago) [Pavan Naregundi]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-swss
```
* 917c21e0 - (HEAD -> master, origin/master, origin/HEAD) Add more debug information when PFC WD is triggered (#2858) (10 hours ago) [Stephen Sun]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
This PR is part of sonic-mgmt-docker Python3 migration project.
Work item tracking
Microsoft ADO (number only): 24397943
How I did it
Upgrade Ansible to 6.7.0
Make Python3 as the default interpreter. python is a soft link to python3. If you want to use python2, use the command python2 explicitly.
Upgrade some pip packages to higher version in order to meet security requirement.
How to verify it
Build a private sonic-mgmt-docker successfully.
Verify python is python3.
Verify python2 is working with 202012 and 202205 branch.
Verify python3 is working with master branch.
Verify with github PR test.
### Why I did it
We use `EdgeZoneAggregator` in `db_migrator`, but we don't support this pattern in sonic yang models. Hence, we update this in the sonic-yang model.
##### Work item tracking
- Microsoft ADO **(number only)**: 25574132
#### How I did it
Update the device pattern list.
#### Why I did it
HLD implementation: Container Hardening (https://github.com/sonic-net/SONiC/pull/1364)
##### Work item tracking
- Microsoft ADO **(number only)**: 14807420
#### How I did it
Reduce linux capabilities in privileged flag
#### How to verify it
Run sflow sonic-mgmt tests
Check container's settings: Privileged is false and container only has default Linux caps, does not have extended caps.
```
admin@vlab-01:~$ docker inspect sflow | grep Privi
"Privileged": false,
admin@vlab-01:~$ docker exec -it sflow bash
root@vlab-01:/# capsh --print
Current: cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap=ep
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap
```
#### Why I did it
This header file comes from an external package, and a very old version of the header file has been checked into swss-common. This will cause problems for the upcoming Bookworm upgrade.
##### Work item tracking
- Microsoft ADO **(number only)**: 25411155
#### How I did it
Change references to the header file to use the Debian package nlohmann-json-dev, instead of from swss-common.
### Tested branch (Please provide the tested image version)
- [ ] <!-- image version 1 -->
- [ ] VS image from pipeline build
Verified that eventd was running
Why I did it
To avoid orchagent crash issue like sonic-net/sonic-swss#2935, disable unsupported counters on SONiC management devices.
Work item tracking
Microsoft ADO (number only): 25437720
How I did it
Update the minigraph parser to disable unsupported counters on management devices.
How to verify it
Verified by unittest.
Manually apply patch to DUT and do config load_minigraph
Why I did it
XGS saibcm-modules 8.4 is needed. #14471
Work item tracking
Microsoft ADO (number only): 24917414
How I did it
Copy files from xgs SDK 8.4 repo and modify makefiles to build the image.
Upgrade version to 8.4.0.2 in saibcm-modules.mk.
How to verify it
Build a private image and run full qualification with it: https://elastictest.org/scheduler/testplan/650419cb71f60aa92c456a2b
#### Why I did it
src/sonic-sairedis
```
* 7210b0c - (HEAD -> master, origin/master, origin/HEAD) [Link event damping] Add utility methods. (#1313) (20 hours ago) [Ashish Singh]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
It was observed that a flood of DHCP packets without rate-limiting can cause BGP flaps or lacp keepalive losses.
This change attempts to prevent or reduce such BGP flaps by enabling appropriate rate-limiting in SONiC for all traffic types.
Work item tracking
Microsoft ADO 17964421:
How I did it
Set a reasonable CIR/CBS value of 300 for queue4_group3 (dhcp, lldp, macsec) and 6000 for queue4_group1.
The value 300 was arrived at after testing with dhcp flooding using ptf (using multiple threads). Throttling at this rate was necessary to ensure that dhcp flooding does not cause BGP flaps.
How to verify it
Verified with this script running from ptf, that BGP flaps don't happen when CBS/CIR is set at 300 for queue4_group3.
import threading
from scapy.all import *
def send_dhcp_discover(intf):
dhcp_discover = Ether(dst='ff:ff:ff:ff:ff:ff',src=RandMAC()) \
/IP(src='1.1.1.1',dst='255.255.255.255') \
/UDP(sport=68,dport=67) \
/DHCP(options=[('message-type','discover'),('end')])
sendp(dhcp_discover,count=100000,iface=intf)
if __name__ == "__main__":
t1 = threading.Thread(target=send_dhcp_discover, args=("eth1",))
t2 = threading.Thread(target=send_dhcp_discover, args=("eth2",))
t1.start()
t2.start()
t1.join()
t2.join()
Verified on Arista-7260CX3-D108C8 running 202012 that the copp rule for queue4_group1 and queue4_group3 do NOT affect BGP packets. To verify this using PTF, the copp rules were modified to set the "CBS" and "CIR" for queue4_group1 and queue4_group3 at 600pps and 50k packets each of "BGP open" and "DHCP Discover" were simultaneously sent from the same PTF port to the DUT. It was verified using "show c cpu" that packets are hitting the cpu queue at 1200 pps (double the configured CIR/CBS for these packet types). This helped conclude that throttling rate is per trap (or packet type) and not per queue.
Verified with updated sonic-mgmt tests ([tests/copp]: Update copp mgmt tests to support new rate-limits sonic-mgmt#8199) on broadcom and mellanox platforms that these traffic types are rate-limited.
Signed-off-by: Prabhat Aravind <paravind@microsoft.com>
#### Why I did it
src/sonic-sairedis
```
* 1ef16ee - (HEAD -> master, origin/master, origin/HEAD) [Link event damping] Add generic concurrent queue for link event damping. (#1297) (11 hours ago) [Ashish Singh]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-swss
```
* a9867e67 - (HEAD -> master, origin/master, origin/HEAD) Fix acl match ip_type_non_ipv4 and ip_type_non_ipv6. (#2842) (5 hours ago) [LTeng]
* dc8fd20f - [DASH] ACL tags implementation (#2915) (11 hours ago) [Oleksandr Ivantsiv]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-utilities
```
* 0ae5d2d2 - (HEAD -> master, origin/master, origin/HEAD) [ci] Use correct bullseye docker image according to source branch. (18 hours ago) [Liu Shilong]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
In an effort to allow people to build a slim version of SONiC to fit on devices to small storage, there is a need to disable some unneeded features.
The docker-gbsyncd are only applicable to devices with external gearboxes and might not apply to devices that need a small image.
It is therefore desirable to have a knob to not include these gbsyncd containers.
Work item tracking
Microsoft ADO (number only):
How I did it
Add a new config INCLUDE_GBSYNCD which is enabled by default to retain the previous behavior.
Setting it to n will not include the platform/components/docker-gbsyncd-*.mk.
How to verify it
Set INCLUDE_GBSYNCD = n and witness that docker-gbsyncd images are not present in the final image.
With Debian Bookworm, Paramiko 2.9 or newer will need to be used to be
able to connect to devices running that version of Debian
(specifically, to those running OpenSSH 9.2).
Paramiko is currently on 3.3.1. For now, upgrade to 2.9.5.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
#### Why I did it
src/sonic-sairedis
```
* eaa2bda - (HEAD -> master, origin/master, origin/HEAD) Update SAI submodule to latest (#1311) (12 hours ago) [Kamil Cudnik]
```
#### How I did it
#### How to verify it
#### Description for the changelog