What I did:
In Chassis TSA mode Loopback0 Ip's of each LC's should be advertise through e-BGP peers of each remote LC's
How I did:
- Route-map policy to Advertise own/self Loopback IP to other internal iBGP peers with a community internal_community as define in constants.yml
- Route-map policy to match on above internal_community when route is received from internal iBGP peers and set a internal tag as define in constants.yml and also delete the internal_community so we don't send to any of e-BGP peers
- In TSA new route-map match on above internal tag and permit the route (Loopback0 IP's of remote LC's) and set the community to traffic_shift_community.
- In TSB delete the above new route-map.
How I verify:
Manual Verification
UT updated.
sonic-mgmt PR: sonic-net/sonic-mgmt#10239
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
- Why I did it
Added YANG model as part of Generic Hash feature development
- How I did it
Added YANG model
- How to verify it
1. Add UT
2. Verified manually with the feature qualification
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
- Why I did it
To fix BIOS firmware update after fresh image installation from ONiE
- How I did it
Initialized empty GRUB environment file after ONiE installation
- How to verify it
1. Install image from ONiE
2. Run BIOS firmware upgrade
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
- Why I did it
Support running hw-management service on MSN4700 emulation platform.
- How I did it
Use physical EEPROM instead of the fake one
Do not skip PSUd, PCId, thermal control daemon
Adjust PCIe and thermal configuration files
Adjust platform.json for different chassis names and thermals
Remove a patch to hw-management in order to enable it
- How to verify it
Run Nvidia simulation on SN4700 (ASIC and Platform)
Signed-off-by: Stephen Sun <stephens@nvidia.com>
- Why I did it
In order to activate FW after it was upgraded need to perform reboot.
If reboot wasn't performed and user need to upgrade to another SONiC image then it will fail.
The reason for that is that during SONiC upgrade new FW should be installed but it will fail because previously installed FW wasn't activated.
In order to allow 2nd FW upgrade without reboot in-between need to reactivate FW image.
This change handles such flow.
Example of issue scenario:
User installed SONiC image on the switch
Then for some reason FW was upgraded by user or script but reboot was not performed to activate it.
After that upgrade to new SONiC image will fail because new image need to install FW but it fails due to previous one wasn't activated.
- How I did it
In "mlnx-fw-upgrade" script check if FW upgrade failed with the error that FW was already installed but reboot was not performed.
If so then perform FW image reactivation and try to upgrade FW again.
- How to verify it
Install SONiC image on the switch
Then upgrade FW but don't perform reboot.
After that upgrade to new SONiC image and check that upgrade was successfull.
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
Why I did it
- Convert hw-dump into generate-dump plugins
- Enable DRAM scrubber on some products
- Fix xcvr driver active low register bit logic
- Improve cooling algorithm (now considers xcvrs and modules)
- Add linecard graceful shutdown (disabled by default)
The scrubber was enabled for the following products:
- DCS-7050QX-32S
- DCS-7050CX3-32S
- DCS-7060CX-32S
What I did:
Revert the GTSM feature for VOQ iBGP session done as part of #16777.
Why I did:
On VOQ chassis BGP packets go over Recycle Port and then for Ingress Pipeline Routing making ttl as 254 and failing single hop check.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Sub PRs:
sonic-net/sonic-host-services#84
#17191
Why I did it
According to the design, the database instances of DPU will be kept in the NPU host.
Microsoft ADO (number only): 25072889
How I did it
To follow the multiple ASIC design, I assume a new platform environment variable NUM_DPU will be defined in the /usr/share/sonic/device/$PLATFORM/platform_env.conf. Based on this number, NPU host will launch a corresponding number of instances for the DPU database.
Signed-off-by: Ze Gan <ganze718@gmail.com>
SAI 9.x requires a SYNCD_SHM_SIZE specified otherwise it will default to 64mb which is insufficient for syncd.
E.G. of a few failures seen when insufficient shmem was set
ha_init: The file: warmboot_data_0 is of size=762[MB] and is beyond the directory: /dev/shm available storage of size=64[MB]#015
syncd.sh[26074]: Cannot get SYNCD_SHM_SIZE for chip: [869] in /usr/share/sonic/device/x86_64-broadcom_common/syncd_shm.ini. Skip set SYNCD_SHM_SIZE.
Syncd hangs here:
syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_shr_ha_section_resize:536 start=0x7f6e641b4000, end=0x7f6e645b4000, len=302276608, free=0x7f6e641b4000
Broadcom recommended using 1gb for DNX devices.
Since currently we don't use SAI9.x on master and 202305 this change won't fix anything until we upgrade the SAI on those branches.
#### Why I did it
src/sonic-dbsyncd
```
* e294eb0 - (HEAD -> master, origin/master, origin/HEAD) Update the code coverage rate to 80% (#63) (16 hours ago) [xumia]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-platform-daemons
```
* 55a6828 - (HEAD -> master, origin/master, origin/HEAD) Update the code coverage rate to 80% (#406) (16 hours ago) [xumia]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Work item tracking
Microsoft ADO (number only): 25858445
How I did it
sonic-mgmt-docker with both Python2 and Python3 tag is latest
sonic-mgmt-docker with Python3 only tag is py3only
How to verify it
Why I did it
Add config_db monitor and customize options for dhcpservd. HLD: sonic-net/SONiC#1282
Work item tracking
Microsoft ADO (number only): 25600859
How I did it
Add support to customize unassigned DHCP options. Current support type: binary, boolean, ipv4-address, string, uint8, uint16, uint32
Add db config change monitor for dhcpservd
How to verify it
Unit tests in sonic-dhcp-server all passed
Why I did it
HLD implementation: Container Hardening (sonic-net/SONiC#1364)
Work item tracking
Microsoft ADO (number only): 14807420
How I did it
Reduce linux capabilities in privileged flag
How to verify it
Run snmp sonic-mgmt tests
Check container's settings: Privileged is false and container only has default Linux caps, does not have extended caps.
admin@vlab-01:~$ docker inspect snmp | grep Privi
"Privileged": false,
admin@vlab-01:~$ docker exec -it snmp bash
root@vlab-01:/# capsh --print
Current: cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap=ep
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_se
#### Why I did it
src/sonic-mgmt-common
```
* faa2a51 - (HEAD -> master, origin/master, origin/HEAD) Go Code format checker and formatter (#112) (8 hours ago) [faraazbrcm]
* faaa9f5 - PathInfo optimizations (#115) (22 hours ago) [Sachin Holla]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-platform-common
```
* 30fb0ce - (HEAD -> master, origin/master, origin/HEAD) Implement is_copper for SFP (#414) (12 hours ago) [Junchao-Mellanox]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Fix#16204
Microsoft ADO (number only): 25746782
How I did it
multiarch/debian-debootstrap:arm64-bullseye is too old.
It needs to add some gpg keys before 'apt-get update'
In the ubuntu environment, the debian server key wasn't installed by default. So, we will get the following error in the Azp pipeline
gpg: WARNING: no command supplied. Trying to guess what you mean ...
gpg: Signature made Sun Apr 9 06:25:32 2023 UTC
gpg: using RSA key 7D887DC8BA7BBBA7B835E3BADCE310E7864CC8BF
gpg: Can't check signature: No public key
gpg: can't create `/home/vsts/.gnupg/random_seed': No such file or directory
Validation FAILED!!
Signed-off-by: Ze Gan <ganze718@gmail.com>
Why I did it
Work item tracking
Microsoft ADO (number only): 25858445
How I did it
docker-sonic-mgmt.yml will build docker with Python2 and Python3 both.
docker-sonic-mgmt-py3-only.yml will build docker with Python3 only.
#### Why I did it
src/sonic-platform-common
```
* 5cc3e30 - (HEAD -> master, origin/master, origin/HEAD) Correct wrong constant (#411) (6 hours ago) [ChiouRung Haung]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-mgmt-common
```
* 7e3a8ad - (HEAD -> master, origin/master, origin/HEAD) Transformer infra enhancements and bug fixes (#104) (5 days ago) [amrutasali]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-host-services
```
* 586b1e9 - (HEAD -> master, origin/master, origin/HEAD) Disable systemd auto-restart of dependent services for spineRouters (#83) (5 hours ago) [Deepak Singhal]
```
#### How I did it
#### How to verify it
#### Description for the changelog
- Why I did it
New introduced MSN2700 platform has a different platform name compared to the old one, it should be "MSN2700-A1".
- How I did it
Update the name to the new one in platform.json and platform_components.json.
- How to verify it
run platform-related sonic-mgmt test cases on the new platform.
Signed-off-by: Kebo Liu <kebol@nvidia.com>
- Why I did it
PR checker is blocked by container_checker.
- How I did it
Disable telemetry in minigraph parser.
- How to verify it
Run pipeline and sanity check.
#### Why I did it
src/sonic-swss
```
* 644b227a - (HEAD -> master, origin/master, origin/HEAD) [portsorch]: Implement port PFC asym capability check (#2942) (3 days ago) [Nazarii Hnydyn]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-mgmt-framework
```
* dfac87c - (HEAD -> master, origin/master, origin/HEAD) Query parameters enhancements in rest-server. (#119) (5 weeks ago) [ranjinidn]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/dhcprelay
```
* 40c6877 - (HEAD -> master, origin/master, origin/HEAD) [CodeQL] fix unmet dependency for `build-swss-common` (#44) (30 hours ago) [Jing Zhang]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Back port a patch from upstream FRR - FRRouting/frr#14675
Why I did it
The EVPN route is not treated correctly and thus leading to messages:
Oct 30 11:40:00.494083 r-tigris-22 INFO swss#orchagent: :- addRoute: Failed to get next hop 30.0.0.2@Vlan200 for 20.0.0.2/32, resolving neighbor
Oct 30 11:40:00.494083 r-tigris-22 INFO swss#orchagent: :- addRoute: Failed to get next hop 30.0.0.2@Vlan200 for 200.0.0.0/24, resolving neighbor
Oct 30 11:40:00.494083 r-tigris-22 INFO swss#orchagent: :- addRoute: Failed to get next hop ::ffff:30.0.0.2@Vlan200 for 200::/64, resolving neighbor
Oct 30 11:40:00.494083 r-tigris-22 INFO swss#orchagent: :- addRoute: Failed to get next hop ::ffff:30.0.0.2@Vlan200 for 20::/64, resolving neighbor
Oct 30 11:40:00.494083 r-tigris-22 INFO swss#orchagent: :- addRoute: Failed to get next hop ::ffff:30.0.0.2@Vlan200 for 20::2/128, resolving neighbor
This happens because fpmsyncd does not get encap type field in FPM message.
Work item tracking
Microsoft ADO (number only):
How I did it
Backport fix from FRR.
How to verify it
EVPN scenario.
Why I did it
A race condition exists while the TPH is processing a netlink message - if a second netlink message arrives during processing it will be missed since TPH is not listening for other messages.
Another bug was found where TPH was unnecessarily restarting since it was checking admin status instead of operational status of portchannels.
How I did it
Subscribe to APPL_DB for updates on LAG operational state
Track currently sniffed interfaces
How to verify it
Send tunnel packets with destination IP of an unresolved neighbor, verify that ping commands are run
Shut down a portchannel interface, verify that sniffer does not restart
Send tunnel packets, verify ping commands are still run
Bring up portchannel interface, verify that sniffer restarts
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
Orchagent uses PORTCHANNEL term when parsing this field. Change the YANG model to align to orchagent.
- Why I did it
When specifying PORTCHANNEL in ACL_TABLE_TYPE table YAGN model validation does not pass, when using term LAG orchagent does not accept such table type.
Fix it by aligning YANG model to orchagent.
- How I did it
Fix in YANG model.
- How to verify it
Create custom ACL table type.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
#### Why I did it
src/sonic-sairedis
```
* 7acd028 - (HEAD -> master, origin/master, origin/HEAD) [gbsyncd] Add asic db prefix for channel RESTARTQUERY (#1302) (3 hours ago) [Junhua Zhai]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-swss
```
* 51bfb4c1 - (HEAD -> master, origin/master, origin/HEAD) [muxorch] Fixing updateRoute logic (#2952) (3 hours ago) [Nikola Dancejic]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Fix#13561
The existing saidump use https://github.com/sonic-net/sonic-swss-common/blob/master/common/table_dump.lua script which loops the ASIC_DB more than 5 seconds and blocks other processes access.
This solution uses the Redis SAVE command to save the snapshot of DB each time and recover later, instead of looping through each entry in the table.
Related PRs:
sonic-net/sonic-utilities#2972sonic-net/sonic-sairedis#1288sonic-net/sonic-sairedis#1298
How did I do it?
To use the Redis-db SAVE option to save the snapshot of DB each time and recover later, instead of looping through each entry in the table and saving it.
1. Updated dockers/docker-base-bullseye/Dockerfile.j2, install Python library rdbtools into the all the docker-base-bullseye containers.
2. Updated sonic-buildimage/src/sonic-sairedis/saidump/saidump.cpp, add a new option -r, which updates the rdbtools's output-JSON files' format.
3. To add a new script file: syncd/scripts/saidump.sh into the sairedis repo. This shell script does the following steps:
For each ASIC, such as ASIC0,
3.1. Config Redis consistency directory.
redis-cli -h $hostname -p $port CONFIG SET dir $redis_dir > /dev/null
3.2. Save the Redis data.
redis-cli -h $hostname -p $port SAVE > /dev/null
3.3. Run rdb command to convert the dump files into JSON files
rdb --command json $redis_dir/dump.rdb | tee $redis_dir/dump.json > /dev/null
3.4. Run saidump -r to update the JSON files' format as same as the saidump before.
Then we can get the saidump's result in standard output."
saidump -r $redis_dir/dump.json -m 100
3.5. Clear the temporary files.
rm -f $redis_dir/dump.rdb
rm -f $redis_dir/dump.json
4. Update sonic-buildimage/src/sonic-utilities/scripts/generate_dump. To check the asic db size and if it is larger than ROUTE_TAB_LIMIT_DIRECT_ITERATION (with default value 24000) entries, then do with REDIS SAVE, otherwise, to do with old method: looping through each entry of Redis DB.
How to verify it
On T2 setup with more than 96K routes, execute CLI command -- generate_dump
No error should be shown
Download the generate_dump result and verify the saidump file after unpacking it.
#### Why I did it
src/sonic-swss
```
* 2b02c249 - (HEAD -> master, origin/master, origin/HEAD) Send hearbeat during warm reboot freese (#2923) (81 minutes ago) [Hua Liu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Share docker image to support gnmi container and telemetry container
Work item tracking
Microsoft ADO 25423918:
How I did it
Create telemetry image from gnmi docker image.
Enable gnmi container and disable telemetry container by default.
How to verify it
Run end to end test.
Why I did it
Fix#17097
If I set SONIC_VERSION_CONTROL_COMPONENTS=all and MIRROR_SNAPSHOT=y in rules/config file then I get incorrect sources.list files (with latest available snapshots instead of snapshot from files/build/versions/default/versions-mirror).
Work item tracking
Microsoft ADO (number only):
How I did it
Pass directly make variable SONIC_VERSION_CONTROL_COMPONENTS to subshell.
How to verify it
Build and check generated sources.list files.
Why I did it
The current DEVICE_NEIGHBOR_METADATA yang model has two issues that would block GCU operation when it checks if the current config aligns with the YANG model:
Missing cluster field in YANG
Incomplete set of device type. The device type in YANG model doesn't include all the device type.
Work item tracking
Microsoft ADO (number only): 25577813
How I did it
Add cluster field in DEVICE_NEIGHBOR_METADATA YANG model.
Change device type to string.
Fix the UT test accordingly.
How to verify it
Build the image and verify the unit tests passed.