8273 Commits

Author SHA1 Message Date
mssonicbld
2e32cba321
[submodule] Update submodule sonic-sairedis to the latest HEAD automatically (#17230) 2023-11-21 15:08:09 +08:00
Mai Bui
6ea03f9f78
[docker-restapi] limit privileged flag for restapi container (#17138)
Why I did it
HLD implementation: Container Hardening (sonic-net/SONiC#1364)

Work item tracking
Microsoft ADO (number only): 14807420
How I did it
Reduce linux capabilities in privileged flag

How to verify it
Run restapi sonic-mgmt tests on sn4600c
Check container's settings: Privileged is false and container only has default Linux caps, does not have extended caps.
2023-11-21 14:50:31 +08:00
jfeng-arista
6dfaf5e293
[sonic-vs]: Add fabric port data for vs test, and start fabricmgrd in vs environment (#16791)
Add fabric port data for vs test, and start fabricmgrd in vs environment.

This PR depends on sonic-net/sonic-sairedis#1301

sonic-net/sonic-swss#2920 needs this one merge first.
2023-11-20 16:21:03 -08:00
Pavan Naregundi
307e39bde4
[Marvell-arm64] Add platform support for rd98DX35xx (#16874)
* [Marvell-arm64] Add platform support for rd98DX35xx

This change adds following two variants of rd98DX35xx board to arm64
build.

Board with CPU integrated into the 98DX35xx switching chip:

 Platform: arm64-marvell_rd98DX35xx-r0
 HwSKU: rd98DX35xx
 ASIC: marvell
 Port Config: 32x1G + 16x2.5G + 6x25G

Board with external CN9131 CPU connected over PCI to 98DX35xx
switching chip:

 Platform: arm64-marvell_rd98DX35xx_cn9131-r0
 HwSKU: rd98DX35xx_cn9131
 ASIC: marvell
 Port Config: 32x1G + 16x2.5G + 6x25G

Change-Id: I21dc9fe972417daaabb20a5bddf7779d72b7972e
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>

* Add HWSKU for rd98DX35xx and rd98DX35xx_cn9131

This patch adds new HWSKU's for Marvell arm64 platforms rd98DX35xx
and rd98DX35xx_cn9131.

Change-Id: Id7c14f49f0e304335cc4ca73dcae52362c49d231
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>

---------

Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
2023-11-20 09:43:02 -08:00
abdosi
4a7aa2634f
[chassis] Support advertisement of Loopback0 of all LC's across all e-BGP peers in TSA mode (#16714)
What I did:
In chassis TSA mode, the Loopback0 IPs of each LC should be advertised through the e-BGP peers of every remote LC.

How I did:

- Route-map policy to advertise the LC's own Loopback0 IP to the other internal iBGP peers with the community internal_community, as defined in constants.yml.
- Route-map policy to match on internal_community when a route is received from internal iBGP peers, set an internal tag as defined in constants.yml, and delete internal_community so that it is not sent to any e-BGP peer.
- In TSA, a new route-map matches on the internal tag above, permits the route (the Loopback0 IPs of remote LCs), and sets the community to traffic_shift_community.
- In TSB, the new route-map above is deleted.

How I verify:

Manual Verification

UT updated.
sonic-mgmt PR: sonic-net/sonic-mgmt#10239


Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2023-11-20 09:42:02 -08:00
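The route-map flow described in the entry above can be modelled as a small sketch. This is not the FRR configuration itself: the `Route` class, the function names, the tag value `1001`, and the sample prefixes are all hypothetical, and the community strings are placeholders for the values that constants.yml defines.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Placeholder values: the real ones are defined in constants.yml and are
# not shown in the commit message.
INTERNAL_COMMUNITY = "internal_community"
TRAFFIC_SHIFT_COMMUNITY = "traffic_shift_community"
INTERNAL_TAG = 1001  # hypothetical tag value


@dataclass
class Route:
    prefix: str
    communities: Set[str] = field(default_factory=set)
    tag: Optional[int] = None


def export_loopback_to_ibgp(route: Route) -> Route:
    """Advertise the LC's own Loopback0 with the internal community."""
    route.communities.add(INTERNAL_COMMUNITY)
    return route


def import_from_ibgp(route: Route) -> Route:
    """On receipt from an internal iBGP peer, swap the community for a tag."""
    if INTERNAL_COMMUNITY in route.communities:
        route.tag = INTERNAL_TAG
        route.communities.discard(INTERNAL_COMMUNITY)
    return route


def tsa_export_to_ebgp(route: Route) -> Optional[Route]:
    """In TSA, permit tagged routes and set the traffic-shift community."""
    if route.tag == INTERNAL_TAG:
        route.communities.add(TRAFFIC_SHIFT_COMMUNITY)
        return route
    return None  # not matched by this route-map; other policy applies


remote_loopback = import_from_ibgp(export_loopback_to_ibgp(Route("10.1.0.1/32")))
advertised = tsa_export_to_ebgp(remote_loopback)
print(advertised)
```

The tag exists only inside the chassis, which is why the internal community can be stripped on import without losing track of which routes are remote Loopback0 addresses.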
Nazarii Hnydyn
c99ec1f80a
[hash] Add ECMP/LAG Hash Algorithm YANG model (#17079)
- Why I did it
Added YANG model as part of Generic Hash feature development

- How I did it
Added YANG model

- How to verify it
1. Add UT
2. Verified manually with the feature qualification

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2023-11-20 17:43:58 +02:00
Nazarii Hnydyn
c43ea1c904
[installer] Create a blank grubenv if doesn't exist. (#17216)
- Why I did it
To fix the BIOS firmware update after a fresh image installation from ONIE

- How I did it
Initialized an empty GRUB environment file after ONIE installation

- How to verify it
1. Install image from ONIE
2. Run BIOS firmware upgrade

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2023-11-20 17:33:39 +02:00
Stephen Sun
b93852d53d
[Mellanox] Support running hw-management service on MSN4700 emulation platform (#16584)
- Why I did it
Support running hw-management service on MSN4700 emulation platform.

- How I did it
Use physical EEPROM instead of the fake one
Do not skip PSUd, PCId, thermal control daemon
Adjust PCIe and thermal configuration files
Adjust platform.json for different chassis names and thermals
Remove a patch to hw-management in order to enable it

- How to verify it
Run Nvidia simulation on SN4700 (ASIC and Platform)

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2023-11-19 11:03:46 +02:00
Volodymyr Samotiy
672781e24a
[mlnx-fw-upgrade] Add FW reactivation in case 2 FW upgrades were done without reboot (#17092)
- Why I did it
A reboot is required to activate FW after it has been upgraded.
If the reboot was not performed and the user needs to upgrade to another SONiC image, the upgrade will fail.
The reason is that the SONiC upgrade must install new FW, and that installation fails because the previously installed FW was not activated.
To allow a second FW upgrade without a reboot in between, the FW image needs to be reactivated.
This change handles that flow.

Example of issue scenario:

The user installed a SONiC image on the switch.
Then, for some reason, FW was upgraded by the user or a script, but no reboot was performed to activate it.
After that, the upgrade to a new SONiC image fails because the new image needs to install FW, which fails because the previous FW was not activated.

- How I did it
In the "mlnx-fw-upgrade" script, check whether the FW upgrade failed with the error that FW was already installed but a reboot was not performed.
If so, perform FW image reactivation and try the FW upgrade again.

- How to verify it
Install a SONiC image on the switch.
Then upgrade FW but don't perform a reboot.
After that, upgrade to a new SONiC image and check that the upgrade was successful.

Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2023-11-19 11:01:31 +02:00
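The retry flow from the entry above can be sketched as follows. The real "mlnx-fw-upgrade" is a script whose code is not shown here, so the exception class, the function names, and the stand-in callables below are hypothetical and only illustrate the control flow.

```python
class FwNotActivatedError(Exception):
    """Hypothetical error: FW was already installed but no reboot was performed."""


def upgrade_fw(upgrade, reactivate):
    """Try the upgrade; on the 'not activated' error, reactivate and retry once."""
    try:
        upgrade()
    except FwNotActivatedError:
        reactivate()
        upgrade()  # a second failure propagates to the caller


# Stand-in callables that simulate the issue scenario.
state = {"pending_fw": True, "log": []}


def fake_upgrade():
    if state["pending_fw"]:
        raise FwNotActivatedError("previous FW not activated")
    state["log"].append("upgraded")


def fake_reactivate():
    state["pending_fw"] = False
    state["log"].append("reactivated")


upgrade_fw(fake_upgrade, fake_reactivate)
print(state["log"])
```

Retrying only once keeps a persistent failure visible instead of looping on it.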
Samuel Angebault
c2899eb44c
[Arista] Update platform library submodules (#16701)
Why I did it

- Convert hw-dump into generate-dump plugins
- Enable DRAM scrubber on some products
- Fix xcvr driver active low register bit logic
- Improve cooling algorithm (now considers xcvrs and modules)
- Add linecard graceful shutdown (disabled by default)

The scrubber was enabled for the following products:

- DCS-7050QX-32S
- DCS-7050CX3-32S
- DCS-7060CX-32S
2023-11-17 17:15:39 -08:00
abdosi
e37b4f3cfa
Revert iBGP GTSM feature for VOQ Chassis (#17037)
What I did:

Revert the GTSM feature for VOQ iBGP session done as part of #16777.

Why I did:
On a VOQ chassis, BGP packets go over the Recycle Port and then through ingress pipeline routing, which makes the TTL 254 and fails the single-hop check.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2023-11-17 17:03:37 -08:00
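The failure described in the entry above follows from how a GTSM check works: the sender sets the TTL to 255, and the receiver rejects packets whose TTL is lower than the configured hop count allows. A minimal sketch, with a hypothetical function name:

```python
MAX_TTL = 255  # GTSM senders set the TTL to the maximum value


def gtsm_accepts(received_ttl: int, max_hops: int = 1) -> bool:
    """Accept only packets that crossed at most max_hops hops.

    With max_hops=1 (single hop) the TTL must still be 255 on arrival.
    """
    return received_ttl >= MAX_TTL - (max_hops - 1)


print(gtsm_accepts(255))  # directly connected peer
print(gtsm_accepts(254))  # TTL decremented once, as on the Recycle Port path
```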
saksarav-nokia
534eed9de7
[Nokia][Nokia-IXR7250E-SUP-10] Update BCM config for supervisor card to reduce the CPU usage (#16790)
Disabled the bcmCNTR thread to reduce the CPU usage for Nokia SFM cards.

Signed-off-by: saksarav <sakthivadivu.saravanaraj@nokia.com>
2023-11-17 15:11:05 -08:00
Ze Gan
9f08f88a0d
[dpu]: Add DPU database service (#17161)
Sub PRs:

sonic-net/sonic-host-services#84
#17191

Why I did it
According to the design, the database instances of DPU will be kept in the NPU host.

Microsoft ADO (number only): 25072889

How I did it
To follow the multiple ASIC design, I assume a new platform environment variable NUM_DPU will be defined in the /usr/share/sonic/device/$PLATFORM/platform_env.conf. Based on this number, NPU host will launch a corresponding number of instances for the DPU database.

Signed-off-by: Ze Gan <ganze718@gmail.com>
2023-11-17 09:10:03 -08:00
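As a rough illustration of the entry above, the sketch below reads NUM_DPU and derives one database instance per DPU. It assumes platform_env.conf holds KEY=VALUE lines; the function names and the `dpu<N>` instance names are hypothetical and not taken from the change.

```python
def read_num_dpu(conf_text: str) -> int:
    """Return NUM_DPU from KEY=VALUE text, or 0 when it is not defined."""
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "NUM_DPU":
            return int(value.strip())
    return 0


def dpu_database_instances(num_dpu: int) -> list:
    """One database instance per DPU; the 'dpu<N>' names are hypothetical."""
    return [f"dpu{i}" for i in range(num_dpu)]


sample_conf = "# platform settings\nNUM_DPU=4\n"
print(dpu_database_instances(read_num_dpu(sample_conf)))
```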
arista-nwolfe
00a9412880
[Arista]: Set SYNCD_SHM_SIZE for Arista DNX Devices (#17205)
SAI 9.x requires SYNCD_SHM_SIZE to be specified; otherwise it defaults to 64 MB, which is insufficient for syncd.

Examples of failures seen when insufficient shared memory was set:

ha_init:  The file: warmboot_data_0 is of size=762[MB] and is beyond the directory: /dev/shm available storage of size=64[MB]#015
syncd.sh[26074]: Cannot get SYNCD_SHM_SIZE for chip: [869] in /usr/share/sonic/device/x86_64-broadcom_common/syncd_shm.ini. Skip set SYNCD_SHM_SIZE.

Syncd hangs here:

syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_shr_ha_section_resize:536 start=0x7f6e641b4000, end=0x7f6e645b4000, len=302276608, free=0x7f6e641b4000
Broadcom recommended using 1 GB for DNX devices.

Since we currently don't use SAI 9.x on master and 202305, this change won't fix anything until we upgrade the SAI on those branches.
2023-11-17 09:06:25 -08:00
mssonicbld
e4878ff1ad
[submodule] Update submodule sonic-dbsyncd to the latest HEAD automatically (#17207)
#### Why I did it
src/sonic-dbsyncd
```
* e294eb0 - (HEAD -> master, origin/master, origin/HEAD) Update the code coverage rate to 80% (#63) (16 hours ago) [xumia]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-11-17 16:33:54 +08:00
mssonicbld
ff435ec6cf
[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically (#17209)
#### Why I did it
src/sonic-platform-daemons
```
* 55a6828 - (HEAD -> master, origin/master, origin/HEAD) Update the code coverage rate to 80% (#406) (16 hours ago) [xumia]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-11-17 16:33:46 +08:00
mssonicbld
3393b3069e
[submodule] Update submodule sonic-swss-common to the latest HEAD automatically (#17213) 2023-11-17 15:25:54 +08:00
mssonicbld
e31c2c139a
[submodule] Update submodule sonic-sairedis to the latest HEAD automatically (#17190) 2023-11-17 15:10:17 +08:00
mssonicbld
713053398c
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#17212) 2023-11-17 14:53:36 +08:00
ShiyanWangMS
63b2d68d70
First commit (#17199)
Why I did it
Work item tracking
Microsoft ADO (number only): 25858445
How I did it
The sonic-mgmt-docker image with both Python2 and Python3 is tagged latest.
The sonic-mgmt-docker image with Python3 only is tagged py3only.

How to verify it
2023-11-17 10:05:26 +08:00
Yaqiang Zhu
3223ca0156
[dhcp_server] Add config_db monitor and customize options for dhcpservd (#17051)
Why I did it
Add config_db monitor and customize options for dhcpservd. HLD: sonic-net/SONiC#1282

Work item tracking
Microsoft ADO (number only): 25600859
How I did it
Add support to customize unassigned DHCP options. Currently supported types: binary, boolean, ipv4-address, string, uint8, uint16, uint32
Add db config change monitor for dhcpservd
How to verify it
Unit tests in sonic-dhcp-server all passed
2023-11-16 08:56:50 -08:00
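The option types listed in the entry above suggest a per-type value check. The sketch below is not the dhcpservd implementation: the function name is hypothetical, and the value encodings (hex text for binary, "true"/"false" for boolean) are assumptions made for illustration.

```python
import ipaddress

UINT_BITS = {"uint8": 8, "uint16": 16, "uint32": 32}


def is_valid_option(option_type: str, value: str) -> bool:
    """Check a customized option value against its declared type."""
    try:
        if option_type == "binary":
            bytes.fromhex(value)  # assumption: binary values are hex text
            return True
        if option_type == "boolean":
            return value in ("true", "false")  # assumption: lowercase literals
        if option_type == "ipv4-address":
            ipaddress.IPv4Address(value)
            return True
        if option_type == "string":
            return len(value) > 0
        if option_type in UINT_BITS:
            return 0 <= int(value) < 2 ** UINT_BITS[option_type]
    except ValueError:
        return False
    return False  # type not in the supported list


print(is_valid_option("uint8", "255"))
print(is_valid_option("ipv4-address", "192.168.0"))
```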
Mai Bui
682057945f
[docker-snmp] limit privileged flag for snmp container (#16971)
Why I did it
HLD implementation: Container Hardening (sonic-net/SONiC#1364)

Work item tracking
Microsoft ADO (number only): 14807420
How I did it
Reduce linux capabilities in privileged flag

How to verify it
Run snmp sonic-mgmt tests
Check container's settings: Privileged is false and container only has default Linux caps, does not have extended caps.

admin@vlab-01:~$ docker inspect snmp | grep Privi
            "Privileged": false,

admin@vlab-01:~$ docker exec -it snmp bash
root@vlab-01:/# capsh --print
Current: cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap=ep
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_se
2023-11-16 22:15:37 +08:00
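The manual check in the entry above (Privileged is false, no extended capabilities) could be automated along these lines. The sketch parses the JSON that `docker inspect` prints and reads `HostConfig.Privileged` and `HostConfig.CapAdd`; the function name and the sample data are hypothetical.

```python
import json


def is_hardened(inspect_output: str) -> bool:
    """Return True if the container is unprivileged and adds no extra capabilities."""
    info = json.loads(inspect_output)[0]  # `docker inspect` prints a JSON list
    host_config = info["HostConfig"]
    privileged = host_config.get("Privileged", False)
    extra_caps = host_config.get("CapAdd") or []
    return not privileged and not extra_caps


# Sample data shaped like `docker inspect <container>` output (trimmed).
sample = json.dumps([{"HostConfig": {"Privileged": False, "CapAdd": None}}])
print(is_hardened(sample))
```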
mssonicbld
922a8ac45f
[submodule] Update submodule sonic-mgmt-common to the latest HEAD automatically (#17188)
#### Why I did it
src/sonic-mgmt-common
```
* faa2a51 - (HEAD -> master, origin/master, origin/HEAD) Go Code format checker and formatter (#112) (8 hours ago) [faraazbrcm]
* faaa9f5 - PathInfo optimizations (#115) (22 hours ago) [Sachin Holla]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-11-16 18:36:35 +08:00
mssonicbld
672ea7d669
[submodule] Update submodule sonic-platform-common to the latest HEAD automatically (#17189)
#### Why I did it
src/sonic-platform-common
```
* 30fb0ce - (HEAD -> master, origin/master, origin/HEAD) Implement is_copper for SFP (#414) (12 hours ago) [Junchao-Mellanox]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-11-16 16:34:00 +08:00
Liu Shilong
27219fb61d
[build] Add gpg keys for sonic-slave-bullseye in arm64 cross build on amd64. (#17182)
Fix #16204

Microsoft ADO (number only): 25746782

How I did it
multiarch/debian-debootstrap:arm64-bullseye is too old.
Some gpg keys need to be added before running 'apt-get update'.
2023-11-15 23:58:39 -08:00
Ze Gan
8a95bff4e7
[protobuf]: Disable debian verification (#17168)
In the Ubuntu environment, the Debian server key isn't installed by default, so we get the following error in the Azp pipeline:

gpg: WARNING: no command supplied.  Trying to guess what you mean ...
gpg: Signature made Sun Apr  9 06:25:32 2023 UTC
gpg:                using RSA key 7D887DC8BA7BBBA7B835E3BADCE310E7864CC8BF
gpg: Can't check signature: No public key
gpg: can't create `/home/vsts/.gnupg/random_seed': No such file or directory
Validation FAILED!!

Signed-off-by: Ze Gan <ganze718@gmail.com>
2023-11-15 23:55:04 -08:00
ShiyanWangMS
a7695e0221
Add Python3-only sonic-mgmt-docker build pipeline file (#17187)
Why I did it
Work item tracking
Microsoft ADO (number only): 25858445
How I did it
docker-sonic-mgmt.yml will build docker with Python2 and Python3 both.
docker-sonic-mgmt-py3-only.yml will build docker with Python3 only.
2023-11-16 15:32:48 +08:00
mssonicbld
ac56563d60
[submodule] Update submodule sonic-platform-common to the latest HEAD automatically (#17176)
#### Why I did it
src/sonic-platform-common
```
* 5cc3e30 - (HEAD -> master, origin/master, origin/HEAD) Correct wrong constant (#411) (6 hours ago) [ChiouRung Haung]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-11-16 10:39:37 +08:00
Xichen96
e6c2e9ff94
add missing variable (#17162)
Without INSTALL_PYTHON_WHEELS=SONIC_UTILITIES_PY3, the slave container won't have the needed sonic-utilities related packages.
2023-11-15 10:20:39 -08:00
mssonicbld
a92ac0a851
[submodule] Update submodule sonic-mgmt-common to the latest HEAD automatically (#16744)
#### Why I did it
src/sonic-mgmt-common
```
* 7e3a8ad - (HEAD -> master, origin/master, origin/HEAD) Transformer infra enhancements and bug fixes (#104) (5 days ago) [amrutasali]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-11-15 16:34:27 +08:00
mssonicbld
6f9011c5d4
[submodule] Update submodule sonic-host-services to the latest HEAD automatically (#17174)
#### Why I did it
src/sonic-host-services
```
* 586b1e9 - (HEAD -> master, origin/master, origin/HEAD) Disable systemd auto-restart of dependent services for spineRouters (#83) (5 hours ago) [Deepak Singhal]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-11-15 16:34:13 +08:00
mssonicbld
493724ce62
[submodule] Update submodule sonic-sairedis to the latest HEAD automatically (#17177) 2023-11-15 14:56:14 +08:00
Kebo Liu
8b62e7a5b2
[Mellanox] fix new MSN2700-A1 platform name (#17151)
- Why I did it
The newly introduced MSN2700 platform has a different platform name from the old one; it should be "MSN2700-A1".

- How I did it
Update the name to the new one in platform.json and platform_components.json.

- How to verify it
Run platform-related sonic-mgmt test cases on the new platform.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-11-15 08:29:11 +02:00
mssonicbld
b33c38112c
[submodule] Update submodule sonic-host-services to the latest HEAD automatically (#17160) 2023-11-15 10:10:40 +08:00
ganglv
240853b7dd
Disable telemetry feature (#17166)
- Why I did it
PR checker is blocked by container_checker.

- How I did it
Disable telemetry in minigraph parser.

- How to verify it
Run pipeline and sanity check.
2023-11-14 15:25:03 +02:00
mssonicbld
1e93efaf93
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#17142)
#### Why I did it
src/sonic-swss
```
* 644b227a - (HEAD -> master, origin/master, origin/HEAD) [portsorch]: Implement port PFC asym capability check (#2942) (3 days ago) [Nazarii Hnydyn]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-11-14 16:33:10 +08:00
mssonicbld
fa05bf183a
[submodule] Update submodule sonic-mgmt-framework to the latest HEAD automatically (#16792)
#### Why I did it
src/sonic-mgmt-framework
```
* dfac87c - (HEAD -> master, origin/master, origin/HEAD) Query parameters enhancements in rest-server.  (#119) (5 weeks ago) [ranjinidn]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-11-14 10:36:25 +08:00
ranjinidn
5567a79255
Update submodules mgmt-common and mgmt-framework (#17054) 2023-11-13 01:32:04 -08:00
mssonicbld
f3f0d403cb
[submodule] Update submodule sonic-sairedis to the latest HEAD automatically (#17147) 2023-11-13 15:56:49 +08:00
mssonicbld
73da758b84
[submodule] Update submodule dhcprelay to the latest HEAD automatically (#17140)
#### Why I did it
src/dhcprelay
```
* 40c6877 - (HEAD -> master, origin/master, origin/HEAD) [CodeQL] fix unmet dependency for `build-swss-common` (#44) (30 hours ago) [Jing Zhang]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-11-12 16:32:31 +08:00
Stepan Blyshchak
97db5f5b21
[FRR][patch] Add encap type when building packet for FPM (#17052)
Back port a patch from upstream FRR - FRRouting/frr#14675

Why I did it
The EVPN route is not treated correctly and thus leading to messages:

Oct 30 11:40:00.494083 r-tigris-22 INFO swss#orchagent: :- addRoute: Failed to get next hop 30.0.0.2@Vlan200 for 20.0.0.2/32, resolving neighbor
Oct 30 11:40:00.494083 r-tigris-22 INFO swss#orchagent: :- addRoute: Failed to get next hop 30.0.0.2@Vlan200 for 200.0.0.0/24, resolving neighbor
Oct 30 11:40:00.494083 r-tigris-22 INFO swss#orchagent: :- addRoute: Failed to get next hop ::ffff:30.0.0.2@Vlan200 for 200::/64, resolving neighbor
Oct 30 11:40:00.494083 r-tigris-22 INFO swss#orchagent: :- addRoute: Failed to get next hop ::ffff:30.0.0.2@Vlan200 for 20::/64, resolving neighbor
Oct 30 11:40:00.494083 r-tigris-22 INFO swss#orchagent: :- addRoute: Failed to get next hop ::ffff:30.0.0.2@Vlan200 for 20::2/128, resolving neighbor
This happens because fpmsyncd does not get encap type field in FPM message.

Work item tracking
Microsoft ADO (number only):
How I did it
Backport fix from FRR.

How to verify it
EVPN scenario.
2023-11-11 21:26:14 +08:00
mssonicbld
d69a736bee
[submodule] Update submodule wpasupplicant/sonic-wpa-supplicant to the latest HEAD automatically (#17143) 2023-11-11 15:48:11 +08:00
mssonicbld
19cd92601c
[submodule] Update submodule linkmgrd to the latest HEAD automatically (#17141) 2023-11-11 15:31:57 +08:00
Lawrence Lee
04b30fc378
[tph]: Detect LAG flaps from APPL_DB (#16879)
Why I did it
A race condition exists while the TPH is processing a netlink message - if a second netlink message arrives during processing it will be missed since TPH is not listening for other messages.
Another bug was found where TPH was unnecessarily restarting since it was checking admin status instead of operational status of portchannels.

How I did it
Subscribe to APPL_DB for updates on LAG operational state
Track currently sniffed interfaces

How to verify it
Send tunnel packets with destination IP of an unresolved neighbor, verify that ping commands are run
Shut down a portchannel interface, verify that sniffer does not restart
Send tunnel packets, verify ping commands are still run
Bring up portchannel interface, verify that sniffer restarts

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2023-11-09 16:01:59 -08:00
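One way to read the entry above is as a small state machine driven by LAG operational-state updates. The sketch below is an interpretation that matches the listed verification steps, not the TPH code itself: the class, method, and status strings are hypothetical, and the APPL_DB subscription is replaced by direct method calls.

```python
class SnifferTracker:
    """Tracks which portchannels are sniffed, based on operational status."""

    def __init__(self):
        self.sniffed = set()

    def on_lag_update(self, name: str, oper_status: str) -> bool:
        """Handle one LAG update; return True if the sniffer should restart."""
        if oper_status == "up":
            if name not in self.sniffed:
                self.sniffed.add(name)
                return True  # a new interface must be sniffed
            return False
        # Operationally down: stop tracking, but leave the sniffer running.
        self.sniffed.discard(name)
        return False


tracker = SnifferTracker()
events = [("PortChannel101", "up"), ("PortChannel101", "down"), ("PortChannel101", "up")]
restarts = [tracker.on_lag_update(name, status) for name, status in events]
print(restarts)
```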
Junhua Zhai
4e3b2e5545
Upgrade libsaibroncos debian package to version 3.11 (#17127) 2023-11-09 10:15:02 -08:00
Stepan Blyshchak
113d7d8668
[YANG][ACL] Change LAG -> PORTCHANNEL in DB schema (#17062)
Orchagent uses PORTCHANNEL term when parsing this field. Change the YANG model to align to orchagent.

- Why I did it
When specifying PORTCHANNEL in the ACL_TABLE_TYPE table, YANG model validation does not pass; when using the term LAG, orchagent does not accept such a table type.
Fix it by aligning YANG model to orchagent.

- How I did it
Fix in YANG model.

- How to verify it
Create custom ACL table type.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2023-11-09 19:00:07 +02:00
xumia
7b6f7a6328
[Build] Deprecate the mirror packages.trafficmanager.net/debian (#17113)
Why I did it
Fix the issue: #17107

Work item tracking
Microsoft ADO (number only): 25746782
How I did it
Deprecate the unused and out-of-service mirrors:
http://packages.trafficmanager.net/debian/debian
http://packages.trafficmanager.net/debian/debian-security/
Enable the snapshot mirror by default if the reproducible flag is set.
How to verify it
2023-11-09 20:52:46 +08:00
mssonicbld
025d53c6d1
[submodule] Update submodule sonic-sairedis to the latest HEAD automatically (#17123)
#### Why I did it
src/sonic-sairedis
```
* 7acd028 - (HEAD -> master, origin/master, origin/HEAD) [gbsyncd] Add asic db prefix for channel RESTARTQUERY (#1302) (3 hours ago) [Junhua Zhai]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-11-09 16:32:51 +08:00
mssonicbld
4f04b95eeb
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#17124)
#### Why I did it
src/sonic-swss
```
* 51bfb4c1 - (HEAD -> master, origin/master, origin/HEAD) [muxorch] Fixing updateRoute logic (#2952) (3 hours ago) [Nikola Dancejic]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-11-09 16:32:46 +08:00
JunhongMao
4da5099919
[VOQ][saidump] Install rdbtools into the docker base related containers. (#16466)
Fix #13561

The existing saidump uses the https://github.com/sonic-net/sonic-swss-common/blob/master/common/table_dump.lua script, which loops over the ASIC_DB for more than 5 seconds and blocks access by other processes.

This solution uses the Redis SAVE command to save the snapshot of DB each time and recover later, instead of looping through each entry in the table.

Related PRs:
sonic-net/sonic-utilities#2972
sonic-net/sonic-sairedis#1288
sonic-net/sonic-sairedis#1298

How did I do it?
Use the Redis SAVE option to save a snapshot of the DB each time and recover later, instead of looping through each entry in the table and saving it.

1. Updated dockers/docker-base-bullseye/Dockerfile.j2 to install the Python library rdbtools into all the docker-base-bullseye containers.

2. Updated sonic-buildimage/src/sonic-sairedis/saidump/saidump.cpp to add a new option -r, which updates the format of the rdbtools output JSON files.

3. Added a new script file, syncd/scripts/saidump.sh, to the sairedis repo. This shell script performs the following steps:

  For each ASIC, such as ASIC0,

  3.1. Configure the Redis persistence directory.
  redis-cli -h $hostname -p $port CONFIG SET dir $redis_dir > /dev/null

  3.2. Save the Redis data.
  redis-cli -h $hostname -p $port SAVE > /dev/null

  3.3. Run rdb command to convert the dump files into JSON files
    rdb --command json $redis_dir/dump.rdb | tee $redis_dir/dump.json > /dev/null

  3.4. Run saidump -r to convert the JSON file to the same format as the previous saidump output.
       The saidump result is then available on standard output.
       saidump -r $redis_dir/dump.json -m 100

  3.5. Clear the temporary files.
   rm -f $redis_dir/dump.rdb
   rm -f $redis_dir/dump.json

4. Updated sonic-buildimage/src/sonic-utilities/scripts/generate_dump to check the ASIC DB size: if it is larger than ROUTE_TAB_LIMIT_DIRECT_ITERATION (default value 24000) entries, use Redis SAVE; otherwise, use the old method of looping through each entry of the Redis DB.

How to verify it
On a T2 setup with more than 96K routes, execute the CLI command generate_dump
No error should be shown
Download the generate_dump result and verify the saidump file after unpacking it.
2023-11-08 11:57:25 -08:00
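The flow in the entry above can be summarized as a size check followed by a fixed command sequence. The sketch below only builds the command strings quoted in the commit message and does not run them; the function names and the sample host, port, and directory are hypothetical.

```python
ROUTE_TAB_LIMIT_DIRECT_ITERATION = 24000  # default value named in the commit


def use_redis_save(entry_count: int, limit: int = ROUTE_TAB_LIMIT_DIRECT_ITERATION) -> bool:
    """Use the Redis SAVE path only when the ASIC DB is larger than the limit."""
    return entry_count > limit


def snapshot_commands(hostname: str, port: int, redis_dir: str) -> list:
    """Return the per-ASIC commands in the order the commit message lists them."""
    redis = f"redis-cli -h {hostname} -p {port}"
    return [
        f"{redis} CONFIG SET dir {redis_dir} > /dev/null",
        f"{redis} SAVE > /dev/null",
        f"rdb --command json {redis_dir}/dump.rdb | tee {redis_dir}/dump.json > /dev/null",
        f"saidump -r {redis_dir}/dump.json -m 100",
        f"rm -f {redis_dir}/dump.rdb",
        f"rm -f {redis_dir}/dump.json",
    ]


if use_redis_save(96000):
    for command in snapshot_commands("127.0.0.1", 6379, "/tmp/saidump_asic0"):
        print(command)
```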