src/sonic-platform-common
* 9a4a481 - (HEAD -> 202205, origin/202205) [202205] Fix issue: QSFP module with id 0x0d can be parsed using 8636 (#412) (#425) (11 days ago) [Stephen Sun]
Why I did it
Support DHCP/DHCPv6 per-interface counters; code change in sonic-buildimage.
Work item tracking
Microsoft ADO (17271822):
How I did it
- Introduce libjsoncpp-dev in the dhcpmon and dhcprelay repos
- Show CLI changes after counter format change
How to verify it
- Manually run the show commands
- dhcpmon, dhcprelay integration tests
src/sonic-snmpagent
* a201154 - (HEAD -> 202205, origin/202205) Fix SNMP dropping some of the queue counter when create_only_config_db_buffers is set to true (#303) (#305) (29 hours ago) [mssonicbld]
Upgrade the SAI version to 7.1.73.4 to include the following fixes:
7.1.67.4: [CSP CS00012302165] Backport JIRA SONIC-77116 to rel_ocp_sai_7_1
7.1.68.4: [CSP CS00012316299][SAI_BRANCH rel_ocp_sai_7_1] L3 entry delete failed when SER error is present
7.1.69.4: ECMP LB traffic polarization, configure hash_offset along with hash_seed attr
7.1.70.4: JIRA# SONIC-79944
7.1.71.4: SID: MMU cosq control configuration with Dynamic Type Check
7.1.72.4: Backport SONIC-81858 (PFCWD on IPv6) [JIRA# SONIC-84146]
7.1.73.4: [CSP CS00012322843] SAI_API_ROUTE:brcm_sai_xgs_route_create:115 iptnl info get failed with error -7
Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
Modify the EEPROM API serial_number_str to return the service tag instead of the serial number on Dell S6100.
Ref PR: #1239
How I did it
Update the EEPROM API serial_number_str to return the service tag instead of the serial number.
How to verify it
Verify that decode-syseeprom -s returns the service tag on Dell S6100.
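A minimal check on the Dell S6100 (the expected value shown is purely illustrative, not real device data):
```
# Print the serial-number field from the system EEPROM; after this change
# it should return the chassis service tag on Dell S6100.
decode-syseeprom -s
# Expected output (hypothetical value): 1A2B3C4
```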
Co-authored-by: Arun Saravanan Balachandran <52521751+ArunSaravananBalachandran@users.noreply.github.com>
* Revert "[202205] [Mellanox] Fix issue: user must set admin down before toggling LPM (#14370)"
This reverts commit f74c69e876.
* update copyright header
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Why I did it
Today at most 128 LAGs are supported. This is not sufficient if there are many LAGs with just a few ports each.
How I did it
Increase the number of LAG IDs to 1024 for DNX devices.
Co-authored-by: Song Yuan <64041228+ysmanman@users.noreply.github.com>
What I did:
Revert the GTSM feature for VOQ iBGP sessions that was done as part of #16777.
Why I did:
On VOQ chassis, BGP packets go over the recycle port and then through the ingress pipeline for routing, which results in a TTL of 254 and causes the single-hop check to fail.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Co-authored-by: abdosi <58047199+abdosi@users.noreply.github.com>
#### Why I did it
src/sonic-swss
```
* fbab6b75 - (HEAD -> 202205, origin/202205) [Chassis][202205][orchagent] : Support WRED profiles on system ports (#2945) (9 hours ago) [vmittal-msft]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
The current DEVICE_NEIGHBOR_METADATA YANG model has two issues that block GCU operations when GCU checks whether the current config aligns with the YANG model:
Missing cluster field in the YANG model.
Incomplete set of device types: the device type in the YANG model does not cover all device types in use.
Work item tracking
Microsoft ADO (number only): 25577813
How I did it
Add cluster field in DEVICE_NEIGHBOR_METADATA YANG model.
Change device type to string.
Fix the unit tests accordingly.
How to verify it
Build the image and verify that the unit tests pass.
Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
Fix #13561
The existing saidump uses the script https://github.com/sonic-net/sonic-swss-common/blob/master/common/table_dump.lua, which loops over the ASIC_DB for more than 5 seconds and blocks other processes from accessing it.
This solution uses the Redis SAVE command to take a snapshot of the DB each time and recover it later, instead of looping through each entry in the table.
Related PRs:
sonic-net/sonic-utilities#2972, sonic-net/sonic-sairedis#1288, sonic-net/sonic-sairedis#1298
How did I do it?
Use the Redis SAVE command to take a snapshot of the DB each time and recover it later, instead of looping through each entry in the table and saving it.
1. Updated dockers/docker-base-bullseye/Dockerfile.j2 to install the Python library rdbtools into all docker-base-bullseye containers.
2. Updated sonic-buildimage/src/sonic-sairedis/saidump/saidump.cpp to add a new option, -r, which reformats the JSON files produced by rdbtools.
3. Added a new script file, syncd/scripts/saidump.sh, to the sairedis repo. For each ASIC (e.g. ASIC0), this shell script performs the following steps (a consolidated sketch follows this list):
3.1. Configure the Redis persistence (dump) directory.
redis-cli -h $hostname -p $port CONFIG SET dir $redis_dir > /dev/null
3.2. Save the Redis data.
redis-cli -h $hostname -p $port SAVE > /dev/null
3.3. Run the rdb command to convert the dump file into a JSON file.
rdb --command json $redis_dir/dump.rdb | tee $redis_dir/dump.json > /dev/null
3.4. Run saidump -r to convert the JSON file into the same format as the previous saidump output.
The saidump result is then printed on standard output.
saidump -r $redis_dir/dump.json -m 100
3.5. Clear the temporary files.
rm -f $redis_dir/dump.rdb
rm -f $redis_dir/dump.json
4. Updated sonic-buildimage/src/sonic-utilities/scripts/generate_dump to check the ASIC DB size: if it is larger than ROUTE_TAB_LIMIT_DIRECT_ITERATION entries (default 24000), use the Redis SAVE method; otherwise, fall back to the old method of looping through each entry of the Redis DB.
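For illustration, a minimal consolidated sketch of the per-ASIC sequence from steps 3.1-3.5 together with the size check from step 4. The placeholder values (redis_dir, hostname, port), the use of DB index 1 for ASIC_DB, and the plain saidump fallback are assumptions of this sketch, not the exact saidump.sh/generate_dump code:
```
#!/bin/bash
# Illustrative sketch only -- not the exact saidump.sh / generate_dump code.
redis_dir=/var/run/redis     # placeholder snapshot directory
hostname=127.0.0.1           # placeholder Redis host for this ASIC
port=6379                    # placeholder Redis port for this ASIC
ROUTE_TAB_LIMIT_DIRECT_ITERATION=24000

# Step 4: only take the SAVE-based path when the ASIC DB is large.
asic_db_size=$(redis-cli -h "$hostname" -p "$port" -n 1 DBSIZE)   # assumes ASIC_DB is DB index 1
if [ "$asic_db_size" -gt "$ROUTE_TAB_LIMIT_DIRECT_ITERATION" ]; then
    # 3.1 Point Redis at the directory where the snapshot will be written.
    redis-cli -h "$hostname" -p "$port" CONFIG SET dir "$redis_dir" > /dev/null
    # 3.2 Take a synchronous snapshot of the Redis data.
    redis-cli -h "$hostname" -p "$port" SAVE > /dev/null
    # 3.3 Convert the RDB snapshot into JSON with rdbtools.
    rdb --command json "$redis_dir/dump.rdb" | tee "$redis_dir/dump.json" > /dev/null
    # 3.4 Reformat the JSON into the classic saidump layout on stdout.
    saidump -r "$redis_dir/dump.json" -m 100
    # 3.5 Clean up the temporary files.
    rm -f "$redis_dir/dump.rdb" "$redis_dir/dump.json"
else
    # Fall back to the old method that iterates over each ASIC_DB entry.
    saidump
fi
```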
How to verify it
On a T2 setup with more than 96K routes, execute the generate_dump CLI command.
No errors should be shown.
Download the generate_dump result and verify the saidump file after unpacking it (a rough sketch of this flow follows).
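A rough sketch of that verification flow, assuming the dump archive lands under /var/dump and contains a saidump file (both assumptions here, not verified generate_dump output):
```
# "show techsupport" invokes generate_dump under the hood.
show techsupport
# Pick the newest dump archive and confirm it contains a saidump file.
latest=$(ls -t /var/dump/sonic_dump_*.tar.gz | head -1)
tar -tzf "$latest" | grep -i saidump
```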
Currently the hostcfgd script overrides the systemd service files of features depending on whether auto_restart is enabled or disabled.
I am skipping the dependent features (syncd and gbsyncd for now) from getting "RESTART=Always",
so that they do not start immediately and are instead started by SWSS through the swss.sh script.
The issue of a syncd double stop also applies to pizza-box platforms; however, no traffic impact is seen there, whereas on VOQ chassis we do see traffic impact due to the early start of the syncd service.
Update the Brcm SAI 7.0 with the following fixes:
Official Brcm SDK fix for memory leak
(CS00012315073 [7.0][J2C+]: PFCWD counter polling causing continuous mem leak on production device)
Official Brcm fix for high CPU usage
(CS00012317195 High CPU due to SDK calling soc_dnxc_port_resource_get for few stats counters even with bcmCNTR thread)
Official Brcm SAI fix to get VOQ counters working
CSP CS00012319503: DNX SAI 7.1.60.4 has broken Voq counters support
How to verify it
Validated by running the nightly pipeline on a chassis platform.
Validated the VOQ counters by sending traffic from a T1 VM to a T3 VM:
Port Voq Counter/pkts Counter/bytes Drop/pkts Drop/bytes
---------------------------------- ----- -------------- --------------- ----------- ------------
svcstr-xxxx-lc1-1|asic0|Ethernet48 VOQ0 0 0 0 0
svcstr-xxxx-lc1-1|asic0|Ethernet48 VOQ1 27 1968 0 0
svcstr-xxxx-lc1-1|asic0|Ethernet48 VOQ2 0 0 0 0
svcstr-xxxx-lc1-1|asic0|Ethernet48 VOQ3 0 0 0 0
svcstr-xxxx-lc1-1|asic0|Ethernet48 VOQ4 0 0 0 0
svcstr-xxxx-lc1-1|asic0|Ethernet48 VOQ5 0 0 0 0
svcstr-xxxx-lc1-1|asic0|Ethernet48 VOQ6 0 0 0 0
svcstr-xxxx-lc1-1|asic0|Ethernet48 VOQ7 0 0 0 0
Port Voq Counter/pkts Counter/bytes Drop/pkts Drop/bytes
---------------------------------- ----- -------------- --------------- ----------- ------------
svcstr-xxxx-lc1-1|asic0|Ethernet56 VOQ0 0 0 0 0
svcstr-xxxx-lc1-1|asic0|Ethernet56 VOQ1 7099 625680 0 0
svcstr-xxxx-lc1-1|asic0|Ethernet56 VOQ2 0 0 0 0
svcstr-xxxx-lc1-1|asic0|Ethernet56 VOQ3 0 0 0 0
svcstr-xxxx-lc1-1|asic0|Ethernet56 VOQ4 0 0 0 0
svcstr-xxxx-lc1-1|asic0|Ethernet56 VOQ5 0 0 0 0
svcstr-xxxx-lc1-1|asic0|Ethernet56 VOQ6 0 0 0 0
svcstr-xxxx-lc1-1|asic0|Ethernet56 VOQ7 0 0 0 0
---------------
The CPU usage on the SUP has come down:
System 'xxxx-sup-1'
status Running
monitoring status Monitored
monitoring mode active
on reboot start
load average [7.94] [8.70] [7.54]
cpu 2.6%us 45.0%sy 0.0%wa <<<<-- it is 45%
memory usage 8.9 GB [28.6%]
swap usage 0 B [0.0%]
uptime 21m
boot time Fri, 17 Nov 2023 21:55:55
data collected Fri, 17 Nov 2023 22:16:59
-------------
syncd memory usage is not increasing.
Why I did it
A race condition exists while the TPH (tunnel packet handler) is processing a netlink message: if a second netlink message arrives during processing, it is missed because the TPH is not listening for other messages.
Another bug was found where the TPH was unnecessarily restarting because it was checking the admin status instead of the operational status of port channels.
How I did it
Subscribe to APPL_DB for updates on LAG operational state (see the sketch after this list)
Track currently sniffed interfaces
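As a rough illustration of the data involved (not the handler's actual code), the LAG operational state the handler now watches lives in APPL_DB and can be inspected from the CLI; the port channel name below is a placeholder:
```
# Read the operational status of a LAG from APPL_DB (PortChannel0001 is a placeholder name).
sonic-db-cli APPL_DB HGET "LAG_TABLE:PortChannel0001" oper_status
# Expected output: "up" or "down", depending on the LAG's operational state.
```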
How to verify it
Send tunnel packets with destination IP of an unresolved neighbor, verify that ping commands are run
Shut down a portchannel interface, verify that sniffer does not restart
Send tunnel packets, verify ping commands are still run
Bring up portchannel interface, verify that sniffer restarts
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>