Why I did it
Revert the DHCP counter changes because of unexpected packet drops seen in the recv buffer, which made the counters in dhcpmon inaccurate and degraded dhcp6relay performance.
Work item tracking
Microsoft ADO (number only): 26918588
How I did it
Reset submodule head and revert related dockerfile changes
How to verify it
Ran mgmt test and stress test
Why I did it
deb11u1 is deprecated.
Use deb11u2 instead.
Other branches are not impacted, because their reproducible build version files are up to date.
Work item tracking
Microsoft ADO (number only): 26964185
How I did it
How to verify it
#### Why I did it
src/sonic-platform-common
```
* 29544ed - (HEAD -> 202305, origin/202305) Certain VDM fields not populating after encountering KeyError on 400ZR optics (#442) (10 hours ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-utilities
```
* e16fed53 - (HEAD -> 202305, origin/202305) Modify transceiver PM CLI to handle N/A value for DOM threshold (#3174) (10 hours ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
### Why I did it
Fix flakiness of eventd UT - run sub after capture service starts
##### Work item tracking
- Microsoft ADO **(number only)**:25650744
#### How I did it
Run sub socket after capture socket is initialized
#### How to verify it
Pipeline
#### Why I did it
src/sonic-swss
```
* 8d6aac03 - (HEAD -> 202305, origin/202305) [intfsorch] Enable ipv6 proxy ndp along with proxy arp (#3045) (2 days ago) [Nikola Dancejic]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-utilities
```
* 514f4329 - (HEAD -> 202305, origin/202305) Fix sfputil CLI failure for multi-asic platforms (#3168) (#3181) (3 hours ago) [longhuan-cisco]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Fixed DhcpV6 CoPP issue. In certain scenarios dhcpv6 packet ff02::1:2,udp=17,l4-dst-port=547 was not trapped to CPU.
Fixed test_copp.py::test_add_new_trap and test_remove_trap PTF failures
Work item tracking
Microsoft ADO (number only):
How I did it
How to verify it
Loaded SAI debian (in syncd docker) and re-run the failed cases.
Adding rule to ebtables to drop multicast packets in kernel. This was
done to address a bug where NS packets were flooding ports with
duplicate packets.
Signed-off-by: Nikola Dancejic <ndancejic@microsoft.com>
Why I did it
An ICM was reported for "BGPMon Process exited", which was caused by an exception during JSON load.
Work item tracking
Microsoft ADO (number only):
25916773
How I did it
Add exception handling around the JSON load.
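A minimal sketch of this kind of handling, assuming the command output arrives as a string. The function name `parse_cmd_output` and the logger name are hypothetical and are not taken from the actual change.

```python
import json
import logging

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("bgpmon_sketch")


def parse_cmd_output(output):
    """Return the parsed JSON object, or None if the output is not valid JSON."""
    try:
        return json.loads(output)
    except ValueError as err:
        # json.JSONDecodeError is a subclass of ValueError.
        logger.error("Failed to parse command output as JSON: %s", err)
        return None
```

Returning `None` lets the caller skip the current iteration instead of letting the process exit on malformed output.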
How to verify it
Verified locally: added a debug change that modifies the command's output string so that it is no longer valid JSON, then checked the syslog.
#### Why I did it
src/sonic-swss
```
* 5d91f105 - (HEAD -> 202305, origin/202305) Allow L4 port range egress ACL rules on DNX (#3014) (2 days ago) [arista-nwolfe]
```
#### How I did it
#### How to verify it
#### Description for the changelog
These changes adjust Nokia IXR7250 thermal sensor logging thresholds.
Why I did it
To modify the thermal sensor logging thresholds used on LC and Supervisor.
How I did it
Modified the JSON based thermal logging thresholds used to determine when to log current high sensor temperature and hottest sensor margin fluctuations.
How to verify it
Verify that syslog messages indicating current (high) temperature and margin values are only logged when these respective values fluctuate by at least 5 degrees.
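The verification rule above can be sketched as follows. This is only an illustration, assuming that the first reading is always logged and that later readings are compared against the last logged value; the class name `FluctuationLogger` is hypothetical.

```python
class FluctuationLogger:
    """Decide whether a reading differs enough from the last logged one."""

    def __init__(self, threshold=5):
        self.threshold = threshold  # minimum change, in degrees, that triggers a log
        self.last_logged = None     # value written to the log most recently

    def should_log(self, value):
        # Assumption: the very first reading is always logged.
        if self.last_logged is None or abs(value - self.last_logged) >= self.threshold:
            self.last_logged = value
            return True
        return False
```

The same helper could be instantiated once for the current high temperature and once for the hottest sensor margin.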
Co-authored-by: snider-nokia <76123698+snider-nokia@users.noreply.github.com>
Why I did it
Disable Bad Link Detection feature in SDK
Fix to address pmbus driver errors causing “show platform psu status” not showing power
Work item tracking
How I did it
How to verify it
#### Why I did it
src/sonic-sairedis
```
* 6a018ae - (HEAD -> 202305, origin/202305) Install nlohmann-json3-dev for CodeQL analysis of (#1350) (3 days ago) [JunhongMao]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-utilities
```
* 1c5c134b - (HEAD -> 202305, origin/202305) Add all SKUs to the generic config update list (#3131) (4 hours ago) [Stephen Sun]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-gnmi
```
* 50849ce - (HEAD -> 202305, origin/202305) Replace PFC_WD_TABLE with PFC_WD (#173) (22 hours ago) [Zain Budhwani]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-sairedis
```
* b89740b - (HEAD -> 202305, origin/202305) Updated SAI module for 202305 branch to latest v1.12 SAI (#1356) (2 hours ago) [abdosi]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-sairedis
```
* 6616cb5 - (HEAD -> 202305, origin/202305) Skip FABRIC PORT Attributes from sairedis logging (#1339) (2 days ago) [saksarav-nokia]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-platform-common
```
* b6f8a8d - (HEAD -> 202305, origin/202305) Fix memory map parsing issue (#427) (22 hours ago) [Stephen Sun]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Fix IPv6 forced-mgmt-route not working issue
Why I did it
IPv6 forced-mgmt-route does not work.
When adding an IPv6 route, the 'ip -6 rule add pref 32764 address' command should be used, but the '-6' parameter is currently missing in the template, so the IPv6 route is added to the IPv4 route table.
This PR also depends on #17281, which fixes the issue of the IPv6 'default' route table being missing from IPv6 route lookup.
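The address-family selection can be sketched in Python as below. This is not the template itself: the function name `ip_rule_add_cmd` is hypothetical, and the use of the `to` selector is an assumption made for illustration.

```python
import ipaddress


def ip_rule_add_cmd(prefix, pref=32764):
    """Build an 'ip rule add' argument list, adding '-6' for IPv6 prefixes."""
    network = ipaddress.ip_network(prefix, strict=False)
    cmd = ["ip"]
    if network.version == 6:
        # Without '-6', the rule would be added to the IPv4 rule table.
        cmd.append("-6")
    # Assumption: the prefix is passed with the 'to' selector.
    cmd += ["rule", "add", "pref", str(pref), "to", str(network)]
    return cmd
```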
Microsoft ADO (number only):24719238
#### Why I did it
src/sonic-utilities
```
* c5f53423 - (HEAD -> 202305, origin/202305) Fix `sudo config load_mgmt_config` fails with error "File /var/run/dhclient.eth0.pid does not exist" (#3149) (16 hours ago) [Mai Bui]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-platform-common
```
* a64276a - (HEAD -> 202305, origin/202305) Tx/Rx power values should be rounded up to 3 decimal places (#432) (22 hours ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
PR #17905 introduced a bug in the slim image build: sonic_asic_platform is missing when building docker images for the slim image.
[ building ] [ target/docker-dhcp-relay.gz ]
/sonic/dockers/docker-dhcp-relay/cli-plugin-tests /sonic
/sonic
Traceback (most recent call last):
File "/usr/local/bin/j2", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.9/dist-packages/j2cli/cli.py", line 202, in main
output = render_command(
File "/usr/local/lib/python3.9/dist-packages/j2cli/cli.py", line 186, in render_command
result = renderer.render(args.template, context)
File "/usr/local/lib/python3.9/dist-packages/j2cli/cli.py", line 85, in render
return self._env \
File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 1090, in render
self.environment.handle_exception()
File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 832, in handle_exception
reraise(*rewrite_traceback_stack(source=source))
File "/usr/lib/python3/dist-packages/jinja2/_compat.py", line 28, in reraise
raise value.with_traceback(tb)
File "/sonic/dockers/docker-dhcp-relay/Dockerfile.j2", line 48, in top-level template code
{% if build_reduce_image_size != "y" or sonic_asic_platform != "broadcom" %}
jinja2.exceptions.UndefinedError: 'sonic_asic_platform' is undefined
make: *** [slave.mk:1072: target/docker-dhcp-relay.gz] Error 1
make: *** Waiting for unfinished jobs....
[ finished ] [ target/docker-swss-layer-bullseye.gz ]
[ finished ] [ target/docker-syncd-brcm-dnx.gz ]
make[1]: *** [Makefile.work:608: target/sonic-broadcom.bin] Error 2
make[1]: Leaving directory '/data/work/1/s'
make: *** [Makefile:41: target/sonic-broadcom.bin] Error 2
Why did it slip past the PR test? The PR test does not build with the slim option, so the sonic_asic_platform != "broadcom" condition is never evaluated in the PR build.
Work item tracking
Microsoft ADO (number only):
How I did it
Export sonic_asic_platform for docker build in slave.mk
How to verify it
Build with the slim image option.
Use lowercase for the IPv6 address in the internal key and the BFD session key. Fixes #15764
Why I did it
staticroutebfd uses the IPv6 address string as the key when creating a BFD session and when caching the session.
When an IPv6 address in the static route nexthop list contains uppercase letters, the key is stored in the cache with those uppercase letters, but the BFD STATE_DB uses lowercase for IPv6 addresses. As a result, when staticroutebfd receives a BFD state event, it cannot find the BFD session in its local cache because of the case mismatch.
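A minimal sketch of the normalization idea, assuming a simple dictionary cache. The class name `BfdSessionCache` and its methods are hypothetical and do not reflect the actual staticroutebfd code.

```python
class BfdSessionCache:
    """Cache BFD sessions under a case-insensitive nexthop key."""

    def __init__(self):
        self._sessions = {}

    @staticmethod
    def _key(nexthop):
        # Lowercase the address so it matches the form used in STATE_DB.
        return nexthop.strip().lower()

    def add(self, nexthop, session):
        self._sessions[self._key(nexthop)] = session

    def get(self, nexthop):
        return self._sessions.get(self._key(nexthop))
```

With this, a nexthop configured as `2001:DB8::1` and a state event for `2001:db8::1` resolve to the same cache entry.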
Why I did it
Upgrade the xgs SAI version to 8.4.41.1 to include the following fix:
8.4.41.1: Cherry-pick from SAI 4.3: CS00012288297: Fix TX queue for control packets
Work item tracking
Microsoft ADO (number only): 26626208
How I did it
Upgrade xgs SAI version in sai.mk file.
How to verify it
run test_bgp_queue.py test on 7050qx T1: https://dev.azure.com/mssonic/internal/_build/results?buildId=467287&view=results
Why I did it
Advance dhcpmon submodule head
Work item tracking
Microsoft ADO (number only): 26270786
How I did it
fc20a97 Yaqiang Zhu Wed Jan 10 09:11:25 2024 +0800 [202311][counter] Clear counter table when dhcpmon init (#14)
bace2e0 Yaqiang Zhu Fri Jan 5 11:29:21 2024 +0800 [counter] Clear counter table when dhcpmon init (#14)
How to verify it
Disable eventd at buildtime for slim images
- Microsoft ADO **(number only)**:26386286
Add flags for disabling eventd and only copy rsyslog conf files when eventd is included and not slim image
Manual testing
- Why I did it
Based on some research, some products might experience occasional IO failures in the communication between the CPU and the SSD because of NCQ.
There seems to be a problem between some kernel versions and some SATA controllers.
Syslog error message examples:
Error "ata1: SError: { UnrecovData Handshk }" - "failed command: WRITE FPDMA QUEUED".
Error "ata1: SError: { RecovComm HostInt PHYRdyChg CommWake 10B8B DevExch }" - "failed command: READ FPDMA QUEUED".
Some vendors already disabled NCQ on their platforms in SONiC due to similar issues:
[Arista] Disable ATA NCQ for a few products (#13739)
[Arista] Disable SSD NCQ on DCS-7050CX3-32S (#13964)
Also there are other discussions on Debian/Ubuntu forums about similar issues and it was suggested to disable NCQ:
https://askubuntu.com/questions/133946/are-these-sata-errors-dangerous
- How I did it
Add a kernel parameter to tell libata to disable NCQ
- How to verify it
Use FIO tool - fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4
Why I did it
Fix an error in the log_err call.
This error can be triggered by an invalid static route key. With a normal config file the code usually cannot reach this path, but the issue was hit during manual testing by writing an invalid key directly with redis-cli. The file has been scanned with a Python linter to prevent such errors.
Work item tracking
Microsoft ADO (number only): 26250268
How I did it
Fix the format error.
How to verify it
1. Ran pylint on the design file to make sure it contains no such error.
2. Wrote a separate Python program to verify the log call.
The current logging-related tests usually patch/mock the logging functions. This specific error cannot be triggered when a mock is called instead of the real function, so lint checking is needed for code changes.
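The point about mocks can be illustrated with a small sketch. The exact format error fixed by this PR is not shown here; the mismatch below (two placeholders, one argument) is a hypothetical example, and `log_err` is a stand-in for a real logging helper rather than the actual function.

```python
from unittest import mock


def log_err(fmt, *args):
    """Stand-in for a real logging helper that applies %-formatting."""
    return "ERR: " + (fmt % args)


def report_bad_key(key, log=log_err):
    # Hypothetical buggy call: two placeholders but only one argument.
    return log("invalid static route key %s in table %s", key)


# With a mock in place of the logger, the malformed call goes unnoticed.
mocked = mock.Mock()
report_bad_key("bad|key", log=mocked)
mock_called = mocked.called

# With the real function, the format error surfaces as a TypeError.
try:
    report_bad_key("bad|key")
    real_raised = False
except TypeError:
    real_raised = True
```

This is why a lint pass, or a test that calls the real function, is needed to catch this class of error.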
Why I did it
Update SDK/FW version to 4.6.2202/2012.2202
Fixed issues
On Spectrum-3 systems, toggling ports while sending traffic on 400G speed ports might result in stuck FW.
In Spectrum-1 switch systems, 50G SR2 speed mode is not supported when AutoNeg is enabled. In this case although the max interface speed is 50G for SR2 or SR4 or SR, the actual max interface speed negotiated between the loopback is 25G.
On Spectrum-2 and Spectrum-3, Switch create in fastboot might take more than 40 seconds in case there are no active links.
When performing a warm boot from a version prior to 202205 to 202205 or above, no aging or MAC move takes place.
Work item tracking
Microsoft ADO (number only):
How I did it
Updating make files.
How to verify it
Running regression
Why I did it
Upgrade the xgs SAI version to 8.4.41.0 to include the following fixes:
8.4.39.3: Revert "Merged PR 4452: Update SAI version to 8.4.39.2 to include fix capability for Hostif queue"
8.4.40.0: [submodule upgrade][CS00012330252] ACL entry programming takes longer in SAI version 8.4 compared to SAI version 7.1
8.4.41.0: [CS00012330251]Extra buffer Profiles created internally are seen as regular profiles in Get calls
Work item tracking
Microsoft ADO (number only): 26609411
How I did it
Upgrade xgs SAI version in sai.mk file.
How to verify it
Run basic SONiC test using SAI release pipeline, all cases passed.
8.4.40.0: https://dev.azure.com/mssonic/internal/_build/results?buildId=465899&view=results
8.4.41.0: https://dev.azure.com/mssonic/internal/_build/results?buildId=466690&view=results
Fix an issue where, with TACACS set to "tacacs+, local", a user can run a blocked command with local permission.
#### Why I did it
When TACACS is set to "tacacs+, local", a user can still run a blocked command with local permission.
##### Work item tracking
- Microsoft ADO: 26399545
#### How I did it
Fix the code to reject the command when authorization fails on the TACACS server side.
#### How to verify it
Pass all UT.
### Description for the changelog
Fix an issue where, with TACACS set to "tacacs+, local", a user can run a blocked command with local permission.
#### Why I did it
src/sonic-utilities
```
* c5e30e38 - (HEAD -> 202305, origin/202305) [202305] Enhanced route_check.py for multi_asic platforms (#3112) (21 hours ago) [Deepak Singhal]
```
#### How I did it
#### How to verify it
#### Description for the changelog
What I did:
Added support so that when TSA is done on a Line Card, it is completely
isolated from all e-BGP peer devices, whether on this LC or on a remote LC.
Why I did:
Currently, when TSA is executed on an LC, routes are withdrawn from its connected e-BGP peers only. e-BGP peers on a remote LC can/will (via i-BGP) still have routes pointing/attracting traffic towards this isolated LC.
How I did:
When TSA is applied on an LC, all the routes that are advertised via i-BGP are tagged with the no-export community, so that when a remote LC receives these routes it does not send them to its connected e-BGP peers.
Also, once we receive a route with no-export over iBGP, we match on it and set the local preference of that route to a lower value (80) so that the route is removed from the forwarding database. The scenario below explains why we do this:
- LC1 advertise R1 to LC3
- LC2 advertise R1 to LC3
- On LC3 we have multi-path/ECMP over both LC1 and LC2
- On LC3, R1 received from LC1 is considered the best route over R1 received from LC2 and is sent to LC3's e-BGP peers
- Now we do TSA on LC2
- LC3 will receive R1 from LC2 with community no-export and from LC1 same as earlier (no change)
- LC3 will still get traffic for R1 since it is still advertised to e-BGP peers (since R1 from LC1 is best route)
- LC3 will forward to both LC1 and LC2 (ecmp) and this causes issue as LC2 is in TSA mode and should not receive traffic
To fix the above scenario, we lower the preference of R1 received from LC2 so that it is removed from the multi-path/ECMP group.
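The effect on LC3 can be sketched with a toy model. This is a simplified illustration and not the actual route-map template: it assumes a default local preference of 100, considers only local preference when forming the multi-path group, and uses hypothetical names such as `Route` and `ecmp_group`.

```python
from dataclasses import dataclass, replace

DEFAULT_LOCAL_PREF = 100  # assumed default local preference
TSA_LOCAL_PREF = 80       # lowered value applied to no-export routes


@dataclass(frozen=True)
class Route:
    prefix: str
    nexthop: str
    local_pref: int = DEFAULT_LOCAL_PREF
    communities: frozenset = frozenset()


def apply_ibgp_import_policy(route):
    """Lower the local preference of routes carrying the no-export community."""
    if "no-export" in route.communities:
        return replace(route, local_pref=TSA_LOCAL_PREF)
    return route


def ecmp_group(routes):
    """Return the nexthops that share the highest local preference."""
    accepted = [apply_ibgp_import_policy(r) for r in routes]
    if not accepted:
        return []
    best = max(r.local_pref for r in accepted)
    return sorted(r.nexthop for r in accepted if r.local_pref == best)
```

Before TSA, R1 from LC1 and LC2 both carry the default preference and form one ECMP group; after TSA on LC2, its copy of R1 carries no-export, drops to 80, and only LC1 remains in the group.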
How I verify:
UT has been added to make sure Template generation is correct
Manual Verification of the functionality
sonic-mgmt test case will be updated accordingly.
Please note this PR is on top of #16714, which needs to be merged first.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
- Why I did it
If a PSU is not present, error logs could appear while restarting psud or thermalctld:
Jan 8 17:15:52.689616 sonic ERR pmon#psud: Thermal sysfs /run/hw-management/thermal/psu2_temp1_max does not exist
Jan 8 17:15:57.747723 sonic ERR pmon#thermalctld: Thermal sysfs /run/hw-management/thermal/psu2_temp1 does not exist
- How I did it
If a PSU is not present, we should not check the PSU temperature sysfs.
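A minimal sketch of the presence check, assuming the caller already knows whether the PSU is present. The function name `read_psu_thermal` is hypothetical, and `print` stands in for the platform's error logger.

```python
import os


def read_psu_thermal(psu_present, sysfs_path):
    """Return the raw sysfs content, or None if the PSU is absent or the file is missing."""
    if not psu_present:
        # Absent PSU: skip the sysfs lookup entirely, so no error is logged.
        return None
    if not os.path.exists(sysfs_path):
        # Stand-in for the real error log call.
        print("Thermal sysfs {} does not exist".format(sysfs_path))
        return None
    with open(sysfs_path) as sysfs_file:
        return sysfs_file.read().strip()
```

The missing-file error is kept for the case where the PSU is present, since that still indicates a real problem.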
Why I did it
Align the keywords to make qos configuration take effect
Work item tracking
Microsoft ADO (number only):
How I did it
Change the keyword to ComputeAI
How to verify it
reload minigraph and check the qos configuration
#### Why I did it
src/sonic-platform-daemons
```
* 824c20a - (HEAD -> 202305, origin/202305) Support 800G ifname in xcvrd (#420) (20 hours ago) [Anoop Kamath]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-utilities
```
* 83a548de - (HEAD -> 202305, origin/202305) Disable Key Validation feature during sonic-installation for Cisco Platforms (#3115) (22 hours ago) [selvipal]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-snmpagent
```
* 6f59d29 - (HEAD -> 202305, origin/202305) Fix SNMP dropping some of the queue counter when create_only_config_db_buffers is set to true (#303) (#309) (33 minutes ago) [mssonicbld]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-snmpagent
```
* 2efaf2e - (HEAD -> 202305, origin/202305) Revert "[action] [PR:303] Fix SNMP dropping some of the queue counter when create_only_config_db_buffers is set to true (#303)" (#308) (4 minutes ago) [StormLiangMS]
```
#### How I did it
#### How to verify it
#### Description for the changelog