Why I did it
On certain routers with baud rate 9600, crash kernel is taking a long time , close to ~5mins, to complete kernel dump and reload the box. On contrast to routers with baud rate 115200, crash kernel dump process is observed to be completed under 35s-60s (depending on the platform). Currently, all debug and informational messages are printed on the console which also factors in for the delay seen. Unless the router is monitored on console in real time, these messages are not very useful. Setting the loglevel to warning will help reduce the verbosity of logs on console, in turn allow crash kernel dump process to be completed in a reasonable time which will also help in overall router recovery time.
How I did it
Setting loglevel attribute in crashkernel cmdline
How to verify it
Install SONiC image with crashkernel cmdline with loglevel set to warning and initiate an induced a crash (sysrq-trigger)
crashkernel boot and dump process will be completed in 20s-30s depending on the platform
#### Why I did it
src/sonic-swss
```
* 1d651e8b - (HEAD -> 202305, origin/202305) Fix the Orchagent crash seen during Port channel OC test cases. (#3042) (2 days ago) [saksarav-nokia]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-utilities
```
* e2ac2b63 - (HEAD -> 202305, origin/202305) [Bug] Fix fw_setenv illegel character issue (#3201) (2 days ago) [xumia]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Fix the build broken issue:
Processing /sonic_host_services-1.0-py3-none-any.whl
Requirement already satisfied: dbus-python in /usr/lib/python3/dist-packages (from sonic-host-services==1.0) (1.2.16)
Requirement already satisfied: systemd-python in /usr/local/lib/python3.9/dist-packages (from sonic-host-services==1.0) (235)
Requirement already satisfied: Jinja2>=2.10 in /usr/local/lib/python3.9/dist-packages (from sonic-host-services==1.0) (3.1.2)
Collecting PyGObject (from sonic-host-services==1.0)
Downloading pygobject-3.48.0.tar.gz (714 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 714.2/714.2 kB 13.1 MB/s eta 0:00:00
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Installing backend dependencies: started
Installing backend dependencies: finished with status 'error'
error: subprocess-exited-with-error
Work item tracking
Microsoft ADO (number only): 27124786
How I did it
Install the pygobject before installing the sonic_host_services.
If installing during the .,whl, it will try to install the latest version (3.48.0), then it will have an issue. Prefer to use the version 3.46.0, see
sonic-buildimage/files/build/versions/host-image/versions-py3
Line 55 in a6437d8
pygobject==3.46.0
It will not add a new package, only install the depended packages firstly.
* [Security] Fix the krb5 vulnerability issue (#17914)
### Why I did it
Fix the krb5 vulnerable issue
CVE-2021-36222 allows remote attackers to cause a NULL pointer dereference and daemon crash
CVE-2021-37750 NULL pointer dereference in kdc/do_tgs_req.c via a FAST inner body that lacks a server field
DSA 5286-1 remote code execution
##### Work item tracking
- Microsoft ADO **(number only)**: 26577929
#### How I did it
Upgrade the krb5 version to 1.18.3-6+deb11u14+fips.
* [Build] Fix krb5 package not found issue (#17926)
Why I did it
Fix the build issue caused by the wrong version specified.
See the build error logs:
Try 4: /usr/bin/wget --retry-connrefused failed to get: -O
--2024-01-26 11:38:23-- https://sonicstorage.blob.core.windows.net/public/fips/bullseye/0.10/amd64/libk5crypto3_1.18.3-6+deb11u14+fips_amd64.deb
Resolving sonicstorage.blob.core.windows.net (sonicstorage.blob.core.windows.net)... 20.60.59.131
Connecting to sonicstorage.blob.core.windows.net (sonicstorage.blob.core.windows.net)|20.60.59.131|:443... connected.
HTTP request sent, awaiting response... 404 The specified blob does not exist.
2024-01-26 11:38:23 ERROR 404: The specified blob does not exist..
Try 5: /usr/bin/wget --retry-connrefused failed to get: -O
make[1]: *** [Makefile:12: /sonic/target/debs/bullseye/symcrypt-openssl_0.10_amd64.deb] Error 8
make[1]: Leaving directory '/sonic/src/sonic-fips'
Work item tracking
Microsoft ADO (number only): 26577929
The package not installed but PR passed issue is traced in another issue #17927
How I did it
Add the libkrb5-dev and the depended packages to fix docker-sonic-vs build failure.
The package libzmq3-dev has dependency on the libkrb5-dev.
* [202305] Support FIPS for armhf
* Remove no use mirror
* Fix fips options issue
Why I did it
FRR issue
FRRouting/frr#15459
Add patch for FRR PR for 202305
FRRouting/frr#15466
Work item tracking
Microsoft ADO (number only):
26876083
How I did it
Add patch for FRR PR
FRRouting/frr#15466
How to verify it
Run the original case locally
#### Why I did it
src/sonic-platform-daemons
```
* 72e9a28 - (HEAD -> 202305, origin/202305) Enable periodic polling of TRANSCEIVER_FIRMWARE_INFO table in DomInfoUpdateTask (#443) (#445) (10 hours ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-utilities
```
* 49af6dfb - (HEAD -> 202305, origin/202305) Several fixes and updates for ecnconfig (#2889) (22 hours ago) [rbpittman]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Reverting DHCP counter changes due to unexpected packet drops seen in recv buffer, causing counter counts to be inaccurate in dhcpmon and affecting dhcp6relay performance
Work item tracking
Microsoft ADO (number only): 26918588
How I did it
Reset submodule head and revert related dockerfile changes
How to verify it
Ran mgmt test and stress test
Why I did it
deb11u1 is deprecated.
Use deb11u2 instead.
Other branches are not impacted, because their reproducible build version files are up to date.
Work item tracking
Microsoft ADO (number only): 26964185
How I did it
How to verify it
#### Why I did it
src/sonic-platform-common
```
* 29544ed - (HEAD -> 202305, origin/202305) Certain VDM fields not populating after encountering KeyError on 400ZR optics (#442) (10 hours ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-utilities
```
* e16fed53 - (HEAD -> 202305, origin/202305) Modify transceiver PM CLI to handle N/A value for DOM threshold (#3174) (10 hours ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
### Why I did it
Fix flakiness of eventd UT - run sub after capture service starts
##### Work item tracking
- Microsoft ADO **(number only)**:25650744
#### How I did it
Run sub socket after capture socket is initialized
#### How to verify it
Pipeline
#### Why I did it
src/sonic-swss
```
* 8d6aac03 - (HEAD -> 202305, origin/202305) [intfsorch] Enable ipv6 proxy ndp along with proxy arp (#3045) (2 days ago) [Nikola Dancejic]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-utilities
```
* 514f4329 - (HEAD -> 202305, origin/202305) Fix sfputil CLI failure for multi-asic platforms (#3168) (#3181) (3 hours ago) [longhuan-cisco]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Fixed DhcpV6 CoPP issue. In certain scenarios dhcpv6 packet ff02::1:2,udp=17,l4-dst-port=547 was not trapped to CPU.
Fixed test_copp.py::test_add_new_trap and test_remove_trap PTF failures
Work item tracking
Microsoft ADO (number only):
How I did it
How to verify it
Loaded SAI debian (in syncd docker) and re-run the failed cases.
Adding rule to ebtables to drop multicast packets in kernel. This was
done to address a bug where NS packets were flooding ports with
duplicate packets.
Signed-off-by: Nikola Dancejic <ndancejic@microsoft.com>
Why I did it
ICM reported due to "BGPMon Process exited" which was caused by json load exception.
Work item tracking
Microsoft ADO (number only):
25916773
How I did it
Add an exception handle during json load.
How to verify it
Verified locally, add debug log to modify the output string of cmd to make it not with json formation, then check the syslog.
#### Why I did it
src/sonic-swss
```
* 5d91f105 - (HEAD -> 202305, origin/202305) Allow L4 port range egress ACL rules on DNX (#3014) (2 days ago) [arista-nwolfe]
```
#### How I did it
#### How to verify it
#### Description for the changelog
These changes adjust Nokia IXR7250 thermal sensor logging thresholds.
Why I did it
To modify the thermal sensor logging thresholds used on LC and Supervisor.
How I did it
Modified the JSON based thermal logging thresholds used to determine when to log current high sensor temperature and hottest sensor margin fluctuations.
How to verify it
Verify that syslog messages indicating current (high) temperature and margin values are only logged when these respective values fluctuate by at least 5 degrees.
Co-authored-by: snider-nokia <76123698+snider-nokia@users.noreply.github.com>
Why I did it
Disable Bad Link Detection feature in SDK
Fix to address pmbus driver errors causing “show platform psu status” not showing power
Work item tracking
How I did it
How to verify it
#### Why I did it
src/sonic-sairedis
```
* 6a018ae - (HEAD -> 202305, origin/202305) Install nlohmann-json3-dev for CodeQL analysis of (#1350) (3 days ago) [JunhongMao]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-utilities
```
* 1c5c134b - (HEAD -> 202305, origin/202305) Add all SKUs to the generic config update list (#3131) (4 hours ago) [Stephen Sun]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-gnmi
```
* 50849ce - (HEAD -> 202305, origin/202305) Replace PFC_WD_TABLE with PFC_WD (#173) (22 hours ago) [Zain Budhwani]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-sairedis
```
* b89740b - (HEAD -> 202305, origin/202305) Updated SAI module for 202305 branch to latest v1.12 SAI (#1356) (2 hours ago) [abdosi]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-sairedis
```
* 6616cb5 - (HEAD -> 202305, origin/202305) Skip FABRIC PORT Attributes from sairedis logging (#1339) (2 days ago) [saksarav-nokia]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-platform-common
```
* b6f8a8d - (HEAD -> 202305, origin/202305) Fix memory map parsing issue (#427) (22 hours ago) [Stephen Sun]
```
#### How I did it
#### How to verify it
#### Description for the changelog
ix IPV6 forced-mgmt-route not work issue
Why I did it
IPV6 forced-mgmt-route not work
When add a IPV6 route, should use 'ip -6 rule add pref 32764 address' command, but currently in the template the '-6' parameter are missing, so the IPV6 route been add to IPV4 route table.
Also this PR depends on #17281 , which will fix the IPV6 'default' route table missing in IPV6 route lookup issue.
Microsoft ADO (number only):24719238
#### Why I did it
src/sonic-utilities
```
* c5f53423 - (HEAD -> 202305, origin/202305) Fix `sudo config load_mgmt_config` fails with error "File /var/run/dhclient.eth0.pid does not exist" (#3149) (16 hours ago) [Mai Bui]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-platform-common
```
* a64276a - (HEAD -> 202305, origin/202305) Tx/Rx power values should be rounded up to 3 decimal places (#432) (22 hours ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
The PR introduced a bug for slim image build, #17905, by which the sonic_asic_platform is missing when build docker image for slim image.
[ building ] [ target/docker-dhcp-relay.gz ]
/sonic/dockers/docker-dhcp-relay/cli-plugin-tests /sonic
/sonic
Traceback (most recent call last):
File "/usr/local/bin/j2", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.9/dist-packages/j2cli/cli.py", line 202, in main
output = render_command(
File "/usr/local/lib/python3.9/dist-packages/j2cli/cli.py", line 186, in render_command
result = renderer.render(args.template, context)
File "/usr/local/lib/python3.9/dist-packages/j2cli/cli.py", line 85, in render
return self._env \
File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 1090, in render
self.environment.handle_exception()
File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 832, in handle_exception
reraise(*rewrite_traceback_stack(source=source))
File "/usr/lib/python3/dist-packages/jinja2/_compat.py", line 28, in reraise
raise value.with_traceback(tb)
File "/sonic/dockers/docker-dhcp-relay/Dockerfile.j2", line 48, in top-level template code
{% if build_reduce_image_size != "y" or sonic_asic_platform != "broadcom" %}
jinja2.exceptions.UndefinedError: 'sonic_asic_platform' is undefined
make: *** [slave.mk:1072: target/docker-dhcp-relay.gz] Error 1
make: *** Waiting for unfinished jobs....
[ finished ] [ target/docker-swss-layer-bullseye.gz ]
[ finished ] [ target/docker-syncd-brcm-dnx.gz ]
make[1]: *** [Makefile.work:608: target/sonic-broadcom.bin] Error 2
make[1]: Leaving directory '/data/work/1/s'
make: *** [Makefile:41: target/sonic-broadcom.bin] Error 2
And why it slipped the PR test? PR test doesn't compile with slim option, it won't check sonic_asic_platform != "broadcom" for PR build.
Work item tracking
Microsoft ADO (number only):
How I did it
Export sonic_asic_platform for docker build in slave.mk
How to verify it
build with slim image option.
*use lower case for IPv6 address as internal key and bfd session key. fixes#15764
Why I did it
*staticroutebfd uses the IPv6 address string as a key to create bfd session and cache the bfd sessions using it as a key.
When the IPv6 address string has uppercase letter in the static route nexthop list, the string with uppercase letter key is stored in the cache, but the BFD STATE_DB uses lowercase for IPv6 address, so when the staticroutebfd get the bfd state event, it cannot find the bfd session in its local cache because of the letter case.
Why I did it
Upgrade the xgs SAI version to 8.4.41.1 to include the following fix:
8.4.41.1: Cherry-pick from SAI 4.3: CS00012288297: Fix TX queue for control packets
Work item tracking
Microsoft ADO (number only): 26626208
How I did it
Upgrade xgs SAI version in sai.mk file.
How to verify it
run test_bgp_queue.py test on 7050qx T1: https://dev.azure.com/mssonic/internal/_build/results?buildId=467287&view=results
Why I did it
Advance dhcpmon submodule head
Work item tracking
Microsoft ADO (number only): 26270786
How I did it
fc20a97 Yaqiang Zhu Wed Jan 10 09:11:25 2024 +0800 [202311][counter] Clear counter table when dhcpmon init (#14)
bace2e0 Yaqiang Zhu Fri Jan 5 11:29:21 2024 +0800 [counter] Clear counter table when dhcpmon init (#14)
How to verify it
Disable eventd at buildtime for slim images
- Microsoft ADO **(number only)**:26386286
Add flags for disabling eventd and only copy rsyslog conf files when eventd is included and not slim image
Manual testing
- Why I did it
Based on some research some products might experience an occasional IO failures in the communication between CPU and SSD because of NCQ.
There seems to be a problem between some kernel versions and some SATA controllers.
Syslog error message examples:
Error "ata1: SError: { UnrecovData Handshk }" - "failed command: WRITE FPDMA QUEUED".
Error "ata1: SError: { RecovComm HostInt PHYRdyChg CommWake 10B8B DevExch }" - "failed command: READ FPDMA QUEUED".
Some vendors already disabled NCQ on their platforms in SONiC due to similar issue:
[Arista] Disable ATA NCQ for a few products #13739 [Arista] Disable ATA NCQ for a few products
[Arista] Disable SSD NCQ on DCS-7050CX3-32S #13964 [Arista] Disable SSD NCQ on DCS-7050CX3-32S
Also there are other discussions on Debian/Ubuntu forums about similar issues and it was suggested to disable NCQ:
https://askubuntu.com/questions/133946/are-these-sata-errors-dangerous
- How I did it
Add a kernel parameter to tell libata to disable NCQ
- How to verify it
Use FIO tool - fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4
Why I did it
Fix an error in the log_err call.
this error can be triggered by an invalid static route key. usually the code cannot go here with normal config file. but hit this issue with an invalid key by manual testing with redis-cli directly. the file is scanned by Python lint to prevent such errors.
Work item tracking
Microsoft ADO ():26250268
How I did it
fix the format error.
How to verify it
1, ran pylint to check the design, make sure no such error in the design file.
2, wrote a separate python program to verify the log call.
In the current logging related testing, usually use patch/mock for logging. for this specific error, could not trigger it if we call mock function instead the real function in the design. so need to do lint checking for code change.
Why I did it
Update SDK/FW version to 4.6.2202/2012.2202
Fixed issues
On Spectrum-3 systems, ports' toggling while sending traffic on 400G speed ports, might result in stuck FW.
In Spectrum-1 switch systems, 50G SR2 speed mode is not supported when AutoNeg is enabled. In this case although the max interface speed is 50G for SR2 or SR4 or SR, the actual max interface speed negotiated between the loopback is 25G.
On Spectrum-2 and Spectrum-3, Switch create in fastboot might take more than 40 seconds in case there are no active links.
When performing warmboot from version prior to 202205 to 202205 and above , no aging and mac move take place
Work item tracking
Microsoft ADO (number only):
How I did it
Updating make files.
How to verify it
Running regression
Why I did it
Upgrade the xgs SAI version to 8.4.41.0 to include the following fix:
8.4.39.3: Revert "Merged PR 4452: Update SAI version to 8.4.39.2 to include fix capability for Hostif queue"
8.4.40.0: [sbumodule upgrade][CS00012330252] ACL entry programming takes longer in SAI version 8.4 compared to SAI version 7.1
8.4.41.0: [CS00012330251]Extra buffer Profiles created internally are seen as regular profiles in Get calls
Work item tracking
Microsoft ADO (number only): 26609411
How I did it
Upgrade xgs SAI version in sai.mk file.
How to verify it
Run basic SONiC test using SAI release pipeline, all cases passed.
8.4.40.0: https://dev.azure.com/mssonic/internal/_build/results?buildId=465899&view=results
8.4.41.0: https://dev.azure.com/mssonic/internal/_build/results?buildId=466690&view=results
Fix when set TACACS to "tacacs+, local" user can run blocked command with local permission issue.
#### Why I did it
When set TACACS to "tacacs+, local", user still can run a blocked command with local permission.
##### Work item tracking
- Microsoft ADO: 26399545
#### How I did it
Fix code to reject command when authorized failed from TACACS server side.
#### How to verify it
Pass all UT.
### Description for the changelog
Fix when set TACACS to "tacacs+, local" user can run blocked command with local permission issue.