#### Why I did it
src/sonic-swss
```
* d33b854e - (HEAD -> 202305, origin/202305) [acl] Add IN_PORTS qualifier for L3 table (#3078) (4 hours ago) [Neetha John]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Release notes for Cisco 8111-32EH-O
Fix for PFC not generated on priority 2 when P2 and P3 are used as lossless (MIGSMSFT-444)
Provide explicit mapping for unmapped DSCP packets (MIGSMSFT-358)
How I did it
Update platform version to 202305.1.0.10
#### Why I did it
src/sonic-linux-kernel
```
* 5508b25 - (HEAD -> 202305, origin/202305) Dynamic write timeout support for optoe driver (#370) (#386) (13 hours ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-platform-common
```
* ed3ec1f - (HEAD -> 202305, origin/202305) [202305] Implementing set_optoe_write_timeout API (#423) (13 hours ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-platform-daemons
```
* e6e34c1 - (HEAD -> 202305, origin/202305) Disable periodic polling of port in DomInfoUpdateTask thread during CMIS init (#449) (#450) (72 minutes ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-utilities
```
* eef2d894 - (HEAD -> 202305, origin/202305) Update port2alias (#3217) (16 hours ago) [abdosi]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Bugfix for Python syntax error in sonic_platform/common.py.
A method of class need to have self as parameter.
Fixing below issue:
e1031:~$ show int st
Traceback (most recent call last):
File "/usr/local/bin/intfutil", line 836, in <module>
main()
File "/usr/local/bin/intfutil", line 819, in main
interface_stat.display_intf_status()
File "/usr/local/bin/intfutil", line 448, in display_intf_status
self.get_intf_status()
File "/usr/local/lib/python3.9/dist-packages/utilities_common/multi_asic.py", line 157, in wrapped_run_on_all_asics
func(self, *args, **kwargs)
File "/usr/local/bin/intfutil", line 529, in get_intf_status
self.portchannel_speed_dict = po_speed_dict(self.po_int_dict, self.db)
File "/usr/local/bin/intfutil", line 334, in po_speed_dict
optics_type = port_optics_get(appl_db, value[0], PORT_OPTICS_TYPE)
File "/usr/local/bin/intfutil", line 224, in port_optics_get
if is_rj45_port(intf_name):
File "/usr/local/lib/python3.9/dist-packages/utilities_common/platform_sfputil_helper.py", line 120, in is_rj45_port
platform_chassis = sonic_platform.platform.Platform().get_chassis()
File "/usr/local/lib/python3.9/dist-packages/sonic_platform/platform.py", line 21, in __init__
self._chassis = Chassis()
File "/usr/local/lib/python3.9/dist-packages/sonic_platform/chassis.py", line 37, in __init__
self._is_host = self._api_common.is_host()
TypeError: is_host() takes 0 positional arguments but 1 was given
Work item tracking
Microsoft ADO (number only): 27208152
How I did it
Add self parameter to function Common::is_host().
How to verify it
Verified on E1031 DUT with this patch.
#### Why I did it
src/sonic-linux-kernel
```
* d85a5ad - (HEAD -> 202305, origin/202305) Disable small sector erase size for UBIFS on flash (#382) (16 hours ago) [Mridul Bajpai]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Added YANG related changes for adding `dom_polling` field in PORT table of CONFIG_DB. This field can be set with `config interface transceiver dom PORT_NAME (enable|disable)` CLI.
The `dom_polling` field was added through https://github.com/sonic-net/sonic-utilities/pull/3187. Please refer to this PR for the details on the reason for adding `dom_polling` field.
Added `dom_polling` field to CONFIG_DB PORT table.
Added unit tests for both valid and invalid options for controlling `dom_polling`.
Valid values for for `dom_polling` are `enabled` and `disabled`
Any other value is treated as an invalid value
Why I did it
Release notes for Cisco 8102-64H-O, 8101-32FH-O, and 8111-32EH-O
• Fix to address Netscan packet drop issue
• Fix for show platform npu router route-table CLI
• Addressed testcase errors for test_drop_counter syslog
• Addressed testcase failures for test_thermal.TestThermalApi for 8111
• Double commit of fix for ECC errors for 202305
How I did it
Update platform version to 202305.1.0.9
#### Why I did it
src/sonic-utilities
```
* 55800354 - (HEAD -> 202305, origin/202305) [show] Update show run all to cover all asic config in multiasic (#3148) (22 hours ago) [jingwenxie]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
src/sonic-utilities
* 45c44094 - (HEAD -> 202305, origin/202305) [config] Add Table hard dependency check (#3159) (10 hours ago) [jingwenxie]
* 8bde9cf7 - Skip the validation of action in acl-loader if capability table in STATE_DB is empty (#3199) (20 hours ago) [bingwang-ms]
How I did it
How to verify it
#### Why I did it
src/sonic-gnmi
```
* 71ae185 - (HEAD -> 202305, origin/202305) Merge pull request #202 from sonic-net/revert-191-cherry/202305/190 (82 minutes ago) [StormLiangMS]
|\
| failure_prs.log e2b78b4 - (origin/revert-191-cherry/202305/190) Revert "Fix sonic string in osversion/build (#190)" (2 hours ago) [StormLiangMS]
|/
* b431273 - Account for GLOBAL key in PFC_WD (#178) (6 days ago) [Zain Budhwani]
* e1af33f - Merge pull request #199 from abdosi/202305 (6 days ago) [StormLiangMS]
|\
| failure_prs.log f7e216b - Update db_client.go (6 days ago) [abdosi]
|/
* ec3f56d - Fix sonic string in osversion/build (#190) (9 days ago) [Zain Budhwani]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-utilities
```
* c0ba32ad - (HEAD -> 202305, origin/202305) CLI to skip polling for periodic information for a port in DomInfoUpdateTask thread (#3187) (16 hours ago) [mihirpat1]
* 261cfdf7 - CLI enhancements to revtrieve data from TRANSCEIVER_FIRMWARE_INFO table (#3177) (#3189) (19 hours ago) [mssonicbld]
* 6160ee79 - [202305][config] Add YANG alerting for override (#3195) (20 hours ago) [jingwenxie]
* a55624d8 - [fast/warm-reboot] Put ERR message in syslog when a failure is seen (#3186) (34 hours ago) [Vaibhav Hemant Dixit]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
On certain routers with baud rate 9600, crash kernel is taking a long time , close to ~5mins, to complete kernel dump and reload the box. On contrast to routers with baud rate 115200, crash kernel dump process is observed to be completed under 35s-60s (depending on the platform). Currently, all debug and informational messages are printed on the console which also factors in for the delay seen. Unless the router is monitored on console in real time, these messages are not very useful. Setting the loglevel to warning will help reduce the verbosity of logs on console, in turn allow crash kernel dump process to be completed in a reasonable time which will also help in overall router recovery time.
How I did it
Setting loglevel attribute in crashkernel cmdline
How to verify it
Install SONiC image with crashkernel cmdline with loglevel set to warning and initiate an induced a crash (sysrq-trigger)
crashkernel boot and dump process will be completed in 20s-30s depending on the platform
#### Why I did it
src/sonic-swss
```
* 1d651e8b - (HEAD -> 202305, origin/202305) Fix the Orchagent crash seen during Port channel OC test cases. (#3042) (2 days ago) [saksarav-nokia]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-utilities
```
* e2ac2b63 - (HEAD -> 202305, origin/202305) [Bug] Fix fw_setenv illegel character issue (#3201) (2 days ago) [xumia]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Fix the build broken issue:
Processing /sonic_host_services-1.0-py3-none-any.whl
Requirement already satisfied: dbus-python in /usr/lib/python3/dist-packages (from sonic-host-services==1.0) (1.2.16)
Requirement already satisfied: systemd-python in /usr/local/lib/python3.9/dist-packages (from sonic-host-services==1.0) (235)
Requirement already satisfied: Jinja2>=2.10 in /usr/local/lib/python3.9/dist-packages (from sonic-host-services==1.0) (3.1.2)
Collecting PyGObject (from sonic-host-services==1.0)
Downloading pygobject-3.48.0.tar.gz (714 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 714.2/714.2 kB 13.1 MB/s eta 0:00:00
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Installing backend dependencies: started
Installing backend dependencies: finished with status 'error'
error: subprocess-exited-with-error
Work item tracking
Microsoft ADO (number only): 27124786
How I did it
Install the pygobject before installing the sonic_host_services.
If installing during the .,whl, it will try to install the latest version (3.48.0), then it will have an issue. Prefer to use the version 3.46.0, see
sonic-buildimage/files/build/versions/host-image/versions-py3
Line 55 in a6437d8
pygobject==3.46.0
It will not add a new package, only install the depended packages firstly.
* [Security] Fix the krb5 vulnerability issue (#17914)
### Why I did it
Fix the krb5 vulnerable issue
CVE-2021-36222 allows remote attackers to cause a NULL pointer dereference and daemon crash
CVE-2021-37750 NULL pointer dereference in kdc/do_tgs_req.c via a FAST inner body that lacks a server field
DSA 5286-1 remote code execution
##### Work item tracking
- Microsoft ADO **(number only)**: 26577929
#### How I did it
Upgrade the krb5 version to 1.18.3-6+deb11u14+fips.
* [Build] Fix krb5 package not found issue (#17926)
Why I did it
Fix the build issue caused by the wrong version specified.
See the build error logs:
Try 4: /usr/bin/wget --retry-connrefused failed to get: -O
--2024-01-26 11:38:23-- https://sonicstorage.blob.core.windows.net/public/fips/bullseye/0.10/amd64/libk5crypto3_1.18.3-6+deb11u14+fips_amd64.deb
Resolving sonicstorage.blob.core.windows.net (sonicstorage.blob.core.windows.net)... 20.60.59.131
Connecting to sonicstorage.blob.core.windows.net (sonicstorage.blob.core.windows.net)|20.60.59.131|:443... connected.
HTTP request sent, awaiting response... 404 The specified blob does not exist.
2024-01-26 11:38:23 ERROR 404: The specified blob does not exist..
Try 5: /usr/bin/wget --retry-connrefused failed to get: -O
make[1]: *** [Makefile:12: /sonic/target/debs/bullseye/symcrypt-openssl_0.10_amd64.deb] Error 8
make[1]: Leaving directory '/sonic/src/sonic-fips'
Work item tracking
Microsoft ADO (number only): 26577929
The package not installed but PR passed issue is traced in another issue #17927
How I did it
Add the libkrb5-dev and the depended packages to fix docker-sonic-vs build failure.
The package libzmq3-dev has dependency on the libkrb5-dev.
* [202305] Support FIPS for armhf
* Remove no use mirror
* Fix fips options issue
Why I did it
FRR issue
FRRouting/frr#15459
Add patch for FRR PR for 202305
FRRouting/frr#15466
Work item tracking
Microsoft ADO (number only):
26876083
How I did it
Add patch for FRR PR
FRRouting/frr#15466
How to verify it
Run the original case locally
#### Why I did it
src/sonic-platform-daemons
```
* 72e9a28 - (HEAD -> 202305, origin/202305) Enable periodic polling of TRANSCEIVER_FIRMWARE_INFO table in DomInfoUpdateTask (#443) (#445) (10 hours ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-utilities
```
* 49af6dfb - (HEAD -> 202305, origin/202305) Several fixes and updates for ecnconfig (#2889) (22 hours ago) [rbpittman]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Reverting DHCP counter changes due to unexpected packet drops seen in recv buffer, causing counter counts to be inaccurate in dhcpmon and affecting dhcp6relay performance
Work item tracking
Microsoft ADO (number only): 26918588
How I did it
Reset submodule head and revert related dockerfile changes
How to verify it
Ran mgmt test and stress test
Why I did it
deb11u1 is deprecated.
Use deb11u2 instead.
Other branches are not impacted, because their reproducible build version files are up to date.
Work item tracking
Microsoft ADO (number only): 26964185
How I did it
How to verify it
#### Why I did it
src/sonic-platform-common
```
* 29544ed - (HEAD -> 202305, origin/202305) Certain VDM fields not populating after encountering KeyError on 400ZR optics (#442) (10 hours ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-utilities
```
* e16fed53 - (HEAD -> 202305, origin/202305) Modify transceiver PM CLI to handle N/A value for DOM threshold (#3174) (10 hours ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
### Why I did it
Fix flakiness of eventd UT - run sub after capture service starts
##### Work item tracking
- Microsoft ADO **(number only)**:25650744
#### How I did it
Run sub socket after capture socket is initialized
#### How to verify it
Pipeline
#### Why I did it
src/sonic-swss
```
* 8d6aac03 - (HEAD -> 202305, origin/202305) [intfsorch] Enable ipv6 proxy ndp along with proxy arp (#3045) (2 days ago) [Nikola Dancejic]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-utilities
```
* 514f4329 - (HEAD -> 202305, origin/202305) Fix sfputil CLI failure for multi-asic platforms (#3168) (#3181) (3 hours ago) [longhuan-cisco]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Fixed DhcpV6 CoPP issue. In certain scenarios dhcpv6 packet ff02::1:2,udp=17,l4-dst-port=547 was not trapped to CPU.
Fixed test_copp.py::test_add_new_trap and test_remove_trap PTF failures
Work item tracking
Microsoft ADO (number only):
How I did it
How to verify it
Loaded SAI debian (in syncd docker) and re-run the failed cases.
Adding rule to ebtables to drop multicast packets in kernel. This was
done to address a bug where NS packets were flooding ports with
duplicate packets.
Signed-off-by: Nikola Dancejic <ndancejic@microsoft.com>
Why I did it
ICM reported due to "BGPMon Process exited" which was caused by json load exception.
Work item tracking
Microsoft ADO (number only):
25916773
How I did it
Add an exception handle during json load.
How to verify it
Verified locally, add debug log to modify the output string of cmd to make it not with json formation, then check the syslog.
#### Why I did it
src/sonic-swss
```
* 5d91f105 - (HEAD -> 202305, origin/202305) Allow L4 port range egress ACL rules on DNX (#3014) (2 days ago) [arista-nwolfe]
```
#### How I did it
#### How to verify it
#### Description for the changelog
These changes adjust Nokia IXR7250 thermal sensor logging thresholds.
Why I did it
To modify the thermal sensor logging thresholds used on LC and Supervisor.
How I did it
Modified the JSON based thermal logging thresholds used to determine when to log current high sensor temperature and hottest sensor margin fluctuations.
How to verify it
Verify that syslog messages indicating current (high) temperature and margin values are only logged when these respective values fluctuate by at least 5 degrees.
Co-authored-by: snider-nokia <76123698+snider-nokia@users.noreply.github.com>
Why I did it
Disable Bad Link Detection feature in SDK
Fix to address pmbus driver errors causing “show platform psu status” not showing power
Work item tracking
How I did it
How to verify it
#### Why I did it
src/sonic-sairedis
```
* 6a018ae - (HEAD -> 202305, origin/202305) Install nlohmann-json3-dev for CodeQL analysis of (#1350) (3 days ago) [JunhongMao]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-utilities
```
* 1c5c134b - (HEAD -> 202305, origin/202305) Add all SKUs to the generic config update list (#3131) (4 hours ago) [Stephen Sun]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-gnmi
```
* 50849ce - (HEAD -> 202305, origin/202305) Replace PFC_WD_TABLE with PFC_WD (#173) (22 hours ago) [Zain Budhwani]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-sairedis
```
* b89740b - (HEAD -> 202305, origin/202305) Updated SAI module for 202305 branch to latest v1.12 SAI (#1356) (2 hours ago) [abdosi]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-sairedis
```
* 6616cb5 - (HEAD -> 202305, origin/202305) Skip FABRIC PORT Attributes from sairedis logging (#1339) (2 days ago) [saksarav-nokia]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-platform-common
```
* b6f8a8d - (HEAD -> 202305, origin/202305) Fix memory map parsing issue (#427) (22 hours ago) [Stephen Sun]
```
#### How I did it
#### How to verify it
#### Description for the changelog
ix IPV6 forced-mgmt-route not work issue
Why I did it
IPV6 forced-mgmt-route not work
When add a IPV6 route, should use 'ip -6 rule add pref 32764 address' command, but currently in the template the '-6' parameter are missing, so the IPV6 route been add to IPV4 route table.
Also this PR depends on #17281 , which will fix the IPV6 'default' route table missing in IPV6 route lookup issue.
Microsoft ADO (number only):24719238
#### Why I did it
src/sonic-utilities
```
* c5f53423 - (HEAD -> 202305, origin/202305) Fix `sudo config load_mgmt_config` fails with error "File /var/run/dhclient.eth0.pid does not exist" (#3149) (16 hours ago) [Mai Bui]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-platform-common
```
* a64276a - (HEAD -> 202305, origin/202305) Tx/Rx power values should be rounded up to 3 decimal places (#432) (22 hours ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
The PR introduced a bug for slim image build, #17905, by which the sonic_asic_platform is missing when build docker image for slim image.
[ building ] [ target/docker-dhcp-relay.gz ]
/sonic/dockers/docker-dhcp-relay/cli-plugin-tests /sonic
/sonic
Traceback (most recent call last):
File "/usr/local/bin/j2", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.9/dist-packages/j2cli/cli.py", line 202, in main
output = render_command(
File "/usr/local/lib/python3.9/dist-packages/j2cli/cli.py", line 186, in render_command
result = renderer.render(args.template, context)
File "/usr/local/lib/python3.9/dist-packages/j2cli/cli.py", line 85, in render
return self._env \
File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 1090, in render
self.environment.handle_exception()
File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 832, in handle_exception
reraise(*rewrite_traceback_stack(source=source))
File "/usr/lib/python3/dist-packages/jinja2/_compat.py", line 28, in reraise
raise value.with_traceback(tb)
File "/sonic/dockers/docker-dhcp-relay/Dockerfile.j2", line 48, in top-level template code
{% if build_reduce_image_size != "y" or sonic_asic_platform != "broadcom" %}
jinja2.exceptions.UndefinedError: 'sonic_asic_platform' is undefined
make: *** [slave.mk:1072: target/docker-dhcp-relay.gz] Error 1
make: *** Waiting for unfinished jobs....
[ finished ] [ target/docker-swss-layer-bullseye.gz ]
[ finished ] [ target/docker-syncd-brcm-dnx.gz ]
make[1]: *** [Makefile.work:608: target/sonic-broadcom.bin] Error 2
make[1]: Leaving directory '/data/work/1/s'
make: *** [Makefile:41: target/sonic-broadcom.bin] Error 2
And why it slipped the PR test? PR test doesn't compile with slim option, it won't check sonic_asic_platform != "broadcom" for PR build.
Work item tracking
Microsoft ADO (number only):
How I did it
Export sonic_asic_platform for docker build in slave.mk
How to verify it
build with slim image option.
*use lower case for IPv6 address as internal key and bfd session key. fixes#15764
Why I did it
*staticroutebfd uses the IPv6 address string as a key to create bfd session and cache the bfd sessions using it as a key.
When the IPv6 address string has uppercase letter in the static route nexthop list, the string with uppercase letter key is stored in the cache, but the BFD STATE_DB uses lowercase for IPv6 address, so when the staticroutebfd get the bfd state event, it cannot find the bfd session in its local cache because of the letter case.
Why I did it
Upgrade the xgs SAI version to 8.4.41.1 to include the following fix:
8.4.41.1: Cherry-pick from SAI 4.3: CS00012288297: Fix TX queue for control packets
Work item tracking
Microsoft ADO (number only): 26626208
How I did it
Upgrade xgs SAI version in sai.mk file.
How to verify it
run test_bgp_queue.py test on 7050qx T1: https://dev.azure.com/mssonic/internal/_build/results?buildId=467287&view=results
Why I did it
Advance dhcpmon submodule head
Work item tracking
Microsoft ADO (number only): 26270786
How I did it
fc20a97 Yaqiang Zhu Wed Jan 10 09:11:25 2024 +0800 [202311][counter] Clear counter table when dhcpmon init (#14)
bace2e0 Yaqiang Zhu Fri Jan 5 11:29:21 2024 +0800 [counter] Clear counter table when dhcpmon init (#14)
How to verify it
Disable eventd at buildtime for slim images
- Microsoft ADO **(number only)**:26386286
Add flags for disabling eventd and only copy rsyslog conf files when eventd is included and not slim image
Manual testing
- Why I did it
Based on some research some products might experience an occasional IO failures in the communication between CPU and SSD because of NCQ.
There seems to be a problem between some kernel versions and some SATA controllers.
Syslog error message examples:
Error "ata1: SError: { UnrecovData Handshk }" - "failed command: WRITE FPDMA QUEUED".
Error "ata1: SError: { RecovComm HostInt PHYRdyChg CommWake 10B8B DevExch }" - "failed command: READ FPDMA QUEUED".
Some vendors already disabled NCQ on their platforms in SONiC due to similar issue:
[Arista] Disable ATA NCQ for a few products #13739 [Arista] Disable ATA NCQ for a few products
[Arista] Disable SSD NCQ on DCS-7050CX3-32S #13964 [Arista] Disable SSD NCQ on DCS-7050CX3-32S
Also there are other discussions on Debian/Ubuntu forums about similar issues and it was suggested to disable NCQ:
https://askubuntu.com/questions/133946/are-these-sata-errors-dangerous
- How I did it
Add a kernel parameter to tell libata to disable NCQ
- How to verify it
Use FIO tool - fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4
Why I did it
Fix an error in the log_err call.
this error can be triggered by an invalid static route key. usually the code cannot go here with normal config file. but hit this issue with an invalid key by manual testing with redis-cli directly. the file is scanned by Python lint to prevent such errors.
Work item tracking
Microsoft ADO ():26250268
How I did it
fix the format error.
How to verify it
1, ran pylint to check the design, make sure no such error in the design file.
2, wrote a separate python program to verify the log call.
In the current logging related testing, usually use patch/mock for logging. for this specific error, could not trigger it if we call mock function instead the real function in the design. so need to do lint checking for code change.
Why I did it
Update SDK/FW version to 4.6.2202/2012.2202
Fixed issues
On Spectrum-3 systems, ports' toggling while sending traffic on 400G speed ports, might result in stuck FW.
In Spectrum-1 switch systems, 50G SR2 speed mode is not supported when AutoNeg is enabled. In this case although the max interface speed is 50G for SR2 or SR4 or SR, the actual max interface speed negotiated between the loopback is 25G.
On Spectrum-2 and Spectrum-3, Switch create in fastboot might take more than 40 seconds in case there are no active links.
When performing warmboot from version prior to 202205 to 202205 and above , no aging and mac move take place
Work item tracking
Microsoft ADO (number only):
How I did it
Updating make files.
How to verify it
Running regression
Why I did it
Upgrade the xgs SAI version to 8.4.41.0 to include the following fix:
8.4.39.3: Revert "Merged PR 4452: Update SAI version to 8.4.39.2 to include fix capability for Hostif queue"
8.4.40.0: [sbumodule upgrade][CS00012330252] ACL entry programming takes longer in SAI version 8.4 compared to SAI version 7.1
8.4.41.0: [CS00012330251]Extra buffer Profiles created internally are seen as regular profiles in Get calls
Work item tracking
Microsoft ADO (number only): 26609411
How I did it
Upgrade xgs SAI version in sai.mk file.
How to verify it
Run basic SONiC test using SAI release pipeline, all cases passed.
8.4.40.0: https://dev.azure.com/mssonic/internal/_build/results?buildId=465899&view=results
8.4.41.0: https://dev.azure.com/mssonic/internal/_build/results?buildId=466690&view=results
Fix when set TACACS to "tacacs+, local" user can run blocked command with local permission issue.
#### Why I did it
When set TACACS to "tacacs+, local", user still can run a blocked command with local permission.
##### Work item tracking
- Microsoft ADO: 26399545
#### How I did it
Fix code to reject command when authorized failed from TACACS server side.
#### How to verify it
Pass all UT.
### Description for the changelog
Fix when set TACACS to "tacacs+, local" user can run blocked command with local permission issue.
#### Why I did it
src/sonic-utilities
```
* c5e30e38 - (HEAD -> 202305, origin/202305) [202305] Enhanced route_check.py for multi_asic platforms (#3112) (21 hours ago) [Deepak Singhal]
```
#### How I did it
#### How to verify it
#### Description for the changelog
What I did:
Added support when TSA is done on Line Card make sure it's completely
isolated from all e-BGP peer devices from this LC or remote LC
Why I did:
Currently when TSA is executed on LC routes are withdrawn from it's connected e-BGP peers only. e-BGP peers on remote LC can/will (via i-BGP) still have route pointing/attracting traffic towards this isolated LC.
How I did:
When TSA is applied on LC all the routes that are advertised via i-BGP are set with community tag of no-export so that when remote LC received these routes it does not send over to it's connected e-BGP peers.
Also once we receive the route with no-export over iBGP match on it and and set the local preference of that route to lower value (80) so that we remove that route from the forwarding database. Below scenario explains why we do this:
- LC1 advertise R1 to LC3
- LC2 advertise R1 to LC3
- On LC3 we have multi-path/ECMP over both LC1 and LC2
- On LC3 R1 received from LC1 is consider best route over R1 over received from LC2 and is send to LC3 e-BGP peers
- Now we do TSA on LC2
- LC3 will receive R1 from LC2 with community no-export and from LC1 same as earlier (no change)
- LC3 will still get traffic for R1 since it is still advertised to e-BGP peers (since R1 from LC1 is best route)
- LC3 will forward to both LC1 and LC2 (ecmp) and this causes issue as LC2 is in TSA mode and should not receive traffic
To fix above scenario we change the preference to lower value of R1 received from LC2 so that it is removed from Multi-path/ECMP group.
How I verfiy:
UT has been added to make sure Template generation is correct
Manual Verification of the functionality
sonic-mgmt test case will be updated accordingly.
Please note this PR is on top of this :#16714 which needs to be merged first.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
- Why I did it
If a PSU is not present, there could be error log while restarting psud or thermalctld:
Jan 8 17:15:52.689616 sonic ERR pmon#psud: Thermal sysfs /run/hw-management/thermal/psu2_temp1_max does not exist
Jan 8 17:15:57.747723 sonic ERR pmon#thermalctld: Thermal sysfs /run/hw-management/thermal/psu2_temp1 does not exist
- How I did it
if a PSU is not present, we should not check the PSU temperature sysfs.
Why I did it
Align the keywords to make qos configuration take effect
Work item tracking
Microsoft ADO (number only):
How I did it
Change the keyword to ComputeAI
How to verify it
reload minigraph and check the qos configuration
#### Why I did it
src/sonic-platform-daemons
```
* 824c20a - (HEAD -> 202305, origin/202305) Support 800G ifname in xcvrd (#420) (20 hours ago) [Anoop Kamath]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-utilities
```
* 83a548de - (HEAD -> 202305, origin/202305) Disable Key Validation feature during sonic-installation for Cisco Platforms (#3115) (22 hours ago) [selvipal]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-snmpagent
```
* 6f59d29 - (HEAD -> 202305, origin/202305) Fix SNMP dropping some of the queue counter when create_only_config_db_buffers is set to true (#303) (#309) (33 minutes ago) [mssonicbld]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-snmpagent
```
* 2efaf2e - (HEAD -> 202305, origin/202305) Revert "[action] [PR:303] Fix SNMP dropping some of the queue counter when create_only_config_db_buffers is set to true (#303)" (#308) (4 minutes ago) [StormLiangMS]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Currently, whenever isc-dhcp-relay forwards a packet upstream,
internally, it will try to send it on a "fallback" interface. My
understanding is that this isn't meant to be a real interface, but
instead is basically saying to use Linux's regular routing stack to
route the packet appropriately (rather than having isc-dhcp-relay
specify specifically which interface to use).
The problem is that on systems with a weak CPU, a large number of
interfaces, and many upstream servers specified, this can introduce a
noticeable delay in packets getting sent. The delay comes from trying to
get the ifindex of the fallback interface. In one test case, it got to
the point that only 2 packets could be processed per second. Because of
this, dhcrelay will easily get backlogged and likely get to a point
where packets get dropped in the kernel.
Fix this by adding a check saying if we're using the fallback interface,
then don't try to get the ifindex of this interface. We're never going
to have an interface named this in SONiC.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
#### Why I did it
src/sonic-snmpagent
```
* b0a4bcc - (HEAD -> 202305, origin/202305) Set the execute bit on sysDescr_pass.py (#306) (22 hours ago) [Andre Kostur]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/linkmgrd
```
* f5e9b54 - (HEAD -> 202305, origin/202305) [CodeQL] fix unmet build dependency (#222) (10 hours ago) [Jing Zhang]
* 2282cc5 - [active-standby] Probe the link in suspend timeout (#235) (22 hours ago) [Longxiang Lyu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-utilities
```
* 93c42272 - (HEAD -> 202305, origin/202305) [chassis]: Support show ip bgp summary to display without error when no external neighbors are configured on chassis LC (#3099) (22 hours ago) [Arvindsrinivasan Lakshmi Narasimhan]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Upgrade the xgs SAI version to 8.4.39.2 to include the following fix:
8.4.36.0: [submodule upgrade] [SAI_BRANCH rel_ocp_sai_8_4] SID: SDK-381039 Cosq control dynamic type changes
8.4.37.0: SID: MMU cosq control configuration with Dynamic Type Check
8.4.38.0: [sbumodule upgrade] [CSP 0001232212][SAI_BRANCH rel_ocp_sai_8_4]back-porting SONIC-82415 to SAI 8.4
8.4.39.0: [CSP CS00012320979] Port SONIC-81867 sai spec compliance for get SAI_SWITCH_ATTR_SWITCH_HARDWARE_INFO
8.4.39.1: changes for phy-re-init of 40G ports for TH platforms CS00012327470
8.4.39.2: fix capability for Hostif queue by change SET operation of SAI_HOSTIF_ATTR_QUEUE to be true
Work item tracking
Microsoft ADO (number only): 26491005
How I did it
Upgrade xgs SAI version in sai.mk file.
How to verify it
Run basic SONiC test using SAI release pipeline, all cases passed.
https://dev.azure.com/mssonic/internal/_build/results?buildId=457869&view=results
What I did:
In Chassis TSA mode Loopback0 Ip's of each LC's should be advertise through e-BGP peers of each remote LC's
How I did:
- Route-map policy to Advertise own/self Loopback IP to other internal iBGP peers with a community internal_community as define in constants.yml
- Route-map policy to match on above internal_community when route is received from internal iBGP peers and set a internal tag as define in constants.yml and also delete the internal_community so we don't send to any of e-BGP peers
- In TSA new route-map match on above internal tag and permit the route (Loopback0 IP's of remote LC's) and set the community to traffic_shift_community.
- In TSB delete the above new route-map.
How I verify:
Manual Verification
UT updated.
sonic-mgmt PR: sonic-net/sonic-mgmt#10239
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Why I did it
double commit PR-16450 because of cherry pick conflict for PR#202305
Work item tracking
Microsoft ADO (number only):
How I did it
How to verify it
Why I did it
When we disable telemetry.service, sonic-hostservice will not start. And root cause is sonic-hostservice is only wanted by telemetry.service.
Work item tracking
Microsoft ADO (number only):
How I did it
Add dependency for gnmi.service.
How to verify it
Disable telemetry.service and build new image, and then check sonic-hostservice with new image.
Fix#16204
Microsoft ADO (number only): 25746782
How I did it
multiarch/debian-debootstrap:arm64-bullseye is too old.
It needs to add some gpg keys before 'apt-get update'
#### Why I did it
src/sonic-swss
```
* ac94f0b7 - (HEAD -> 202305, origin/202305) [202305][routeorch] Fixing bug with multiple routes pointing to nhg (#3002) (2 hours ago) [Nikola Dancejic]
```
#### How I did it
#### How to verify it
#### Description for the changelog
*What I did:
Enable BFD for Static Route for chassis-packet. This will trigger the use of the feature as defined in here: #13789
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
What I did:
Added flag in sonic_version.yml to see if compiled image is secured or non-secured. This is done using build/compile time environmental variable SECURE_UPGRADE_MODE as define in HLD: https://github.com/sonic-net/SONiC/blob/master/doc/secure_boot/hld_secure_boot.md
This flag does not provide the runtime status of whether the image has booted securely or not. It's possible that compile time signed image (secured image) can boot on non secure platform.
Why I did:
Flag can be used for manual check or by the test case.
ADO: 24319390
How I verify:
Manual Verification
---
build_version: 'master-16191.346262-cdc5e72a3'
debian_version: '11.7'
kernel_version: '5.10.0-18-2-amd64'
asic_type: broadcom
asic_subtype: 'broadcom'
commit_id: 'cdc5e72a3'
branch: 'master-16191'
release: 'none'
build_date: Fri Aug 25 03:15:45 UTC 2023
build_number: 346262
built_by: AzDevOps@vmss-soni001UR5
libswsscommon: 1.0.0
sonic_utilities: 1.2
sonic_os_version: 11
secure_boot_image: 'no'
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
These changes, in conjunction with NDK version >= 22.9.17 address the thermal logging issues discussed at Nokia-ION/ndk#27. While the changes contained at this PR do not require coupling to NDK version >= 22.9.17, thermal logging enhancements will not be available without updated NDK >= 22.9.17. Thus, coupling with NDK >=22.9.17 is preferred and recommended.
Why I did it
To address thermal logging deficiencies.
Work item tracking
Microsoft ADO (number only): 26365734
How I did it
The following changes are included:
Threshold configuration values are provided in the associated device data .json files. There is also a change included to better handle the condition where an SFP module read fails.
Modify the module.py reboot to support reboot linecard from Supervisor
- Modify reboot to call _reboot_imm for single IMM card reboot
- Add log to the ndk_cmd to log the operation of "reboot-linecard" and "shutdown/satrtup the sfm"
Add new nokia_cmd set command and modify show ndk-status output
- Add a new function reboot_imm() to nokia_common.py to support reboot a single IMM slot from CPM
- Added new command: nokia_cmd set reboot-linecard <slot> [forece] for CPM
- Append a new column "RebootStatus" at the end of output of "nokia_cmd show ndk-status"
- Provide ability for IMM to disable all transceiver module TX at reboot time
- Remove defunct xcvr-resync service
Why I did it
When Supervisor card is rebooted by using PMON API, it takes about 90 seconds to trigger the shutdown in down path. At this time linecards have been up. This delays linecards database initialization which is trying to PING/PONG the database-chassis. To address this issue, we modified the NDK to use the system call with "sudo reboot" when the request is from PMON API on Supervisor case. The NDK version is 22.9.20 and greater. This new NDK requires this modifcaiton of platform_reboot to work with.
Work item tracking
Microsoft ADO (number only): 26365734
How I did it
Modify the platform_reboot In Supervisor not to reboot all IMMs since it has been done in the function reboot() in module.py. Also handle the reboot-cause.txt for on the Supervisor when the reboot is request from PMON API.
Modify the Nokia platform specific platform_reboot in linecard to disable all SPFs.
This PR works with NDK version 22.9.20 and above
Signed-off-by: mlok <marty.lok@nokia.com>
For 40G optics there is SAI handling of T0 facing ports to be set with SR4 type and unreliable los set for a fixed set of ports. For this property to be invoked the requirement is set
phy_unlos_msft=1 in config.bcm.
This change is to meet the requirement and once this property is set, the los/interface type settings is applied by SAI on the required ports.
Why I did it
For Arista-7060CX-32S-Q32 T1, 40G ports RX_ERR minimalization during connected device reboot
can be achieved by turning on Unreliable LOS and SR4 media_type for all ports which are connected to T0.
The property phy_unlos_msft=1 is to exclusively enable this property.
Microsoft ADO: 25941176
How I did it
Changes in SAI and turning on property
How to verify it
Ran the changes on a testbed and verified configurations are as intended.
with property
admin@sonic2:~$ bcmcmd "phy diag xe8 dsc config" | grep -C 2 "LOS"
Brdfe_on = 0
Media Type = 2
Unreliable LOS = 1
Scrambling Disable = 0
Lane Config from PCS = 0
without property
admin@sonic:~$ bcmcmd "phy diag xe8 dsc config" | grep -C 2 "LOS"
Brdfe_on = 0
Media Type = 0
Unreliable LOS = 0
Scrambling Disable = 0
Lane Config from PCS = 0
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
Why I did it
For some devices with small memory, after upgrading to the latest image, the available memory is not enough.
Work item tracking
Microsoft ADO (number only):
26324242
How I did it
Disable restapi feature for LeafRouter which with slim image.
How to verify it
verified on 7050qx T1 (slim image), restapi disabled
verified on 7050qx T0 (slim image), restapi enabled
verified on 7260 T1 (normal image), restapi enabled
#### Why I did it
src/sonic-utilities
```
* 651a80b1 - (HEAD -> 202305, origin/202305) Modify teamd retry count script to base BGP status on default BGP status (#3069) (22 hours ago) [Saikrishna Arcot]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Signed-off-by: Nazarii Hnydyn nazariig@nvidia.com
This improvement does not bring any warm-reboot degradation, since the database availability (tcp/ip access over the loopback interface) was fixed by these PRs:
Re-add 127.0.0.1/8 when bringing down the interfaces #15080
Fix potentially not having any loopback address on lo interface #16490
Why I did it
Removed dependency on interfaces-config.service to speed up the boot, because interfaces-config.service takes a lot of time on init
Work item tracking
N/A
How I did it
Changed service files for swss/syncd
How to verify it
Boot and check swss/syncd start time comparing to interfaces-config
#### Why I did it
src/linkmgrd
```
* 2f5971f - (HEAD -> 202305, origin/202305) [warmboot] use config_db connector to update mux mode config instead of CLI (#223) (4 hours ago) [Jing Zhang]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Signed-off-by: Nazarii Hnydyn nazariig@nvidia.com
Why I did it
Improved switch init time
Work item tracking
N/A
How I did it
Replaced: sonic-cfggen -> sonic-db-cli
Aggregated template list for sonic-cfggen
How to verify it
Run warm-reboot
#### Why I did it
src/linkmgrd
```
* 2089ab6 - (HEAD -> 202305, origin/202305) Exclude DbInterface in PR coverage check (#224) (3 hours ago) [Jing Zhang]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-snmpagent
```
* e5fd192 - (HEAD -> 202305, origin/202305) Fix SNMP dropping some of the queue counter when create_only_config_db_buffers is set to true (#303) (9 hours ago) [DavidZagury]
```
#### How I did it
#### How to verify it
#### Description for the changelog
DEPENDS ON: sonic-net/sonic-swss#2997sonic-net/sonic-utilities#3093
What I did
Revert the feature.
Why I did it
Revert bgp suppress FIB functionality due to found FRR memory consumption issues and bugs.
How I verified it
Basic sanity check on t1-lag, regression in progress.
Backport PR #17458 due to conflict.
Why I did it
Optimize syslog rate limit feature for fast and warm boot
Work item tracking
Microsoft ADO (number only):
How I did it
Optimize redis start time
Don't render rsyslog.conf in container startup script
Disable containercfgd by default. There is a new CLI to enable it (in another PR)
How to verify it
Manual test
Regression test
* [Celestica-E1031] Enable CPU watchdog (#16083)
Enable CPU watchdog on Celestica-E1031.
* Add info syslog for cpu_wdt.service (#16678)
Why I did it
Add info syslog for cpu_wdt.service when trigger watchdog arm action.
How I did it
Add info syslog for cpu_wdt.service when trigger watchdog arm action.
Why I did it
Release notes for Cisco 8111-32EH-O, 8102-64H-O and 8101-32FH-O:
• Fixed a bug in PFC-WD where watchdog is triggered too often when sparse traffic is present, failing to detect the traffic traversal - (SR 696617830)
• Resolved an issue where SAI_STATUS_ITEM_NOT_FOUND error was seen while adding LAG members - (MIGSMSFT-354)
• Fixed Thermal API related error message (MIGSMSFT-354)
• Fixed an issue related to default config trap - (MIGSMSFT-354)
• Changed the message log level from error to debug in situations when the HW offloaded session is not found or was never created for the packet received. (MIGSMSFT-354)
• Fixed an issue where drop option was not working when encap and decap IPinIP tunnels share the same SDK tunnel port.
• Fixed an error while running VRF testcase (MIGSMSFT-354)
• Fixed an issue where BFD packets not egressing using Queue 7
• SAI support for additional FEC related attributes:
· SAI_PORT_ATTR_MAX_FEC_SYMBOL_ERRORS_DETECTABLE
· SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_S0
. SAI_PORT_STAT_IF_IN_FEC_CODEWORD_ERRORS_S16
Work item tracking
Microsoft ADO (number only):
#### Why I did it
src/sonic-swss
```
* 5643db9a - (HEAD -> 202305, origin/202305) [muxorch] Fixing cache bug in updateRoute logic (#2982) (6 hours ago) [Nikola Dancejic]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Fix issue xcvrd crashes due to cannot import name 'initialize_sfp_thermal':
Nov 27 09:47:16.388639 sonic ERR pmon#xcvrd: Exception occured at CmisManagerTask thread due to ImportError("cannot import name 'initialize_sfp_thermal' from partially initialized module 'sonic_platform.thermal' (most likely due to a circular import) (/usr/local/lib/python3.9/dist-packages/sonic_platform/thermal.py)")
Nov 27 09:47:16.392544 sonic ERR pmon#xcvrd: Traceback (most recent call last):
Nov 27 09:47:16.392643 sonic ERR pmon#xcvrd: File "/usr/local/lib/python3.9/dist-packages/xcvrd/xcvrd.py", line 1518, in run
Nov 27 09:47:16.392757 sonic ERR pmon#xcvrd: self.task_worker()
Nov 27 09:47:16.392757 sonic ERR pmon#xcvrd: File "/usr/local/lib/python3.9/dist-packages/xcvrd/xcvrd.py", line 1240, in task_worker
Nov 27 09:47:16.392757 sonic ERR pmon#xcvrd: sfp = platform_chassis.get_sfp(pport)
Nov 27 09:47:16.392793 sonic ERR pmon#xcvrd: File "/usr/local/lib/python3.9/dist-packages/sonic_platform/chassis.py", line 346, in get_sfp
Nov 27 09:47:16.392830 sonic ERR pmon#xcvrd: self.initialize_single_sfp(index)
Nov 27 09:47:16.392830 sonic ERR pmon#xcvrd: File "/usr/local/lib/python3.9/dist-packages/sonic_platform/chassis.py", line 288, in initialize_single_sfp
Nov 27 09:47:16.392830 sonic ERR pmon#xcvrd: self._sfp_list[index] = sfp_module.SFP(index)
Nov 27 09:47:16.392830 sonic ERR pmon#xcvrd: File "/usr/local/lib/python3.9/dist-packages/sonic_platform/sfp.py", line 272, in __init__
Nov 27 09:47:16.392866 sonic ERR pmon#xcvrd: from .thermal import initialize_sfp_thermal
Nov 27 09:47:16.392918 sonic ERR pmon#xcvrd: ImportError: cannot import name 'initialize_sfp_thermal' from partially initialized module 'sonic_platform.thermal' (most likely due to a circular import) (/usr/local/lib/python3.9/dist-packages/sonic_platform/thermal.py)
Nov 27 09:47:16.393103 sonic ERR pmon#xcvrd: Xcvrd: exception found at child thread CmisManagerTask due to ImportError("cannot import name 'initialize_sfp_thermal' from partially initialized module 'sonic_platform.thermal' (most likely due to a circular import) (/usr/local/lib/python3.9/dist-packages/sonic_platform/thermal.py)")
Nov 27 09:47:16.393103 sonic ERR pmon#xcvrd: Exiting main loop as child thread raised exception!
Work item tracking
Microsoft ADO (number only):
How I did it
Add lock for creating SFP object
How to verify it
UNIT TEST
Manual Test
Signed-off-by: anamehra anamehra@cisco.com
Why I did it
Fixes#16990 for 202305/202205 branch
Note: This PR is for 202305 and 202205. For master, a new PR will be raised with a new field (Uphold=) provided by debian bookworm to handle the dependency failure restartability of the processes.
determine-reboot-cause and process-reboot-cause service does not start If the database service fails to restart in the first attempt. Even if the Database service succeeds in the next attempt, these reboot-cause services do not start.
The process-reboot-cause service also does not restart if the docker or database service restarts, which leads to an empty reboot-cause history
deploy-mg from sonic-mgmt also triggers the docker service restart. The restart of the docker service caused the issue stated in 2 above. The docker restart also triggers determine-reboot-cause to restart which creates an additional reboot-cause file in history and modifies the last reboot-cause.
This PR fixes these issues by making both processes start again when dependency meets after dependency failure, making both processes restart when the database service restarts, and preventing duplicate processing of the last reboot reason.
Work item tracking
Microsoft ADO 25892856
How I did it
Modified systemd unit files to make determine-reboot-cause and process-reboot-cause services restartable when the database service restarts.
On the restart, the determine-reboot-cause service should not recreate a new reboot-cause entry in the database. Added check for first start or restart to skip entry for restart case.
How to verify it
On single asic pizza box:
Installed the image and check reboot-cause history
restart database service and verify that determine-reboot-cause and process-reboot-cause services also restart. Verify that reboot-cause shows correct data and no new entry is created for restart.
On Chassis:
Installed the image and check reboot-cause history
restart the database service and verify that determine-reboot-cause and process-reboot-cause services also restart. Verify that reboot-cause shows correct data and no new entry is created for restart.
Reboot LC. On Supervicor, stop database-chassis service.
Let database service on LC fail the first time. determine-reboot-cause and process-reboot-cause would fail to start due to dependency failure
start database-chassis on Supervisor. Database service on LC should now start successfully.
Verify determine-reboot-cause and process-reboot-cause also starts
Verify show reboot-cause history output
Why I did it
To fix ecmp hash polarization issue.
Work item tracking
Microsoft ADO (number only): 26085143
How I did it
Add sai_hash_seed_config_hash_offset_enable=1 in all config.bcm that Broadcom T1 uses.
HardwareSku
Force10-S6100-T1
Force10-S6100-ITPAC-T1
Force10-S6100
Celestica-DX010-C32
Arista-7260CX3-C64
Arista-7060CX-32S-Q32
Arista-7060CX-32S-C32-T1
Arista-7060CX-32S-C32
Arista-7050QX32S-Q32
Arista-7050QX-32S-S4Q31
Arista-7050-QX32
Arista-7050-QX-32SInclude Broadcom's fix by upgrading xgs SAI version to 8.4.35.0.
8.4.35.0: [CSP 00012324019] back-porting SONIC-75006 to SAI8.4
8.4.34.0:
[CSP 00012318293] back-porting SONIC-81534 to SAI8.4;
ECMP LB traffic polarization, configure hash_offset along with hash_seed attr
Run qual with only xgs SAI version upgraded to 8.4.35.0:
on TH2: https://elastictest.org/scheduler/testplan/6579b36ccfacd86e78e3e885?leftSideViewMode=detail&prop=status&order=ascending
on TH: https://elastictest.org/scheduler/testplan/657a75f8c1d3b51fc1d585b4?leftSideViewMode=detail&prop=status&order=ascending
How to verify it
use tests/ecmp/test_ecmp_sai_value.py to verify.
Fix zebra leaking memory with fib suppress enabled. Porting the fix from
FRRouting/frr#14983
While running test_stress_route.py, systems with lower memory started to throw low memory logs. On further investigation, a memory leak has been found in zebra which was fixed in the FRR community.
Why I did it
Update SAI version to SAIBuild2305.26.0.16
Update SDK/FW to 4.6.2134/2012.2134
Fixed issues:
Updated SN3700C to enable limit to 100G speed.
Recovering from Low power mode might ends with port down.
Work item tracking
Microsoft ADO (number only):
How I did it
Updating the versions in makefile
How to verify it
Confirm issues fixed and run sonic-mgmt tests
202305 image does not come up on chassis with SAI 7.1.111.1.
SAI 9.2.0.0 on 202305 image is verified to come up on Arista chassis. Initial testing is also done, no new failures compare to 202205 image, SAI 7.1.111.1.
Why I did it
Bring up 202305 image on chassis.
Work item tracking
Microsoft ADO (number only): 18189434
How I did it
How to verify it
Brought up SAI 9.2.0.0 on Arista chassis.
Ran pipeline on acl, bgp, arp, acms, cacl, copp, decap, fib, iface_namingmode.
#### Why I did it
src/sonic-platform-common
```
* 57f63e6 - (HEAD -> 202305, origin/202305) Adding supported vendor PNs for remote CDB FW upgrade (#418) (4 hours ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-utilities
```
* 7cf32a9f - (HEAD -> 202305, origin/202305) Reduce generate_dump mem usage for cores (#3052) (16 hours ago) [davidm-arista]
```
#### How I did it
#### How to verify it
#### Description for the changelog
- Why I did it
Improve boot performance mostly needed for fast and warmboot
- How I did it
Use cached variable.
- How to verify it
Boot the system. Simply do "systemd-analyze blame" and look at service start time.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
- Why I did it
Improve boot performance mostly needed for fast and warmboot
- How I did it
Use cached variable.
- How to verify it
Boot the system. Simply do "systemd-analyze blame" and look at service start time.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
A W/A to overcome delay of about 20 sec on login due to MFT bash autocompletion bug.
Should be reverted once a formal solution will be available in future MFT release.
Why I did it
To overcome SN2700 20 sec delay on login
Work item tracking
N/A
How I did it
Removed MFT bash autocompletion part
How to verify it
make configure PLATFORM=mellanox
make target/sonic-mellanox.bin
#### Why I did it
src/sonic-platform-daemons
```
* f23e342 - (HEAD -> 202305, origin/202305) Add dynamic sensor logic for fixed and psu presence/state checking in thermalctld (#401) (18 hours ago) [Gregory Boudreau]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Fixed the issue - Some special IPv6 packets cannot be dropped by dataplane ACL rule
Work item tracking
Microsoft ADO (number only):
No
How I did it
How to verify it
Loaded SAI debian (in syncd docker) and re-run the failed cases.
#### Why I did it
src/sonic-dbsyncd
```
* 68baf40 - (HEAD -> 202305, origin/202305) [lldp-syncd] Fix unexpected exception in snmp-subagent (#64) (18 hours ago) [Zhaohui Sun]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-utilities
```
* cebac831 - (HEAD -> 202305, origin/202305) [ci] Use correct bullseye docker image according to source branch. (17 hours ago) [Liu Shilong]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
FCS/CRC Errors will only be reported as RX_ERR.
Fix to avoid the mac port related errors.
Fix for sharedResSize testcase failure in QoS-SAI
Fix the issue related to voltage in 'show platform psustatus'.
Support WRED drop for lossy queues.
Fixed an issue where lossy traffic was getting dropped.
Enhancement of SAI logging for errors and interrupts
Work item tracking
Microsoft ADO (number only):
How I did it
Update Cisco platform to 202305.1.0.3
How to verify it
#### Why I did it
src/sonic-swss
```
* 04fab921 - (HEAD -> 202305, origin/202305) [coppmgrd] Fix Copp processing logic by using Producer del instead of del from Table (14 hours ago) [Vivek]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Update SAI to SAIBuild2305.26.0.9 for Mellanox platforms.
Fixed issues:
When working with SAI_DEFAULT_SWITCHING_MODE_STORE_FORWARD key/value enabled, trying to add a LAG member to a LAG which is created after warm boot initial configuration phase ended, will fail.
Creating BFD session for non default VRF fails (SAI_BFD_SESSION_ATTR_VIRTUAL_ROUTER != SAI_SWITCH_ATTR_DEFAULT_VIRTUAL_ROUTER_ID).
Work item tracking
Microsoft ADO (number only):
How I did it
Updated SAI version in "mlnx-sai.mk" Makefile.
How to verify it
Run "sonic-mgmt" regression testing.
#### Why I did it
src/sonic-linux-kernel
```
* 35f39af - (HEAD -> 202305, origin/202305) [202305] [kconfig] Set default SATA Link Power Management policy (#365) (9 hours ago) [Volodymyr Samotiy]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-platform-common
```
* a07a03b - (HEAD -> 202305, origin/202305) Fix issue: QSFP module with id 0x0d can be parsed using 8636 (#412) (79 minutes ago) [Stephen Sun]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Release Notes for Cisco 8111-32EH-O and 8102-64H
Fix for "Failed to get port by bridge port ID" error (MIGSMSFT-354)
Added CLI to enable trap events (MIGSMSFT-166)
Support to add critical message upon replace device SAI notification
Added support for input voltage/current/power info for PSUs
Added support for sff_mgr for deterministic bringup of SFF compliant modules
IOFPGA fix to support optics port in low power mode on 8101-32FH-O
Enable CMIS Manager for 8111-32EH-O
Added dump option to “show plat npu mac-state” CLI to dump MAC state info
Added media-based NPU serdes attributes for Credo 800G AEC Y-cables from media_settings.json
Auto FPD support for power CPLD on 8101 and 8111 platforms
Caveats:
Validation on 8101-32FH-O still pending. Will update release notes once completed.
Below 8800 platform specific fixes included but 8800 support not claimed in this code drop
Interop fix for BFD and Fair VOQ
Fix to update voq cgm profile during port speed change event
Create ECN profiles based on port speeds dynamically
Work item tracking
Microsoft ADO (number only):
How I did it
How to verify it
### Why I did it
The json.hpp header file from that package is used in the sonic-swss-common build. An old version of that header file (from 2016) has been checked into the sonic-swss-common repo. However, since then, there have been changes to that header file, and starting with GCC 12 in Bookworm, generates some errors about variables being possibly uninitialized before use.
##### Work item tracking
- Microsoft ADO **(number only)**: 25027439
#### How I did it
To fix this, install the nlohmann-json3-dev package, and allow using the header file from the Debian package instead of a static checked-in version. The version in Debian Bullseye is much newer than this version.
#### How to verify it
With this change alone, sonic-swss-common will still be using the json.hpp file in its own codebase. The change to actually use the system header file instead of the local header file will happen in a separate PR in the necessary repoes.
Co-authored-by: Saikrishna Arcot <sarcot@microsoft.com>
Why I did it
Need to share docker image for telemetry and gnmi, and only use telemetry container for 202305 branch
Work item tracking
Microsoft ADO (number only):
How I did it
Add a new docker image, base-gnmi, build sonic-gnmi and sonic-telemetry on this docker image.
Enable telemetry container.
How to verify it
Run end to end test for telemetry and gnmi.
Change DHCP rate limit in SONiC copp configuration to 100 PPS as this is
necessary to ensure that DHCP flood does not cause LACP/BGP flaps in all
scenarios
This is an extension to the change in image_config: copp: Enable rate limiting
for bgp, lacp, dhcp, lldp, macsec and udld #14859 and sonic-mgmt change in
[tests/copp]: Update copp mgmt tests to support new rate-limits sonic-mgmt#8199
Why I did it
300 PPS is not sufficient to prevent LACP/BGP flaps in all cases. 100 PPS seems to
provide better resiliency against DHCP traffic flood to CPU.
Microsoft ADO 25776614:
Send DHCP broadcast packets to DUT and verify that they are trapped to CPU at 100 PPS.
Signed-off-by: Prabhat Aravind <paravind@microsoft.com>
* [chassis/multi-asic] Make sure iBGP session established as directly connected (#16777)
What I did:
Make Sure for internal iBGP we are one-hop away (directly connected) by using Generic TTL security mechanism.
Why I did:
Without this change it's possible on packet chassis i-BGP can be established even if there no direct connection. Below is the example
- Let's say we have 3 LC's LC1/LC2/LC3 each having i-BGP session session with each other over Loopback4096
- Each LC's have static route towards other LC's Loopback4096 to establish i-BGP session
- LC1 learn default route 0.0.0.0/0 from it's e-BGP peers and send it over to LC2 and LC3 over i-BGP
- Now for some reason on LC2 static route towards LC3 is removed/not-present/some-issue we expect i-BGP session should go down between LC2 and LC3
- However i-BGP between LC2 and LC3 does not go down because of feature ip nht-resolve-via-default where LC2 will use default route to reach Loopback4096 of LC3. As it's using default route BGP packets from LC2 towards LC3 will first route to LC1 and then go to LC3 from there.
Above scenario can result in packet mis-forwarding on data plane
How I fixed it:-
To make sure BGP packets between i-BGP peers are not going with extra routing hop enable using GTSM feature
neighbor PEER ttl-security hops NUMBER
This command enforces Generalized TTL Security Mechanism (GTSM), as specified in RFC 5082. With this command, only neighbors that are the specified number of hops away will be allowed to become neighbors. This command is mutually exclusive with ebgp-multihop.
We set hop count as 1 which makes FRR to reject BGP connection if we receive BGP packets if it's TTL < 255. Also setting this attribute make sure i-BGP frames are originated with IP TTL of 255.
How I verify:
Manual Verification of above scenario. See blow BGP packets receive with IP TTL 254 (additional routing hop) we are seeing FIN TCP flags as BGP is rejecting the connection
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* Update peer-group.conf.j2
* Update result_all.conf
* Update result_base.conf
---------
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
#### Why I did it
src/sonic-utilities
```
* 2b6b6580 - (HEAD -> 202305, origin/202305) Added support to display only nonzero queue counter. (#2978) (#3046) (15 hours ago) [abdosi]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-host-services
```
* 689395a - (HEAD -> 202305, origin/202305) Updated the iptable rule to use parent/base name of midplane interface of chassis. (#75) (2 days ago) [abdosi]
* 45212a8 - [DualToR][caclmgrd] Fix IPtables rules for multiple vlan interfaces for DualToR config (#82) (2 days ago) [vdahiya12]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-linux-kernel
```
* 6ff3cc2 - (HEAD -> 202305, origin/202305) arm64: Kconfig inclusions to fix PCI hang and MTD detection (#362) (2 days ago) [Pavan Naregundi]
```
#### How I did it
#### How to verify it
#### Description for the changelog
What I did:
Enable Sending BGP Community over internal neighbors over iBGP Session
Microsoft ADO: 25268695
Why I did:
Without this change BGP community send by e-BGP Peers are not carry-forward to other e-BGP peers.
str2-xxxx-lc1-2# show bgp ipv6 20c0:a801::/64
BGP routing table entry for 20c0:a801::/64, version 52141
Paths: (1 available, best #1, table default)
Not advertised to any peer
65000 65500
2603:10e2:400::6 from 2603:10e2:400::6 (3.3.3.6)
Origin IGP, localpref 100, valid, internal, best (First path received)
Last update: Tue Sep 26 16:08:26 2023
str2-xxxx-lc1-2# show ip bgp 192.168.35.128/25
BGP routing table entry for 192.168.35.128/25, version 52688
Paths: (1 available, best #1, table default)
Not advertised to any peer
65000 65502
3.3.3.6 from 3.3.3.6 (3.3.3.6)
Origin IGP, localpref 100, valid, internal, best (First path received)
Last update: Tue Sep 26 15:45:51 2023
After the change
str2-xxxx-lc2-2(config)# router bgp 65100
str2-xxxx-lc2-2(config-router)# address-family ipv4
str2-xxxx-lc2-2(config-router-af)# neighbor INTERNAL_PEER_V4 send-community
str2-xxxx-lc2-2(config-router-af)# exit
str2-xxxx-lc2-2(config-router)# address-family ipv6
str2-xxxx-lc2-2(config-router-af)# neighbor INTERNAL_PEER_V6 send-community
str2-xxxx-lc1-2# show bgp ipv6 20c0:a801::/64
BGP routing table entry for 20c0:a801::/64, version 52400
Paths: (1 available, best #1, table default)
Not advertised to any peer
65000 65500
2603:10e2:400::6 from 2603:10e2:400::6 (3.3.3.6)
Origin IGP, localpref 100, valid, internal, best (First path received)
**Community: 1111:1111**
Last update: Tue Sep 26 16:10:19 2023
str2-xxxx-lc1-2# show ip bgp 192.168.35.128/25
BGP routing table entry for 192.168.35.128/25, version 52947
Paths: (1 available, best #1, table default)
Not advertised to any peer
65000 65502
3.3.3.6 from 3.3.3.6 (3.3.3.6)
Origin IGP, localpref 100, valid, internal, best (First path received)
**Community: 1111:1111**
Last update: Tue Sep 26 16:10:09 2023
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
#### Why I did it
src/sonic-linux-kernel
```
* f086121 - (HEAD -> 202305, origin/202305) Intgerate HW-MGMT 7.0030.2008 Changes (#361) (12 hours ago) [Vivek]
* 7551dd9 - arm64: Enable CONFIG_KEXEC_FILE (#360) (13 hours ago) [Pavan Naregundi]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Update SAI version to SAIBuild2305.26.0.0
New features
FDB entries are now restored after warmboot to prevent temporary system flooding.
Update SDK/FW to 4.6.2102/2012.2102
Fixed Issues:
Some of the Warmboot related files which were created by SDK during switch create are now generated during pre shutdown flow
Work item tracking
Microsoft ADO (number only):
How I did it
Updating the versions in make file.
How to verify it
Running sonic-mgmt regression.
cherry pick #16894
Why I did it
Privileges and volumes were incorrectly set in macsec container. Privileged flag is set to false and volumes are not mounted properly.
admin@vlab-01:~$ docker inspect macsec0 | grep Privi
"Privileged": false,
admin@vlab-01:~$ docker inspect macsec0 | grep -A 10 Binds
"Binds": [
"/var/run/redis0:/var/run/redis:rw",
"/var/run/redis-chassis:/var/run/redis-chassis:ro",
"/usr/share/sonic/device/x86_64-nokia_ixr7250e_36x400g-r0/Nokia-IXR7250E-36x100G/0:/usr/share/sonic/hwsku:ro",
"/var/run/redis0/:/var/run/redis0/:rw",
"/usr/share/sonic/device/x86_64-nokia_ixr7250e_36x400g-r0:/usr/share/sonic/platform:ro"
],
Work item tracking
Microsoft ADO (number only):
How I did it
How to verify it
Make sure privileged settings remain unchanged and make sure volumes are properly mounted
admin@vlab-01:~$ docker inspect macsec | grep Privi
"Privileged": false,
admin@vlab-01:~$ docker inspect macsec | grep -A 10 Binds
"Binds": [
"/etc/timezone:/etc/timezone:ro",
"/var/run/redis:/var/run/redis:rw",
"/var/run/redis-chassis:/var/run/redis-chassis:ro",
"/etc/fips/fips_enable:/etc/fips/fips_enable:ro",
"/usr/share/sonic/templates/rsyslog-container.conf.j2:/usr/share/sonic/templates/rsyslog-container.conf.j2:ro",
"/etc/sonic:/etc/sonic:ro",
"/host/warmboot:/var/warmboot",
"/usr/share/sonic/device/x86_64-kvm_x86_64-r0/Force10-S6000/:/usr/share/sonic/hwsku:ro",
"/usr/share/sonic/device/x86_64-kvm_x86_64-r0:/usr/share/sonic/platform:ro"
],
* Support lazy install of sdk drivers
This patch adds support for lazy install of Marvell prestera SDK
drivers for platform-nokia. Lazy install for drivers is added as
updated sdk driver needs to classify the drivers required for platform
during compile time. SDK drivers and platform files are now fetched
from a submodule(mrvl-prestera).
Additionaly, DTB required for sonic_fit creation during compile time
is sourced from sonic-linux-kernel.
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
* Add hugepage cmdline agrument
Updated sdk & driver requries hugepage to be reserved during kernel
boot. These kernel command line agrument are passed from installer.conf
in device folder.
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
* Update SAI deb to 1.12.0-3
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
---------
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
Why I did it
A race condition exists while the TPH is processing a netlink message - if a second netlink message arrives during processing it will be missed since TPH is not listening for other messages.
Another bug was found where TPH was unnecessarily restarting since it was checking admin status instead of operational status of portchannels.
How I did it
Subscribe to APPL_DB for updates on LAG operational state
Track currently sniffed interfaces
How to verify it
Send tunnel packets with destination IP of an unresolved neighbor, verify that ping commands are run
Shut down a portchannel interface, verify that sniffer does not restart
Send tunnel packets, verify ping commands are still run
Bring up portchannel interface, verify that sniffer restarts
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
Orchagent uses PORTCHANNEL term when parsing this field. Change the YANG model to align to orchagent.
- Why I did it
When specifying PORTCHANNEL in ACL_TABLE_TYPE table YAGN model validation does not pass, when using term LAG orchagent does not accept such table type.
Fix it by aligning YANG model to orchagent.
- How I did it
Fix in YANG model.
- How to verify it
Create custom ACL table type.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
#### Why I did it
src/sonic-utilities
```
* 3609e417 - (HEAD -> 202305, origin/202305) [sonic-package-manager] do not modify config_db.json (#3032) (2 hours ago) [Stepan Blyshchak]
* 354dfe80 - [sonic_installer]: Improve exception handling: introduce notes. (#3028) (3 hours ago) [Nazarii Hnydyn]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Share docker image to support gnmi container and telemetry container
backport #16863
Work item tracking
Microsoft ADO 25423918:
How I did it
Create telemetry image from gnmi docker image.
Enable gnmi container and disable telemetry container by default.
How to verify it
Run end to end test.
#### Why I did it
src/sonic-swss
```
* 65720c1a - (HEAD -> 202305, origin/202305) Send hearbeat during warm reboot freese (#2923) (#2956) (14 hours ago) [Hua Liu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-swss
```
* 9b9ac4fd - (HEAD -> 202305, origin/202305) Add more debug information when PFC WD is triggered (#2858) (8 minutes ago) [Stephen Sun]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
To avoid orchagent crash issue like sonic-net/sonic-swss#2935, disable unsupported counters on SONiC management devices.
Work item tracking
Microsoft ADO (number only): 25437720
How I did it
Update the minigraph parser to disable unsupported counters on management devices.
How to verify it
Verified by unittest.
Manually apply patch to DUT and do config load_minigraph
Co-authored-by: Zhijian Li <zhijianli@microsoft.com>
Co-authored-by: StormLiangMS <89824293+StormLiangMS@users.noreply.github.com>
Why I did it
The current DEVICE_NEIGHBOR_METADATA yang model has two issues that would block GCU operation when it checks if the current config aligns with the YANG model:
Missing cluster field in YANG
Incomplete set of device type. The device type in YANG model doesn't include all the device type.
Work item tracking
Microsoft ADO (number only): 25577813
How I did it
Add cluster field in DEVICE_NEIGHBOR_METADATA YANG model.
Change device type to string.
Fix the UT test accordingly.
How to verify it
Build the image and verify the unit tests passed.
Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
Write error message to syslog when add user failed or connect to TACACS server failed.
Why I did it
With these messages, we can downgrade TACACS server with issue to lower priority.
Work item tracking
Microsoft ADO: 24667696
How I did it
Write error message to syslog when add user failed or connect to TACACS server failed.
How to verify it
Pass all UT.
Manually verify error message generated.
Why I did it
Drop for 8111-32EH-O:
Fix for clear_trap_configuration errors
Fix OREDERED ECMP NHG drop when route is added before members are added
Fix port handling of empty ecmp group to drop packets
Fix for link_notification_handle error
Auto FPD upgrade support
Work item tracking
Microsoft ADO (number only):
How I did it
update platform to 202305.1.0.1
#### Why I did it
src/sonic-gnmi
```
* a49ca56 - (HEAD -> 202305, origin/202305) Merge pull request #167 from zbud-msft/cherry-pick-fix-panic-202305 (11 hours ago) [StormLiangMS]
* 6ba1125 - Merge branch '202305' into cherry-pick-fix-panic-202305 (2 weeks ago) [Zain Budhwani]
* 3a0fbb9 - Fix build error (2 weeks ago) [Zain Budhwani]
* 7fad847 - Recover from potential panic when doing map to JSON serialization (#161) (2 weeks ago) [Zain Budhwani]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-platform-common
```
* e7325db - (HEAD -> 202305, origin/202305) Fix SSD health percentage issue for vendor Virtium (#407) (#408) (11 hours ago) [Stephen Sun]
* 87e33ab - [Credo][Ycable] Remove the thread locker protection from the thread-safe APIs (#388) (11 hours ago) [Xinyu Lin]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-sairedis
```
* 5a052ed - (HEAD -> 202305, origin/202305) [warmboot] Add workaround for `INIT_VIEW` failure (#1252) (11 hours ago) [Jing Zhang]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Add the create_only_config_db_buffers attribute to the DEVICE_METADATA|localhost. If the "create_only_config_db_buffers" exists and is equal to "true" - the buffers will be created according to the config_db configuration (for example BUFFER_QUEUE|* table), otherwise the maximum available buffers (which are read from SAI) will be created, regardless of the CONFIG_DB buffers configuration.
Work item tracking
Microsoft ADO (number only):
How I did it
Add the create_only_config_db_buffers.json files for Mellanox devices (not MSFT SKU's), and inject the content to the CONFIG_DB during the swss docker container start.
How to verify it
Manual verification:
Install the image with this PR included on the not MSFT SKU switch
Check the show queue counters output and verify that only configured in CONFIG_DB buffers are created
root@sonic:/home/admin# show queue counters
Port TxQ Counter/pkts Counter/bytes Drop/pkts Drop/bytes
--------- ----- -------------- --------------- ----------- ------------
Ethernet0 UC0 0 0 0 N/A
Ethernet0 UC1 0 0 0 N/A
Ethernet0 UC2 0 0 0 N/A
Ethernet0 UC3 0 0 0 N/A
Ethernet0 UC4 0 0 0 N/A
Ethernet0 UC5 0 0 0 N/A
Ethernet0 UC6 0 0 0 N/A
Open the /usr/share/sonic/device/$DEVICE/$SKU/create_only_config_db_buffers.json and change it to:
"create_only_config_db_buffers": "false"
Do config reload
Check the show queue counters output and verify that all available buffers are created
root@sonic:/home/admin# show queue counters
Port TxQ Counter/pkts Counter/bytes Drop/pkts Drop/bytes
--------- ----- -------------- --------------- ----------- ------------
Ethernet0 UC0 0 0 0 N/A
Ethernet0 UC1 0 0 0 N/A
Ethernet0 UC2 0 0 0 N/A
Ethernet0 UC3 0 0 0 N/A
Ethernet0 UC4 0 0 0 N/A
Ethernet0 UC5 0 0 0 N/A
Ethernet0 UC6 0 0 0 N/A
Ethernet0 UC7 60 15346 0 N/A
Ethernet0 MC8 N/A N/A N/A N/A
Ethernet0 MC9 N/A N/A N/A N/A
Ethernet0 MC10 N/A N/A N/A N/A
Ethernet0 MC11 N/A N/A N/A N/A
Ethernet0 MC12 N/A N/A N/A N/A
Ethernet0 MC13 N/A N/A N/A N/A
Ethernet0 MC14 N/A N/A N/A N/A
Ethernet0 MC15 N/A N/A N/A N/A
Why I did it
To improve FAST reboot dataplane downtime
Work item tracking
N/A
How I did it
Updated SAI xml config file
How to verify it
Run sonic-mgmt tests of fastboot
#### Why I did it
src/sonic-swss
```
* 5bee57a4 - (HEAD -> 202305, origin/202305) Fix data race in on_switch_shutdown_request() (#2931) (16 hours ago) [Yakiv Huryk]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-utilities
```
* 569beb19 - (HEAD -> 202305, origin/202305) Revert "Remove syslog service validator in GCU (#2991)" (#3015) (16 hours ago) [jingwenxie]
* ab7f03ea - [db_migrator] Fix the broken version chain (#3014) (16 hours ago) [Vivek]
* 0f17b8d5 - [fwutil] Fix python SyntaxWarning for 'is' with literals (#3013) (16 hours ago) [Kebo Liu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Submodule update for sonic-restapi
ccad4a2 - 2023-10-17 : [Tunnel] Support co-existence of IPv4 and IPv6 tunnels (#147) [Prince Sunny]
c8fa96b - 2023-10-12 : Remove command to install libhiredis deb file (#146) [Saikrishna Arcot]
Work item tracking
Microsoft ADO 25072916:
How I did it
How to verify it
Why I did it
In an effort to allow people to build a slim version of SONiC to fit on devices to small storage, there is a need to disable some unneeded features.
The docker-gbsyncd are only applicable to devices with external gearboxes and might not apply to devices that need a small image.
It is therefore desirable to have a knob to not include these gbsyncd containers.
Work item tracking
Microsoft ADO (number only):
How I did it
Add a new config INCLUDE_GBSYNCD which is enabled by default to retain the previous behavior.
Setting it to n will not include the platform/components/docker-gbsyncd-*.mk.
How to verify it
Set INCLUDE_GBSYNCD = n and witness that docker-gbsyncd images are not present in the final image.
### Why I did it
##### Work item tracking
- Microsoft ADO **(number only)**:24851367
#### How I did it
Read subscription message when capture service starts, before reading cached events.
#### How to verify it
UT/Manual testing
Why I did it
Running SONiC releases past 202012 has become really challenging on system with small storage devices (4GB).
Some of these devices can also be limited by only having 4GB of RAM which complicates mitigations.
The main contributor to these issues is the SONiC image growth.
Being able to reduce it by some decent amount should allow these systems to run SONiC longer.
It would also reduce some impacts related to space savings mitigations.
Work item tracking
Microsoft ADO (number only):
How I did it
Add a build option to reduce the image size.
The image reduction process is affecting the builds in 2 ways:
change some packages that are installed in the rootfs
apply a rootfs reduction script
The script itself will perform a few steps:
remove file duplication by leveraging hardlinks
under /usr/share/sonic since the symlinks under the device folder are lost during the build.
under /var/lib/docker since the files there will only be mounted ro
remove some extra files (man, docs, licenses, ...)
some image specific space reduction (only for aboot images currently)
The script can later be improved but for now it's reducing the rootfs size by ~30%.
How to verify it
Compare the size of an image with this option enabled and this option enabled.
Expect the fully extracted content to be ~30% less.
Which release branch to backport (provide reason below if selected)
This is a backport of #16729
Description for the changelog
Add build option to reduce final image size
Why I did it
Update the submodule to include the following fixes
2b33d76 dhcpv6 per interface counter support
6a6ce24 fix dhcpv6 relay dual tor source interface selection issue
c36b8e3 [actions] Support Semgrep by Github Actions (#39)
Work item tracking
Microsoft ADO (number only):
How I did it
How to verify it
Why I did it
Update the kernel to 5.10.179 for the 202305 branch
Work item tracking
Microsoft ADO (number only): 24592132
How I did it
How to verify it
Why I did it
Support DHCP/DHCPv6 per-interface counter, code change in sonic-build image.
Work item tracking
Microsoft ADO (17271822):
How I did it
- Introduce libjsoncpp-dev in dhcpmon and dhcprelay repo
- Show CLI changes after counter format change
How to verify it
- Manually run show command
- dhcpmon, dhcprelay integration tests
#### Why I did it
Fix issue #16533 , telemetry service exit in master and 202305 branches due to no telemetry configs in redis DB.
#### How I did it
Enable default config if no TELEMETRY configs from redis DB.
#### How to verify it
After the fix, telemetry service would work with the following two scenarios:
1. With TELEMETRY config in redis DB, load service configs from DB.
2. No TELEMETRY config in redis DB, use default service configs.
### Why I did it
[Security] Upgrade the OpenSSL/OpenSSH to fix CVE alerts
Upgrade OpenSSL to 1.1.1n-0+deb11u5
Fix CVEs:
CVE-2023-0464 (Excessive Resource Usage Verifying X.509 Policy
CVE-2023-0465 (Invalid certificate policies in leaf certificates are
CVE-2023-0466 (Certificate policy check not enabled).
CVE-2022-4304 (Timing Oracle in RSA Decryption).
CVE-2023-2650 (Possible DoS translating ASN.1 object identifiers).
Upgrade OpenSSH to 8.4p1-5+deb11u2
Fix CVEs:
CVE-2023-38408 (Lacks SSH agent restriction)
##### Work item tracking
- Microsoft ADO **(number only)**: 25506776
#### How I did it
Upgrade the OpenSSL/OpenSSH package version and fix the UT failure.
#### How to verify it
Verified by UTs with and without FIPS enabled.
* Remove main deb installation for derived deb build (#16859)
* Don't install dependencies of derived debs
When "building" a derived deb package, don't install the dependencies of
the package into the container. It's not needed at this stage.
* Re-add openssh-client and openssh-sftp-server as derived debs
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
---------
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
(cherry picked from commit 9ae77bc2dd)
* Re-add missing dependency for derived debs. (#16896)
* Re-add missing dependency for derived debs.
My previous changed removed the whole dependency on the main deb
existing, not just the installation of the main deb. Fix this by
readding a dependency on the main deb being built/pulled from cache.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
---------
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
(cherry picked from commit 963d40a77b)
* [build] Fix build issue in docker-ptf-sai caused by setuptools_scm new release (#16636)
docker-ptf-sai build fails on setuptools_scm's new release on 09/20/2023.
Use old version instead.
(cherry picked from commit bfa05c8349)
---------
Co-authored-by: Liu Shilong <shilongliu@microsoft.com>
Why I did it
Enabling kdump by default for cisco-8000 by setting crashkernel cmdline arg in device installer.conf.
After bootup, sonic-kdump-config wipes crashkernel arg from /host/grub/grub.cfg, and resets USE_KDUMP in /etc/default/kdump-tools, so kdump will not be enabled on subsequent reboot.
How I did it
Setting kdump enable config as part of init_cfg.json for cisco-8000 platforms.
How to verify it
Install SONiC image with kdump enabled by default (device/hwsku/installer.conf), then reboot.
Kdump config should persist on subsequent reboots and kdump loaded during bootup
Signed-off-by: Aman Singhal <amans@cisco.com>
Created patches to address two CVEs from FRR CVE-2023-41358 and CVE-2023-38802.
Patch FRR commit CVE fixed
0024-bgpd-Do-not-process-NLRIs-if-the-attribute-length-is.patch FRRouting/frr@f291f1e CVE-2023-41358
0025-bgpd-Use-treat-as-withdraw-for-tunnel-encapsulation-.patch FRRouting/frr@8a4a88c CVE-2023-38802
Why I did it
Networking devices need to be responsive. Such responsiveness is harmed when the CPU change state.
There is a latency penalty when a CPU is idle (e.g C2) and need to exit this state to come back to C1 state.
To prevent this from happening the CPU should be forced to remain in C1 state.
How I did it
Generalize the cstate forcing to C1 to all Arista products.
This is done by adding processor.max_cstate=1 to the kernel cmdline for all CPUs.
Additionally Intel CPUs also need intel_idle.max_cstate=0 to fallback to the acpi_idle driver.
How to verify it
Check that processor.max_cstate=1 is present on the cmdline for AMD CPUs
Check that both processor.max_cstate=1 and intel_idle.max_cstate=0 are present on the cmdline for Intel CPUs
On some products from this line one of the management NIC might be unpopulated.
On such products this leads to errors from pcied and pcie-check.sh
How I did it
Remove this PCIe device from pcie.yaml
How to verify it
Run pcieutil check on the 2 hardware variants and validate that it passes.
Restart pcied and make sure that there is no more error logs in the syslog.
ADO: 25447788
Openssh in Debian Bullseye has been updated to 1:8.4p1-5+deb11u2 to fix CVE-2023-38408.
Since we're building openssh with some patches, we need to update our version as well.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
previously, get_num_asics() returns the maximum number of asics. however, the asic_count
should be actual number of asics populated which can be get from get_asic_presence_list().
ADO: 25158825
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
#### Why I did it
src/sonic-swss
```
* fc63383b - (HEAD -> 202305, origin/202305) [ppi]: Implement port bulk comparison logic (#2921) (2 days ago) [Nazarii Hnydyn]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-host-services
```
* fc88254 - (HEAD -> 202305, origin/202305) Support to config fips state (#69) (#78) (16 hours ago) [xumia]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
sonic-host-services depends on sonic-utilities because of FIPS feature.
Add dependency to unblock submodule sonic-host-services HEAD pointer update.
Work item tracking
Microsoft ADO (number only): 24671218
How I did it
### Why I did it
When FRR is built with Cache enabled, the build failed with the following error logs
```
[2023-09-20T15:17:00.273Z] fatal: Unable to hash src/sonic-frr/frr/tests/topotests/grpc_basic/lib
[2023-09-20T15:17:00.273Z] fatal: Unable to hash src/sonic-frr/frr/tests/topotests/ospfapi/lib
[2023-09-20T15:17:00.273Z] make: *** [Makefile.cache:528: target/debs/bullseye/frr_8.5.1-sonic-0_amd64.deb.smdep] Error 123
[2023-09-20T15:17:00.273Z] make: *** Waiting for unfinished jobs....
```
#### How I did it
Currently symlinks are excluded in hardcoded fashion. With FRR upgrades new symlinks might get introduced. To overcome it modified the way in which symlinks are excluded by finding symlinks using find command
#### How to verify it
Build FRR with cache enabled
Why I did it
Now build will fail on:
fatal: Unable to hash src/sonic-frr/frr/tests/topotests/grpc_basic/lib
fatal: Unable to hash src/sonic-frr/frr/tests/topotests/ospfapi/lib
make: *** [Makefile.cache:528: target/debs/buster/frr_8.5.1-sonic-0_amd64.deb.smdep] Error 123
make: *** Waiting for unfinished jobs....
Root cause is that these files are symbol links.
git hash-object can't hash symbol links.
Work item tracking
Microsoft ADO (number only): 25271730
How I did it
These two files are symbol links.
When calculate sha value, skip these two files.
#### Why I did it
src/sonic-linux-kernel
```
* e262947 - (HEAD -> 202305, origin/202305) Revert "Update to Linux 5.10.179 (#328)" (19 hours ago) [stormliang]
* e64669d - Update to Linux 5.10.179 (#328) (2 days ago) [Saikrishna Arcot]
```
#### How I did it
#### How to verify it
#### Description for the changelog
* Revert "[SNMP][IPv6]: Fix to use link local IPv6 address as snmp agentAddress (#16013)"
This reverts commit ebe8c8c223.
* Revert "[SNMP][IPv6]: Fix SNMP IPv6 reachability issue in certain scenarios (#15487) (#15874)"
This reverts commit 83aa8b8180.
#### Why I did it
src/sonic-gnmi
```
* df4d49f - (HEAD -> 202305, origin/202305) Install necessary debs instead of entire artifact in azp (#137) (12 hours ago) [Zain Budhwani]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
First SONIC 202305 based release
Includes all fixes so far up to latest 202205 based 8111 drop (Code drop 111: 202205.main.0.13)
Work item tracking
Microsoft ADO (number only):
How I did it
update to 202305.main.0.1 release
How to verify it
BACKPORT: #16781
Why I did it
To enable Store-And-Forward switching mode for SN2700/SN3800/SN4600C/SN4700
Work item tracking
N/A
How I did it
Added vendor SAI config options
How to verify it
make configure PLATFORM=mellanox
make target/sonic-mellanox.bin
Why I did it
SONiC service determine-reboot-cause might run before driver creating reset cause files. In that case, the reset cause will be "Unknown". This PR introduces a wait mechanism to wait for reset cause sysfs files ready.
How I did it
/run/hw-management/config/reset_attr_ready is the file to indicate all reset cause files are ready. In chassis.get_reboot_cause function, it waits /run/hw-management/config/reset_attr_ready for up to 45 seconds.
How to verify it
Manual test on master/202211/202205
Sfp api can now be called from the host which doesn't have the python_sdk_api installed. Also, sfp api has been migrated to use sysfs instead of sdk handle.
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
#### Why I did it
src/sonic-swss
```
* 8934b62b - (HEAD -> 202305, origin/202305) [202305][CodeQL]: Use dependencies with relevant versions in azp template. (#2906) (3 hours ago) [Nazarii Hnydyn]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-swss
```
* 0584d35b - (HEAD -> 202305, origin/202305) Revert "Support type7 encoded CAK key for macsec in config_db (#2892)" (3 minutes ago) [stormliang]
* 7097cf2b - Revert "[teamd]: Clean teamd process if LAG creation fails (#2888)" (3 days ago) [stormliang]
* a0eb0d07 - Support type7 encoded CAK key for macsec in config_db (#2892) (4 days ago) [judyjoseph]
* c7e5f10e - [teamd]: Clean teamd process if LAG creation fails (#2888) (4 days ago) [Lawrence Lee]
* f30b6107 - [CodeQL]: Use dependencies with relevant versions in azp template. (#2845) (4 days ago) [Nazarii Hnydyn]
```
#### How I did it
#### How to verify it
#### Description for the changelog
- Why I did it
SAI bug Fixes
1. When creating an ACL rule with SAI_ACL_ENTRY_ATTR_FIELD_SRC_IP/SAI_ACL_ENTRY_ATTR_FIELD_DST_IP enabled, and then disabling the field by setting enable=false, a match on L3_type=IPv4 will remain programmed for the rule Issue resolved after the fix
2. Allow the max scale of virtual routers to be configure for SPC-1, SPC-2, SPC-3 which is 255 when fastboot enable and 511 when fastboot disable
3. Remove default hash key of SRC_MAC, DST_MAC and ETH_TYPE
SAI features
1. Port init profile
2. Dual ToR Active-Standby | Additional MAC support
SDK/FW bug fixes
1. When preforming fast boot from an old SDK version (currently installed) to a newer one (target version), and the system was initially loaded with a new SDK version (past version), and the system has not been wiped, under specific conditions, the fast boot would use the past version's data and may fail.
- How I did it
Update SAI version to SAIBuild2211.25.1.4
Update SDK/FW version to 4.6.1062/2012.1062
* [swss] Chassis db clean up optimization and bug fixes
This commit includes the following changes:
- Fix for regression failure due to error in finding CHASSIS_APP_DB in
pizzabox (#PR 16451)
- After attempting to delete the system neighbor entries from
chassis db, before starting clearing the system interface entries,
wait for sometime only if some system neighbors were deleted.
If there are no system neighbors entries deleted for the asic coming up,
no need to wait.
- Similar changes for system lag delete. Before deleting the
system lag, wait for some time only if some system lag memebers were
deleted. If there are no system lag members deleted no need to wait.
- Flush the SYSTEM_NEIGH_TABLE from the local STATE_DB. While asic
is coming up, when system neigh entries are deleted from chassis ap
db (as part of chassis db clean up), there is no orchs/process running to
process the delete messages from chassis redis. Because of this, stale system
neigh are entries present in the local STATE_DB. The stale entries result in
creation of orphan (no corresponding data path/asic db entry) kernel neigh
entries during STATE_DB:SYSTEM_NEIGH_TABLE entries processing by nbrmgr (after
the swss serive came up). This is avoided by flushing the SYSTEM_NEIGH_TABLE from
the local STATE_DB when sevice comes up.
Signed-off-by: vedganes <veda.ganesan@nokia.com>
* [swss] Chassis db clean up bug fixes review comment fix - 1
Debug logs added for deletion of other tables (SYSTEM_INTERFACE and SYSTEM_LAG_TABLE)
Signed-off-by: vedganes <veda.ganesan@nokia.com>
---------
Signed-off-by: vedganes <veda.ganesan@nokia.com>
(cherry picked from commit b13b41fc22)
Stop installing development packages from telemetry docker images to avoid unnecessary space usage.
### Why I did it
From 202305, libswsscommon-dev and the Boost headers were brought in telemetry docker image incorrectly, which result in unnecessary space usage.
##### Work item tracking
- Microsoft ADO **(number only)**:25176224
#### How I did it
Remove libswsscommon-dev accordingly.
#### How to verify it
Image building.
Signed-off-by: anamehra anamehra@cisco.com
Added a check for DEVICE_METADATA before accessing the data. This prevents the j2 failure when var is not available.
In #15080, there was a command added to re-add 127.0.0.1/8 to the lo
interface when the networking configuration is being brought down.
However, the trigger for that command is `down`, which, looking at
ifupdown2 configuration files, runs immediately after 127.0.0.1/16 is
removed. This means there may be a period of time where there are no
loopback addresses assigned to the lo interface, and redis commands will
fail.
Fix this by changing this to pre-down, which should run well before
127.0.0.1/16 is removed, and should always leave lo with a loopback
address.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Co-authored-by: Saikrishna Arcot <sarcot@microsoft.com>
* Change the CAK key length check in config plugin, macsec test profile changes
* Fix the format in add_profile api
The changes needed in various macsec unit tests and config plugin when we move to accept the type 7 encoded key format for macsec. This goes along with PR : sonic-net/sonic-swss#2892 raised earlier.
Co-authored-by: judyjoseph <53951155+judyjoseph@users.noreply.github.com>
- Why I did it
Because the Spectrum4 devices don't support mlxtrace utility.
- How I did it
Edit sai.profile and remove mlxtrace_spectrum4_itrace_*.cfg.ext files
Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
Co-authored-by: Vadym Hlushko <62022266+vadymhlushko-mlnx@users.noreply.github.com>
#### Why I did it
To enable qos config for a certain backend deployment mode, for resource-type "Compute-AI".
This deployment has the following requirement:
- Config below enabled if DEVICE_TYPE as one of backend_device_types
- Config below enabled if ResourceType is 'Compute-AI'
- 2 lossless TCs' (2, 3)
- 2 lossy TCs' (0,1)
- DSCP to TC map uses 4 DSCP code points and maps to the TCs' as follows:
"DSCP_TO_TC_MAP": {
"AZURE": {
"48" : "0",
"46" : "1",
"3" : "3",
"4" : "4"
}
}
- WRED profile has green {min/max/mark%} as {2M/10M/5%}
This required template change <as in the PR> in addition to the vendor qos.json.j2 file (not included here).
### How I did it
#### How to verify it
- with the above change and the vendor config change, generated the qos.json file and verified that the objective stated in "Why I did it" was met
- verified no error
### Description for the changelog
Update qos_config.j2 for Comptue-AI deployment on one of backend device type roles
- Why I did it
1. Update Mellanox HW-MGMT package to newer version V.7.0030.1011
2. Replace the SONiC PMON Thermal control algorithm with the one inside the HW-MGMT package on all Nvidia platforms
3. Support Spectrum-4 systems
- How I did it
1. Update the HW-MGMT package version number and submodule pointer
2. Remove the thermal control algorithm implementation from Mellanox platform API
3. Revise the patch to HW-MGMT package which will disable HW-MGMT from running on SIMX
4. Update the downstream kernel patch list
Signed-off-by: Kebo Liu <kebol@nvidia.com>
How I did it
Update Yang definition of ACL_TABLE_TYPE.
Update existing testcase.
Add new testcase to cover lowercase key scenario.
How to verify it
Verified by building sonic_yang_models-1.0-py3-none-any.whl. While building the target package, unit tests were run and passed.
On S6100 we are seeing almost 100K interrupts per second on intels i801 SMBUS controller which affects systems performance.
We now disable the i801 driver interrupt and instead enable polling
Microsoft ADO (number only): 24910530
How I did it
Disable the interrupt by passing the interrupt disable feature argument to i2c-i801 driver
How to verify it
This fix is NOT applicable for ARM based platforms. Applicable only for intel based platforms:-
- On SN2700 its already disabled in Mellanox hw-mgmt
- Celestica DX010 and E1031
- Dell S6100 verified the interrupts are no longer incrementing.
- Arista 7260CX3
Signed-off-by: Prince George <prgeor@microsoft.com>
Why I did it
sonic-mgmt test failure is seen for update_firmware component API
Microsoft ADO: 25208748
How I did it
Edited API 2.0 to fix this issue.
How to verify it
Run sonic-mgmt test after the fix and verify it passes.
* [Mellanox] Update SDK/FW/SAI to 4.6.1020/2012.1020/SAIBuild2305.25.0.3 (#16096)
SONiC changes:
1. Support Spectrum4 ASIC FW binary building.
2. Support new SDK sx-obj-desc lib building since new SAI need it.
3. Remove SX_SCEW debian package from Mellanox SDK build since we are no longer using it (we use libxml2 instead).
4. Update SAI, SDK, FW to version 4.6.1020/2012.1020/SAIBuild2305.25.0.3
SDK/FW bug fixes
1. In SPC-1 platforms: Fastboot mode is not operational for Split port with Force mode in 50G speed
SFP modules are kept in disabled state after set LPM (low power mode) on/off for at least 3 minutes.
2. When preforming fast boot from an old SDK version (currently installed) to a newer one (target version), and the system was initially loaded with a new SDK version (past version), and the system has not been wiped, under specific conditions, the fast boot would use the past version's data and may fail.
SDK/FW Features
1. On SN2700 all ports can support y cable by credo
SAI bug Fixes
1. When creating an ACL rule with SAI_ACL_ENTRY_ATTR_FIELD_SRC_IP/SAI_ACL_ENTRY_ATTR_FIELD_DST_IP enabled, and then disabling the field by setting enable=false, a match on L3_type=IPv4 will remain programmed for the rule Issue resolved after the fix
2. Allow the max scale of virtual routers to be configure for SPC-1, SPC-2, SPC-3 when fastboot enable
3. Remove default hash key of SRC_MAC, DST_MAC and ETH_TYPE
SAI features
1. Port init profile
- How I did it
Update SDK/FW/SAI make files
- How to verify it
Run full sonic-mgmt regression on Mellanox platform
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Conflicts:
platform/mellanox/mlnx-sai.mk
* Fix issue: unprintable character is rendered when handling comments in j2
Use "{#-" and "-#}" to mark comments in jinja template
Signed-off-by: Stephen Sun <stephens@nvidia.com>
---------
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Co-authored-by: Stephen Sun <stephens@nvidia.com>
#### Why I did it
src/sonic-linux-kernel
```
* 9cb7ea0 - (HEAD -> 202305, origin/202305) arm64: dts: marvell: Add Nokia 7215-IXS-A1 board (#321) (24 hours ago) [Pavan-Nokia]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Advance dhcpmon to a3c5381 in 202305 branch.
a3c5381 - (HEAD, origin/master, origin/HEAD, master) Merge pull request src: Add libnl3 build.sh script #11 from jcaiMR/dev/jcai_fix_err_log (11 days ago) [StormLiangMS]
c5ef7e7 - Change common_libs dependencies from buster to bullseye (Updating docker-orchagent/syncd Dockerfile and start.sh #9)
824a144 - replace atoi with strtol (Rename hostname #6) (10 weeks ago) [Mai Bui]
32c0c3f - Fix libswsscommon package installation for non-amd64 (README.md leaves out docker-database #7) (10 weeks ago) [Saikrishna Arcot]
Work item tracking
Microsoft ADO (25048723):
How I did it
How to verify it
Run test_dhcp_relay.py, no failure
- Why I did it
Fixed build failure when flag ENABLE_SFLOW_DROPMON=y set
- How I did it
Fixed sflow dropmon patch to align with hsflowd version 2.0.45
Signed-off-by: rajkumar38 <rpennadamram@marvell.com>
Why I did it
Update the platform_reboot of Nokia Platform IXR-7250E-36x400G to displays the correct reboot-cause history when reboot from supervisor card.
Work item tracking
Microsoft ADO (number only):
How I did it
Modify the platform_reboot script to copy the correct reboo-cause.txt file from NDK to the /host/reboot-cause directory at the down cycle when the reboot is issued from Supervisor (for both reboot right after install a new image and normal reboot)
Signed-off-by: mlok <marty.lok@nokia.com>
- Why I did it
watchdogutil uses platform API watchdog instance to control/query watchdog status. In Nvidia watchdog status, it caches "armed" status in a object member "WatchdogImplBase.armed". This is not working for CLI infrastructure because each CLI will create a new watchdog instance, the status cached in previous instance will totally lose. Consider following commands:
admin@sonic:~$ sudo watchdogutil arm -s 100 =====> watchdog instance1, armed=True
Watchdog armed for 100 seconds
admin@sonic:~$ sudo watchdogutil status ======> watchdog instance2, armed=False
Status: Unarmed
admin@sonic:~$ sudo watchdogutil disarm =======> watchdog instance3, armed=False
Failed to disarm Watchdog
- How I did it
Use sysfs to query watchdog status
- How to verify it
Manual test
Unit test
It appears that this was initially added to provide the git-retry
command (which doesn't appear to be used today). However, this repo is
now also providing bazel (which is actually used in our build today),
and this command (along with git-retry) expects some vpython3 binary to
be set up/installed.
Rather than going through that, just get rid of this repo.
- Why I did it
Update Mellanox MFT tool to version 4.25.0-62
- How I did it
Update the MFT tool make file
- How to verify it
Run full sonic-mgmt regression.
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Bmc is a valid neighbor type in minigraph, however it was missing from the YANG model definition. Usually, the Bmc type device can be neighbor of BmcMgmtToRRouter. This PR is to introduce this type.
Why I did it
Dell S6100 Platform components needs to be updated.
How I did it
Modified platform.json to fix the issue.
How to verify it
Run sonic-mgmt component test and check whether it passes.
Why I did it
According to ACL-Table-Type-HLD, the value type of MATCHES, ACTIONS and BIND_POINTS should be list instead of string. Opening this PR to update the definition of BMCDATA and BMCDATAV6.
How I did it
Update the definition of BMCDATA and BMCDATAV6 in minigraph-parser.
How to verify it
Verified by UT and build SONiC image.
#### Why I did it
src/sonic-swss
```
* c869c1df - (HEAD -> 202305, origin/202305) update portStatIds for cisco (#2876) (11 minutes ago) [Zhixin Zhu]
* dd152288 - [Dynamic Buffer][Mellanox] Skip PGs in pending deleting set while checking accumulative headroom of a port (#2871) (12 minutes ago) [Stephen Sun]
* 97068ff1 - Fix error in peer response time when headroom is calculated for 800G (#2860) (16 minutes ago) [Stephen Sun]
```
#### How I did it
#### How to verify it
#### Description for the changelog
= Why I did it
To optimize Mellanox platform SAI build
- How I did it
SAI debs are now downloaded as Spectrum-SDK-Drivers-SONiC-Bins release.
- How to verify it
Configure/build for Mellanox platform, check the image and ensure that correct SAI debs are included.
#### Why I did it
To fix the logic introduced by [[memory_checker] Do not check memory usage of containers which are not created #11129](https://github.com/sonic-net/sonic-buildimage/pull/11129).
There could be a scenario before the reboot, where
1. The `docker service` has stopped
2. In a very short period of time, the monit service performs the `root@sonic:/home/admin# monit status container_memory_telemetry`
In such scenario, the `memory_checker` script will throw an error to the syslog:
```
ERR memory_checker: Failed to retrieve the running container list from docker daemon! Error message is: 'Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))'
```
But, actually, this scenario is a correct behavior, because when the docker service is stopped, the Unix socket is destroyed and that is why we could see the `FileNotFoundError(2, 'No such file or directory'` exception in the syslog.
#### How I did it
Change the log severity to the warning and changed the return value.
#### How to verify it
It is really hard to catch the exact moment described in the `Why I did it` section.
In order to check the logic:
1. Change the Unix socket path to non-existing in [/usr/bin/memory_checker](47742dfc2c/files/image_config/monit/memory_checker (L139)) file on the switch.
2. Execute the `root@sonic:/home/admin# monit restart container_memory_telemetry`
3. Check the syslog for such messages:
```
WARNING memory_checker: Failed to retrieve the running container list from docker daemon! Error message is: 'Error while fetching server API version: ('Connection aborte
d.', FileNotFoundError(2, 'No such file or directory'))'
INFO memory_checker: [memory_checker] Exits without checking memory usage since container 'telemetry' is not running!
```
Why I did it
Few commands in multiasic platforms when run with the "sudo ip netns exec asic0 " option was taking like 15 mins to get the o/p. This behavior of sudo getting hung was seen by just doing this
jujoseph@svcstr-server-2:~ sudo ip netns exec asic0 bash
jujoseph@svcstr-server-2:~ sudo ls
deally sudo is not needed as we have /bin/ip netns identify present in /etc/sudoers file. Hence removing it
- Why I did it
Revise lable name and fix typo in sensor.conf of 4600C
- How I did it
Revise lable name and fix typo in sensor.conf of 4600C
- How to verify it
Manual test
sonic-mgmt test_sensors.py
Why I did it
Support FIPS DB configuration
Design Doc: sonic-net/SONiC#1372
Work item tracking
Microsoft ADO (number only): 24411148
How I did it
Add the FIPS Yang model to make FIPS configurable in ConfigDB.
How to verify it
See TestPlan: sonic-net/sonic-mgmt#9092
Build the image and run the tests: sonic-net/sonic-mgmt#9091
- Why I did it
Add new breakout modes to be used in PAM4 supported cables
- How I did it
- How to verify it
Verified the 50G per lane breakout modes are applied properly on the switch
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
- Why I did it
Enabled port late create on SN5600 Spectrum-4 switch boots up with no ports
Work item tracking
N/A
- How I did it
Updated SAI xml config file
- How to verify it
Run sonic-mgmt tests of fastboot
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
- Why I did it
Change Mellanox platform API implementation to use ASIC driver sysfs for the module operational state and status error fields.
- How I did it
Modify the platform/mellanox/mlnx-platform-api/sonic_platform/sfp.py file by change the call of sx_mgmt_phy_module_info_get() SDK API to sysfs
- How to verify it
Simulate the unplug cable event
Check the CLI output
sfputil show presence
sfputil show error-status -hw
Simulate the plug cable event
Repeat 2 step
Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
Why I did it
docker folder size on 202305 image is more than 1.5G. larger than the max size of docker ramfs size.
Work item tracking
Microsoft ADO (number only):
24969589
How I did it
Update the docker ramfs size from 1500M to 2500M
How to verify it
Boot 202305 image.
How I did it
Update Yang definition of IN_PORTS and OUT_PORTS to string.
Since we cannot split the string with comma (,) and validate each substring is a valid SONiC port name. The only restriction for them is must be a string.
How to verify it
Verified by building sonic_yang_models-1.0-py3-none-any.whl. While building the target package, unit tests were run and passed.
Build a SONiC image based on 202205 branch and installed on physical DUT. Re try the steps in [Yang] Incorrect definition of IN_PORTS and OUT_PORTS in sonic-acl.yang #16190 and can see below success response:
Cherypick of #15685
MSFT ADO: 24274591
Why I did it
Two changes:
1 Fix a day1 issue, where check to wait until CONFIG_DB_INITIALIZED is incorrect.
There are multiple places where same incorrect logic is used.
Current logic (until [[ $($SONIC_DB_CLI CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]];) will always result in pass, irrespective of the result of GET operation.
root@str2-7060cx-32s-29:~# sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"
1
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]]; do echo "entered here"; done
root@str2-7060cx-32s-29:~#
root@str2-7060cx-32s-29:~#
root@str2-7060cx-32s-29:~# sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"
0
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]]; do echo "entered here"; done
root@str2-7060cx-32s-29:~#
Fix this logic by checking for value of flag to be "1".
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") -eq 1 ]]; do echo "entered here"; done
entered here
entered here
entered here
This gap in logic was highlighted when another fix was merged: #14933
The issue being fixed here caused warmboot-finalizer to not wait until config-db is initialized.
2 Set and unset CONFIG_DB_INITIALIZED for warm-reboot case
Currently, during warm shutdown CONFIG_DB_INITIALIZED's value is stored in redis db backup. This is restored back when the dump is loaded during warm-recovery.
So the value of CONFIG_DB_INITIALIZED does not depend on config db's state, however it remain what it was before reboot.
Fix this by setting CONFIG_DB_INITIALIZED to 0 as when the DB is loaded, and set it to 1 after db_migrator is done.
Work item tracking
Microsoft ADO (number only):
How I did it
How to verify it
#### Why I did it
src/linkmgrd
```
* 40113fd - (HEAD -> 202305, origin/202305) [active-standby] Fix extra toggle observed in `config reload` (#216) (2 days ago) [Longxiang Lyu]
* b6d40fc - Add ADO to the PR template (#215) (2 days ago) [Longxiang Lyu]
* fe41ad2 - [active-standby] Write `unhealthy` is default route `N/A` (#214) (2 days ago) [Longxiang Lyu]
* 8ff265c - [link prober] Increase pause/restart probe log verbosity (#213) (2 days ago) [Longxiang Lyu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-platform-daemons
```
* 4688e1e - (HEAD -> 202305, origin/202305) [PSU power threshold] Fix logic error: compare the system power with the PSU's power threshold (#367) (2 days ago) [Stephen Sun]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-swss
```
* 6b0c8c5c - (HEAD -> 202305, origin/202305) [muxorch] set mux state to init upon warm reboot (#2834) (18 hours ago) [Nikola Dancejic]
* bc9bde94 - [ASAN] Fix Indirect Mem Leaks in Orchagent (#2869) (24 hours ago) [Vivek]
```
#### How I did it
#### How to verify it
#### Description for the changelog
fix static route uninstall issue when all nexthops are not reachable.
the feature was working but the bug was introduced when support dynamic bfd enable/disable. Added UT testcase to guard this.
### Why I did it
Background running lua script may cause redis-server quite busy if batch size is 8192.
If handling time exceeded default 5s, the redis-server will not response to other process and will cause syncd crash.
```
Aug 9 07:46:29.512326 str-s6100-acs-5 INFO database#supervisord: redis 68:M 09 Aug 2023 07:46:29.511 # Lua slow script detected: still in execution after 5186 milliseconds. You can try killing the script using the SCRIPT KILL command. Script SHA1 is: 88270a7c5c90583e56425aca8af8a4b8c39fe757
Aug 9 07:46:29.523716 str-s6100-acs-5 ERR syncd#syncd: :- checkReplyType: Expected to get redis type 5 got type 6, err: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.
Aug 9 07:46:29.524818 str-s6100-acs-5 INFO syncd#supervisord: syncd terminate called after throwing an instance of '
Aug 9 07:46:29.525268 str-s6100-acs-5 ERR pmon#CCmisApi: :- checkReplyType: Expected to get redis type 5 got type 6, err: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.
Aug 9 07:46:29.526148 str-s6100-acs-5 INFO syncd#supervisord: syncd std::system_error'
Aug 9 07:46:29.528308 str-s6100-acs-5 ERR pmon#psud[32]: :- checkReplyType: Expected to get redis type 5 got type 6, err: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.
Aug 9 07:46:29.529048 str-s6100-acs-5 ERR lldp#python3: :- guard: RedisReply catches system_error: command: *2#015#012$3#015#012DEL#015#012$27#015#012LLDP_ENTRY_TABLE:Ethernet37#015#012, reason: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.: Input/output error
Aug 9 07:46:29.529720 str-s6100-acs-5 ERR snmp#python3: :- guard: RedisReply catches system_error: command: *2#015#012$7#015#012HGETALL#015#012$28#015#012COUNTERS:oid:0x100000000000a#015#012, reason: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.: Input/output error
```
88270a7c5c90583e56425aca8af8a4b8c39fe757 is /usr/share/swss/consumer_state_table_pops.lua
##### Work item tracking
- Microsoft ADO **24741990**:
#### How I did it
Change batch size from 8192 to 1024.
#### How to verify it
Run all test cases in sonic-mgmt to verify the system stability.
### Tested branch (Please provide the tested image version)
- [x] 20220531.36
- Why I did it
Adjust PSU power threshold logic in system health.
- How I did it
Update the description message in PSU power threshold checking
power of PSU x (xx w) exceeds threshold (xx w) => System power exceeds xx threshold (xx w)
- How to verify it
Manual test and unit test
What I did:
Fix the Loopback0 IPv6 address of LC's in chassis not reachable from peer devices.
Why I did:
For Ipv6 Loopback0 address we only advertise /64 subnet to the peer devices. However, in case of chassis each LC will have it own /128 address of that /64 subnet . Since this /128 address does not get advertised peer devices can-not ping/reach the LC's loopback0.
How I fix:
Advertise /128 Loopback0 Ipv6 address only between i-BGP peers. This way even though /64 is advertised to e-BGP peer devices when packet reaches any of LC's it can reach the appropriate LC's.
How I verify:
Manual verification
UT added for same.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Why I did it
get_system_mac was returning 'None' mac for system without eeprom.
get_system_mac for marvell platform checks for mac in eeprom, profile.ini(hwsku file) and eth0. Check for valid mac returned by syseeprom was incorrect. Which was resulting in bypassing mac get from profile.ini and eth0.
How I did it
get_system_mac already has a logic to get first valid mac.
Removed null check for mac returned by eeprom.
Corrected the check for profile.ini file by checking if file exist.
How to verify it
Executed sonic-cfggen to check valid mac address is getting configured in config_db.json with/without profile.ini.
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
cherry-pick: #14513
depends: https://github.com/sonic-net/sonic-utilities/pull/2939
* Add an ability to configure remote syslog servers
* Add an initial configuration for remote syslog
* Extend YANG module and add unit tests
#### Why I did it
Adding the following functionality to rsyslog feature:
* Configure remote syslog servers: protocol, filter, severity level
* Update global syslog configuration: severity level, message format
#### How I did it
added parameters to syslog server and global configuration.
#### How to verify it
create syslog server using CLI/adding to Redis-DB
verify server is added to file /etc/rsyslog.conf and server is functional.
#### Description for the changelog
extend rsyslog capabilities, added server and global configuration parameters.
#### Link to config_db schema for YANG module changes
[sonic-syslog.yang](https://github.com/sonic-net/sonic-buildimage/blob/master/src/sonic-yang-models/yang-models/sonic-syslog.yang)
Why I did it
Updating the iSMART_64 tool for supporting latest debian releases.
How I did it
On branch new_ismart
Changes to be committed:
(use "git restore --staged ..." to unstage)
modified: platform/broadcom/sonic-platform-modules-dell/s6100/scripts/iSMART_64
How to verify it
In s6100, run the iSMART_64 tool.
md5sum - 24725730d7649769c7ba50971c1f2955
Co-authored-by: Santhosh Kumar T <53558409+santhosh-kt@users.noreply.github.com>
- Why I did it
Increase UT coverage for Nvidia platform API code
Work item tracking
Microsoft ADO (number only):
- How I did it
Focus on low coverage file:
1. component.py
2. watchdog.py
3. pcie.py
- How to verify it
Run the unit test, the coverage has been changed from 70% to 90%
Co-authored-by: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com>
#### Why I did it
event yang models for usage currently use int as type for usage leaf, needs to be of type decimal64
##### Work item tracking
- Microsoft ADO **(number only)**:17747466
#### How I did it
Update yang models and UT
#### How to verify it
UT
Co-authored-by: Zain Budhwani <99770260+zbud-msft@users.noreply.github.com>
Fixes#15667 and #13293
Work item tracking
Microsoft ADO 24472854:
How I did it
On chassis supervisor bgp feature is disabled in hostcfgd. The dependency between swss and bgp causes the bgp containers to start even though the feature is disabled.
How to verify it
Tests on chassis supervisor and LC
Co-authored-by: Arvindsrinivasan Lakshmi Narasimhan <55814491+arlakshm@users.noreply.github.com>
<!--
Please make sure you've read and understood our contributing guidelines:
https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md
** Make sure all your commits include a signature generated with `git commit -s` **
If this is a bug fix, make sure your description includes "fixes #xxxx", or
"closes #xxxx" or "resolves #xxxx"
Please provide the following information:
-->
#### Why I did it
fixes: https://github.com/sonic-net/sonic-buildimage/issues/16001
Caused by: https://github.com/sonic-net/sonic-buildimage/pull/15487
The above PR introduced change to use Management and Loopback Ipv4 and ipv6 addresses as snmpagent address in snmpd.conf file.
With this change, if Link local IP address is configured as management or Loopback IPv6 address, then snmpd tries to open socket on that ipv6 address and fails with the below error:
```
Error opening specified endpoint "udp6:[fe80::5054:ff:fe6f:16f0]:161"
Server Exiting with code 1
```
From RFC4007, if we need to specify non-global ipv6 address without ambiguity, we need to use zone id along with the ipv6 address: <address>%<zone_id>
Reference: https://datatracker.ietf.org/doc/html/rfc4007
##### Work item tracking
- Microsoft ADO **(number only)**:
#### How I did it
Modify snmpd.conf file to use the %zone_id representation for ipv6 address.
#### How to verify it
In VS testbed, modify config_db to use link local ipv6 address as management address:
"MGMT_INTERFACE": {
"eth0|10.250.0.101/24": {
"forced_mgmt_routes": [
"172.17.0.1/24"
],
"gwaddr": "10.250.0.1"
},
"eth0|fe80::5054:ff:fe6f:16f0/64": {
"gwaddr": "fe80::1"
}
},
Execute config_reload after the above change.
snmpd comes up and check if snmpd is listening on ipv4 and ipv6 addresses:
```
admin@vlab-01:~$ sudo netstat -tulnp | grep 161
tcp 0 0 127.0.0.1:3161 0.0.0.0:* LISTEN 274060/snmpd
udp 0 0 10.1.0.32:161 0.0.0.0:* 274060/snmpd
udp 0 0 10.250.0.101:161 0.0.0.0:* 274060/snmpd
udp6 0 0 fc00:1::32:161 :::* 274060/snmpd
udp6 0 0 fe80::5054:ff:fe6f::161 :::* 274060/snmpd -- Link local
admin@vlab-01:~$ sudo ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.250.0.101 netmask 255.255.255.0 broadcast 10.250.0.255
inet6 fe80::5054:ff:fe6f:16f0 prefixlen 64 scopeid 0x20<link>
ether 52:54:00:6f:16:f0 txqueuelen 1000 (Ethernet)
RX packets 36384 bytes 22878123 (21.8 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 261265 bytes 46585948 (44.4 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
admin@vlab-01:~$ docker exec -it snmp snmpget -v2c -c public fe80::5054:ff:fe6f:16f0 1.3.6.1.2.1.1.1.0
iso.3.6.1.2.1.1.1.0 = STRING: "SONiC Software Version: SONiC.master.327516-04a6031b2 - HwSku: Force10-S6000 - Distribution: Debian 11.7 - Kernel: 5.10.0-18-2-amd64"
```
Logs from snmpd:
```
Turning on AgentX master support.
NET-SNMP version 5.9
Connection from UDP/IPv6: [fe80::5054:ff:fe6f:16f0%eth0]:44308
```
Ran test_snmp_loopback test to check if loopback ipv4 and ipv6 works:
```
./run_tests.sh -n vms-kvm-t0 -d vlab-01 -c snmp/test_snmp_loopback.py -f vtestbed.yaml -i ../ansible/veos_vtb -e "--skip_sanity --disable_loganalyzer" -u
=== Running tests in groups ===
Running: pytest snmp/test_snmp_loopback.py --inventory ../ansible/veos_vtb --host-pattern vlab-01 --testbed vms-kvm-t0 --testbed_file vtestbed.yaml --log-cli-level warning --log-file-level debug --kube_master unset --showlocals --assert plain --show-capture no -rav --allow_recover --ignore=ptftests --ignore=acstests --ignore=saitests --ignore=scripts --ignore=k8s --ignore=sai_qualify --junit-xml=logs/tr.xml --log-file=logs/test.log --skip_sanity --disable_loganalyzer
..
snmp/test_snmp_loopback.py::test_snmp_loopback[vlab-01] PASSED
```
<!--
If PR needs to be backported, then the PR must be tested against the base branch and the earliest backport release branch and provide tested image version on these two branches. For example, if the PR is requested for master, 202211 and 202012, then the requester needs to provide test results on master and 202012.
-->
#### Which release branch to backport (provide reason below if selected)
<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->
- [ ] 201811
- [ ] 201911
- [ ] 202006
- [x] 202012
- [x] 202106
- [x] 202111
- [x] 202205
- [x] 202211
- [x] 202305
#### Tested branch (Please provide the tested image version)
<!--
- Please provide tested image version
- e.g.
- [x] 20201231.100
-->
- [ ] <!-- image version 1 -->
- [ ] <!-- image version 2 -->
#### Description for the changelog
<!--
Write a short (one line) summary that describes the changes in this
pull request for inclusion in the changelog:
-->
<!--
Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.
-->
#### Link to config_db schema for YANG module changes
<!--
Provide a link to config_db schema for the table for which YANG model
is defined
Link should point to correct section on https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md
-->
#### A picture of a cute animal (not mandatory but encouraged)
#### Why I did it
src/sonic-platform-common
```
* 5af6f9f - (HEAD -> 202305, origin/202305) Comment out tx power validation check and program the passed value (#389) (3 days ago) [abdosi]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-swss
```
* 27b64579 - (HEAD -> 202305, origin/202305) Remove system neighbor DEL operation in m_toSync if SET operation for (#2853) (3 days ago) [Song Yuan]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-swss-common
```
* 449ac55 - (HEAD -> 202305, origin/202305) [Ci] Fix collect log error in azp template (#799) (2 days ago) [xumia]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-linux-kernel
```
* bf1ee0e - (HEAD -> 202305, origin/202305) Fix Makefile syntax and provide default value for CONFIGURED_PLATFORM (#324) (13 hours ago) [Saikrishna Arcot]
* 7d7abaf - Update codeowner and build info (#319) (13 hours ago) [Saikrishna Arcot]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Fix some of the patches in .patches folder not applied issue.
The command "quilt applied" only lists the applied patches, if some of the patches have issues, then the patches will not be applied when you run the build command again.
Work item tracking
Microsoft ADO (number only): 24410730
How I did it
Run the command to apply the patches without any conditions.
If failed, check if the failure reason is "series fully applied".
How to verify it
#### Why I did it
src/sonic-platform-daemons
```
* 6c47906 - (HEAD -> 202305, origin/202305) Update active application selected code in transceiver_info table aft… (#381) (13 hours ago) [Michael Wang - TW]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
When a kernel crash occurs, the system will reboot to the kdump capture kernel if kdump is enabled (`config kdump enable`). In the kdump capture boot, it only stores the crash information, and then reboot the system to a normal boot.
In this boot, no SONiC service is started but it invokes `reboot` which is actually the SONiC reboot that depends on SONiC services. There is a logic to skip all SONiC stuff and invoke platform reboot in SONiC reboot to avoid issues.
However, on Nvidia platforms, the platform reboot still depends on SONiC services, which can cause issues.
So, the Debian reboot is called directly in platform reboot if it is invoked from the kdump capture boot.
#### How I did it
Manual test
Why I did it
Sharing the storage of syncd with other proprietary application extensions allows them to communicate with syncd in differnt ways.
If one container wants to pass some information to syncd then shared storage can be used. However, today the shared storage isn't cleaned on restarts making it possible for syncd to read out-of-date information generated in the past.
NOTE: No plans to use it for standard SONIC dockers and we are working on removing the SDK dependency from PMON docker
How I did it
Implemented new service to clean the shared storage.
How to verify it
Do reboot/fast-reboot/warm-reboot/config-reload/systemctl restart swss and verify /tmp/ is cleaned after each restart in syncd container.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Co-authored-by: Stepan Blyshchak <38952541+stepanblyschak@users.noreply.github.com>
Added support data for fabric monitoring in CONFIG_DB
The CONFIG_DB now has the FABRIC_MONITOR|FABRIC_MONITOR_DATA table for default value for fabric port monitoring. An example output of getting this table is:
sonic-db-cli CONFIG_DB hgetall "FABRIC_MONITOR|FABRIC_MONITOR_DATA"
{'monErrThreshCrcCells': '1', 'monErrThreshRxCells': '61035156', 'monPollThreshIsolation': '1', 'monPollThreshRecovery': '8'}
The CONFIG_DB now also has a table for each fabric port for its isolate status.
An example output of getting this table is:
sonic-db-cli CONFIG_DB hgetall "FABRIC_PORT|Fabric20"
{'alias': 'Fabric20', 'isolateStatus': 'False', 'lanes': '20'}
Co-authored-by: jfeng-arista <98421150+jfeng-arista@users.noreply.github.com>
There is a redundant line in init_cfg.json.j2. It would cause pmon service always has "delayed=False". However, we know that PMON has a timer now. So, I try to fix it here.
Why I did it
fix pcied leak on chassis
fix fan status led setting on fixed systems
misc fixes
Work item tracking
Microsoft ADO (number only):
How I did it
Updated arista platform library submodules
Description for the changelog
Update Arista platform submodules
Why I did it
Use remote PR test template from sonic-mgmt master to run PR test.
How I did it
Modify PR test azure pipeline yml file.
How to verify it
PR test executing normally.
Signed-off-by: Chun'ang Li <chunangli@microsoft.com>
#### Why I did it
src/sonic-platform-common
```
* 411d5b2 - (HEAD -> 202305, origin/202305) More prevention of fatal exception caused by VDM dictionary missing fields when a transceiver has just been pulled (#376) (2 days ago) [snider-nokia]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-swss
```
* fa342305 - (HEAD -> 202305, origin/202305) Remove redundant updateFabricPortState (#2850) (18 hours ago) [kenneth-arista]
* c571d8bf - Allow NOT_IMPLEMENTED sai return status for availability monitoring API (#2848) (18 hours ago) [Tejaswini Chadaga]
```
#### How I did it
#### How to verify it
#### Description for the changelog
src/sonic-py-swsssdk
* 1109e49 - (HEAD -> 202305, origin/master, origin/HEAD, origin/202305, master) add semgrep (#141) (4 weeks ago) [Mai Bui]
How I did it
How to verify it
#### Why I did it
After k8s upgrade a container, k8s can only know the container is running, don't know the service's status inside container. So we need a probe inside container, k8s will call the probe to check whether the container is really ready.
##### Work item tracking
- Microsoft ADO **(number only)**: 22453004
#### How I did it
Add a health check probe inside config engine container, the probe will check whether the start service exit normally or not if the start service exists and call the python script to do container self-related specific checks if the script is there. The python script should be implemented by feature owner if it's needed.
more details: [design doc](https://github.com/sonic-net/SONiC/blob/master/doc/kubernetes/health-check.md)
#### How to verify it
Check path /usr/bin/readiness_probe.sh inside container.
#### Which release branch to backport (provide reason below if selected)
- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [x] 202205
- [x] 202211
#### Tested branch (Please provide the tested image version)
- [x] 20220531.28
Co-authored-by: lixiaoyuner <35456895+lixiaoyuner@users.noreply.github.com>
- What I did
Added support for secure upgrade.
- How I did it
During sonic_installer install, added secure upgrade image verification.
HLD can be found in the following PR: sonic-net/SONiC#1024
- Why I did it
Feature is used to allow image was not modified since built from vendor. During installation, image can be verified with a signature attached to it.
- How I did it
Feature includes image signing during build (in sonic buildimage repo) and verification during image install (in sonic-utilities).
- How to verify it
In order for image verification - image must be signed - need to provide signing key and certificate (paths in SECURE_UPGRADE_DEV_SIGNING_KEY and SECURE_UPGRADE_DEV_SIGNING_CERT in rules/config) during build , and during image install, need to enable secure boot flag in bios, and signing_certificate should be available in bios.
- Feature dependencies
In order for this feature to work smoothly, need to have secure boot feature implemented as well.
The Secure boot feature will be merged in the near future.
Co-authored-by: ycoheNvidia <99744138+ycoheNvidia@users.noreply.github.com>
#### Why I did it
Failed to build sonic-dhcp6relay_1.0.0-0_amd64.deb
#### How I did it
src/dhcprelay has git submodule.
Dependency files by "git ls-files" are not picked files in submodules.
Add --recurse-submodules, work again.
#### How to verify it
make all
Why I did it
Fix the armhf build failure.
How to reproduce the issue:
docker run -it debain:bullseye bash
apt-get update && apt-get install -y python3-pip
pip3 install PyYAML==5.4.1
Error message:
Collecting PyYAML==5.4.1
Downloading PyYAML-5.4.1.tar.gz (175 kB)
|████████████████████████████████| 175 kB 12.3 MB/s
Installing build dependencies ... done
Getting requirements to build wheel ... error
ERROR: Command errored out with exit status 1:
command: /usr/bin/python3 /tmp/tmp6xabslgb_in_process.py get_requires_for_build_wheel /tmp/tmp_er01ztl
....
raise AttributeError(attr)
AttributeError: cython_sources
----------------------------------------
WARNING: Discarding d63f2d7597/PyYAML-5.4.1.tar.gz (sha256)=607774cbba28732bfa802b54baa7484215f530991055bb562efbed5b2f20a45e (from https://pypi.org/simple/pyyaml/) (requires-python:>=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, !=3.5.*). Command errored out with exit status 1: /usr/bin/python3 /tmp/tmp6xabslgb_in_process.py get_requires_for_build_wheel /tmp/tmp_er01ztl Check the logs for full command output.
ERROR: Could not find a version that satisfies the requirement PyYAML==5.4.1
ERROR: No matching distribution found for PyYAML==5.4.1
root@fa2fa92edcfd:/#
But if adding the option --no-build-isolation, then it is good, see fix.
install "PyYAML==5.4.1" --no-build-isolation
The same error can be found in the multiple builds.
Work item tracking
Microsoft ADO (number only): 24567457
How I did it
Add a build option --no-build-isolation.
Disable isolation when building a modern source distribution. Build dependencies specified by PEP 518 must be already installed if this option is used.
How to verify it
Why I did it
To reduce the container's dependency from host system
Work item tracking
Microsoft ADO (number only):
17713469
How I did it
Move the k8s container startup script to config engine container, other than mount it from host.
How to verify it
Check file path(/usr/share/sonic/scripts/container_startup.py) inside config engine container.
Signed-off-by: Yun Li <yunli1@microsoft.com>
Co-authored-by: Qi Luo <qiluo-msft@users.noreply.github.com>
Why I did it
It is to fix the docker-ptf-sai build failure.
https://dev.azure.com/mssonic/build/_build/results?buildId=311315&view=logs&j=cef3d8a9-152e-5193-620b-567dc18af272&t=cf595088-5c84-5cf1-9d7e-03331f31d795
2023-07-09T13:53:19.9025355Z �[91mTraceback (most recent call last):
2023-07-09T13:53:19.9025715Z File "/root/ptf/.eggs/setuptools_scm-7.1.0-py3.7.egg/setuptools_scm/_entrypoints.py", line 74, in <module>
2023-07-09T13:53:19.9025933Z from importlib.metadata import entry_points # type: ignore
2023-07-09T13:53:19.9026167Z ModuleNotFoundError: No module named 'importlib.metadata'
Work item tracking
Microsoft ADO (number only): 24513583
How I did it
How to verify it
#### Why I did it
src/sonic-gnmi
```
* d1467d3 - (HEAD -> 202305, origin/202305) Update makefile to support armhf (#132) (#133) (5 days ago) [ganglv]
* 88ee65d - [202305] Checkout correct branch from sonic-mgmt-common and sonic-swss-common during pipeline build (#128) (5 days ago) [Sachin Holla]
* 87d8eb3 - TranslClient: use PathValidator to sanitize the request paths (#112) (4 weeks ago) [Sachin Holla]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Refine PR test template format.
How I did it
Refine PR test template format.
How to verify it
PR test executed normally.
Signed-off-by: Chun'ang Li <chunangli@microsoft.com>
Why I did it
Downgrade the symcrypt version, use the SymCrypt version v103.0.1 for certification.
Work item tracking
Microsoft ADO (number only): 24222567
How I did it
How to verify it
Why I did it
Update the definition of acl table type BMCDATA and BMCDATAV6 in minigraph parser.
Work item tracking
Microsoft ADO (number only): 24101023
How I did it
Update the definition of acl table type BMCDATA and BMCDATAV6 in minigraph parser.
How to verify it
Ran unittest to verify this update:
- Why I did it
To fix hiredis compilation
- How I did it
Changed package version: 0.14.0-3~bpo9+1 -> 0.14.1-1
- How to verify it
make configure PLATFORM=mellanox
make target/sonic-mellanox.bin
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
Co-authored-by: Nazarii Hnydyn <nazariig@nvidia.com>
Why I did it
To add branch info for submodules of 202305 branch
Work item tracking
Microsoft ADO (21450860):
How I did it
To modify the .gitmodule file to have 202305 branch for repos
How to verify it
Pass PR build
2023-06-16 11:45:38 +00:00
1270 changed files with 171236 additions and 16107 deletions
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.