* [Mellanox] Update SDK/FW/SAI to 4.6.1020/2012.1020/SAIBuild2305.25.0.3 (#16096)
SONiC changes:
1. Support Spectrum4 ASIC FW binary building.
2. Support new SDK sx-obj-desc lib building since new SAI need it.
3. Remove SX_SCEW debian package from Mellanox SDK build since we are no longer using it (we use libxml2 instead).
4. Update SAI, SDK, FW to version 4.6.1020/2012.1020/SAIBuild2305.25.0.3
SDK/FW bug fixes
1. On SPC-1 platforms, fast boot is not operational for split ports in Force mode at 50G speed.
2. SFP modules remain in the disabled state for at least 3 minutes after LPM (low power mode) is set on/off.
3. When performing a fast boot from an older SDK version (currently installed) to a newer one (target version) on a system that was initially loaded with a newer SDK version (past version) and has not been wiped, the fast boot could, under specific conditions, use the past version's data and may fail.
SDK/FW Features
1. On SN2700, all ports can support Credo Y-cables.
SAI bug Fixes
1. When an ACL rule was created with SAI_ACL_ENTRY_ATTR_FIELD_SRC_IP/SAI_ACL_ENTRY_ATTR_FIELD_DST_IP enabled and the field was then disabled by setting enable=false, a match on L3_type=IPv4 remained programmed for the rule. This issue is resolved by the fix.
2. Allow the maximum scale of virtual routers to be configured for SPC-1, SPC-2 and SPC-3 when fast boot is enabled.
3. Remove default hash key of SRC_MAC, DST_MAC and ETH_TYPE
SAI features
1. Port init profile
- How I did it
Update SDK/FW/SAI make files
- How to verify it
Run full sonic-mgmt regression on Mellanox platform
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Conflicts:
platform/mellanox/mlnx-sai.mk
* Fix issue: unprintable character is rendered when handling comments in j2
Use "{#-" and "-#}" to mark comments in Jinja templates
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Co-authored-by: Stephen Sun <stephens@nvidia.com>
#### Why I did it
src/sonic-linux-kernel
```
* 9cb7ea0 - (HEAD -> 202305, origin/202305) arm64: dts: marvell: Add Nokia 7215-IXS-A1 board (#321) (24 hours ago) [Pavan-Nokia]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Advance dhcpmon to a3c5381 in 202305 branch.
a3c5381 - (HEAD, origin/master, origin/HEAD, master) Merge pull request #11 from jcaiMR/dev/jcai_fix_err_log (11 days ago) [StormLiangMS]
c5ef7e7 - Change common_libs dependencies from buster to bullseye (#9)
824a144 - replace atoi with strtol (#6) (10 weeks ago) [Mai Bui]
32c0c3f - Fix libswsscommon package installation for non-amd64 (#7) (10 weeks ago) [Saikrishna Arcot]
Work item tracking
Microsoft ADO (25048723):
How I did it
How to verify it
Ran test_dhcp_relay.py; no failures.
- Why I did it
Fixed a build failure when the flag ENABLE_SFLOW_DROPMON=y is set
- How I did it
Fixed the sFlow dropmon patch to align with hsflowd version 2.0.45
Signed-off-by: rajkumar38 <rpennadamram@marvell.com>
Why I did it
Update the platform_reboot script of the Nokia IXR-7250E-36x400G platform to display the correct reboot-cause history when the reboot is issued from the supervisor card.
Work item tracking
Microsoft ADO (number only):
How I did it
Modify the platform_reboot script to copy the correct reboot-cause.txt file from NDK to the /host/reboot-cause directory during the shutdown cycle when the reboot is issued from the supervisor (both for a reboot right after installing a new image and for a normal reboot).
Signed-off-by: mlok <marty.lok@nokia.com>
- Why I did it
watchdogutil uses the platform API watchdog instance to control and query the watchdog status. The Nvidia watchdog implementation caches the "armed" status in an object member, "WatchdogImplBase.armed". This does not work with the CLI infrastructure: each CLI command creates a new watchdog instance, so the status cached in the previous instance is lost. Consider the following commands:
admin@sonic:~$ sudo watchdogutil arm -s 100 =====> watchdog instance1, armed=True
Watchdog armed for 100 seconds
admin@sonic:~$ sudo watchdogutil status ======> watchdog instance2, armed=False
Status: Unarmed
admin@sonic:~$ sudo watchdogutil disarm =======> watchdog instance3, armed=False
Failed to disarm Watchdog
- How I did it
Use sysfs to query watchdog status
- How to verify it
Manual test
Unit test
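A minimal sketch of the stateless approach, assuming a sysfs-style state file that reads "active" or "inactive" (the Linux watchdog class exposes such a file when sysfs support is enabled). The class name and file path are hypothetical, not the actual Nvidia implementation. Because each query re-reads the file rather than a cached attribute, a fresh instance created by every CLI invocation reports the same status.

```python
import os
import tempfile


class SysfsWatchdogStatus:
    """Hypothetical watchdog wrapper that keeps no armed state in memory."""

    def __init__(self, state_path):
        # Path to a sysfs-style state file (assumed, not the real Nvidia path).
        self.state_path = state_path

    def is_armed(self):
        # Re-read the driver-maintained state on every query instead of
        # trusting a cached attribute, so any new instance sees the same value.
        with open(self.state_path) as f:
            return f.read().strip() == "active"


# Simulate the sysfs file with a temporary file.
with tempfile.TemporaryDirectory() as d:
    state = os.path.join(d, "state")
    with open(state, "w") as f:
        f.write("active\n")  # pretend an earlier "arm" command ran

    # Two independent instances, as two separate CLI invocations would create.
    first = SysfsWatchdogStatus(state)
    second = SysfsWatchdogStatus(state)
    print(first.is_armed(), second.is_armed())  # True True
```

The design choice is to treat the driver as the single source of truth, so no CLI process ever needs to know what an earlier process did.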
It appears that this was initially added to provide the git-retry
command (which doesn't appear to be used today). However, this repo is
now also providing bazel (which is actually used in our build today),
and this command (along with git-retry) expects some vpython3 binary to
be set up/installed.
Rather than going through that, just get rid of this repo.
- Why I did it
Update Mellanox MFT tool to version 4.25.0-62
- How I did it
Update the MFT tool make file
- How to verify it
Run full sonic-mgmt regression.
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Bmc is a valid neighbor type in minigraph, but it was missing from the YANG model definition. Usually, a Bmc type device is a neighbor of a BmcMgmtToRRouter. This PR introduces this type.
Why I did it
The Dell S6100 platform components need to be updated.
How I did it
Modified platform.json to fix the issue.
How to verify it
Run the sonic-mgmt component test and check that it passes.
Why I did it
According to ACL-Table-Type-HLD, the value type of MATCHES, ACTIONS and BIND_POINTS should be list instead of string. Opening this PR to update the definition of BMCDATA and BMCDATAV6.
How I did it
Update the definition of BMCDATA and BMCDATAV6 in minigraph-parser.
How to verify it
Verified by unit tests and by building a SONiC image.
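A minimal sketch of the string-to-list change, assuming the old definition stored MATCHES, ACTIONS and BIND_POINTS as comma-separated strings. The helper names and the field values in the example entry are hypothetical and only illustrate the shape; they are not the actual BMCDATA definition in minigraph-parser.

```python
def to_list(value):
    """Turn a comma-separated string into a list; pass lists through unchanged."""
    if isinstance(value, list):
        return value
    return [item.strip() for item in value.split(",") if item.strip()]


def normalize_acl_table_type(entry):
    # Only the fields the ACL-Table-Type HLD defines as lists are converted.
    list_fields = ("MATCHES", "ACTIONS", "BIND_POINTS")
    return {k: to_list(v) if k in list_fields else v for k, v in entry.items()}


# Hypothetical BMCDATA-like entry; the values are illustrative only.
old_style = {
    "MATCHES": "SRC_IP,DST_IP",
    "ACTIONS": "PACKET_ACTION,COUNTER",
    "BIND_POINTS": "PORT",
}
print(normalize_acl_table_type(old_style))
```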
#### Why I did it
src/sonic-swss
```
* c869c1df - (HEAD -> 202305, origin/202305) update portStatIds for cisco (#2876) (11 minutes ago) [Zhixin Zhu]
* dd152288 - [Dynamic Buffer][Mellanox] Skip PGs in pending deleting set while checking accumulative headroom of a port (#2871) (12 minutes ago) [Stephen Sun]
* 97068ff1 - Fix error in peer response time when headroom is calculated for 800G (#2860) (16 minutes ago) [Stephen Sun]
```
#### How I did it
#### How to verify it
#### Description for the changelog
- Why I did it
To optimize the Mellanox platform SAI build
- How I did it
SAI debs are now downloaded as part of the Spectrum-SDK-Drivers-SONiC-Bins release.
- How to verify it
Configure/build for Mellanox platform, check the image and ensure that correct SAI debs are included.
#### Why I did it
To fix the logic introduced by [[memory_checker] Do not check memory usage of containers which are not created #11129](https://github.com/sonic-net/sonic-buildimage/pull/11129).
There could be a scenario before the reboot where:
1. The `docker service` has stopped.
2. Within a very short period of time afterwards, the monit service performs the `root@sonic:/home/admin# monit status container_memory_telemetry` check.
In such scenario, the `memory_checker` script will throw an error to the syslog:
```
ERR memory_checker: Failed to retrieve the running container list from docker daemon! Error message is: 'Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))'
```
However, this is actually correct behavior: when the docker service is stopped, its Unix socket is destroyed, which is why the `FileNotFoundError(2, 'No such file or directory')` exception appears in the syslog.
#### How I did it
Changed the log severity to warning and changed the return value.
#### How to verify it
It is really hard to catch the exact moment described in the `Why I did it` section.
In order to check the logic:
1. Change the Unix socket path to a non-existent one in the [/usr/bin/memory_checker](47742dfc2c/files/image_config/monit/memory_checker (L139)) file on the switch.
2. Execute the `root@sonic:/home/admin# monit restart container_memory_telemetry`
3. Check the syslog for such messages:
```
WARNING memory_checker: Failed to retrieve the running container list from docker daemon! Error message is: 'Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))'
INFO memory_checker: [memory_checker] Exits without checking memory usage since container 'telemetry' is not running!
```
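A minimal sketch of the intended behavior, assuming a hypothetical `list_containers` callable in place of the real docker client query. When the query fails because the docker socket is gone, the sketch logs a warning (not an error) and reports that the memory check should be skipped, matching the log messages shown above. This is an illustration, not the actual memory_checker code.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s memory_checker: %(message)s")
log = logging.getLogger("memory_checker")


def running_containers(list_containers):
    """Return the set of running container names, or None if docker is unreachable.

    `list_containers` is a hypothetical stand-in for the docker client query;
    it raises when the docker daemon's Unix socket no longer exists.
    """
    try:
        return set(list_containers())
    except Exception as err:
        # A stopped docker service (e.g. right before reboot) is expected,
        # so report it as a warning rather than an error.
        log.warning("Failed to retrieve the running container list from docker daemon! "
                    "Error message is: '%s'", err)
        return None


def should_check_memory(container, list_containers):
    """Decide whether the memory check for `container` should run at all."""
    running = running_containers(list_containers)
    if running is None or container not in running:
        log.info("Exits without checking memory usage since container '%s' is not running!",
                 container)
        return False
    return True


def docker_down():
    # Simulates the socket disappearing after the docker service stops.
    raise FileNotFoundError(2, "No such file or directory")


print(should_check_memory("telemetry", docker_down))                    # False
print(should_check_memory("telemetry", lambda: ["telemetry", "swss"]))  # True
```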
Why I did it
A few commands on multi-ASIC platforms, when run with the "sudo ip netns exec asic0" prefix, were taking about 15 minutes to produce output. This sudo hang could be reproduced simply by running:
jujoseph@svcstr-server-2:~ sudo ip netns exec asic0 bash
jujoseph@svcstr-server-2:~ sudo ls
Ideally, sudo is not needed here, as /bin/ip netns identify is present in the /etc/sudoers file. Hence, it is removed.
- Why I did it
Revise label names and fix typos in sensor.conf of 4600C
- How I did it
Revise label names and fix typos in sensor.conf of 4600C
- How to verify it
Manual test
sonic-mgmt test_sensors.py
Why I did it
Support FIPS DB configuration
Design Doc: sonic-net/SONiC#1372
Work item tracking
Microsoft ADO (number only): 24411148
How I did it
Add the FIPS Yang model to make FIPS configurable in ConfigDB.
How to verify it
See TestPlan: sonic-net/sonic-mgmt#9092
Build the image and run the tests: sonic-net/sonic-mgmt#9091
- Why I did it
Add new breakout modes to be used with PAM4-capable cables
- How I did it
- How to verify it
Verified the 50G per lane breakout modes are applied properly on the switch
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
- Why I did it
Enabled port late create on the SN5600 (Spectrum-4) switch, so that it boots up with no ports.
Work item tracking
N/A
- How I did it
Updated SAI xml config file
- How to verify it
Run the sonic-mgmt fastboot tests
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
- Why I did it
Change Mellanox platform API implementation to use ASIC driver sysfs for the module operational state and status error fields.
- How I did it
Modify the platform/mellanox/mlnx-platform-api/sonic_platform/sfp.py file by replacing the call to the sx_mgmt_phy_module_info_get() SDK API with reads from sysfs.
- How to verify it
1. Simulate the cable unplug event.
2. Check the CLI output:
   sfputil show presence
   sfputil show error-status -hw
3. Simulate the cable plug event.
4. Repeat step 2.
Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
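A minimal sketch of reading module state from sysfs instead of calling an SDK API. The attribute file names `oper_state` and `status_error` are hypothetical placeholders, not the actual names exposed by the Mellanox ASIC driver, and each file is assumed to hold a single integer.

```python
import os
import tempfile


def read_int(path):
    # sysfs attributes are small text files; strip the trailing newline.
    with open(path) as f:
        return int(f.read().strip())


def module_status(module_dir):
    """Return (oper_state, status_error) for one module.

    The attribute names below are hypothetical placeholders, not the actual
    names exposed by the Mellanox ASIC driver.
    """
    return (read_int(os.path.join(module_dir, "oper_state")),
            read_int(os.path.join(module_dir, "status_error")))


# Simulate a module directory with two attribute files.
with tempfile.TemporaryDirectory() as d:
    for name, value in (("oper_state", "1\n"), ("status_error", "0\n")):
        with open(os.path.join(d, name), "w") as f:
            f.write(value)
    print(module_status(d))  # (1, 0)
```

Reading plain files keeps the platform API independent of an SDK client session, which is one plausible motivation for the change.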
Why I did it
The docker folder on the 202305 image is larger than 1.5G, which exceeds the maximum docker ramfs size.
Work item tracking
Microsoft ADO (number only):
24969589
How I did it
Update the docker ramfs size from 1500M to 2500M
How to verify it
Boot 202305 image.
How I did it
Update Yang definition of IN_PORTS and OUT_PORTS to string.
Since we cannot split the string on commas (,) and validate that each substring is a valid SONiC port name, the only restriction on these fields is that they must be strings.
How to verify it
Verified by building sonic_yang_models-1.0-py3-none-any.whl. While building the target package, unit tests were run and passed.
Built a SONiC image based on the 202205 branch and installed it on a physical DUT. Retried the steps in [Yang] Incorrect definition of IN_PORTS and OUT_PORTS in sonic-acl.yang #16190 and got a success response.
Cherry-pick of #15685
MSFT ADO: 24274591
Why I did it
Two changes:
1. Fix a day-1 issue where the check that waits until CONFIG_DB_INITIALIZED is set is incorrect.
The same incorrect logic is used in multiple places.
The current logic (until [[ $($SONIC_DB_CLI CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]];) always passes, irrespective of the result of the GET operation, because [[ ... ]] with a single operand only tests whether the output is a non-empty string.
root@str2-7060cx-32s-29:~# sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"
1
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]]; do echo "entered here"; done
root@str2-7060cx-32s-29:~#
root@str2-7060cx-32s-29:~#
root@str2-7060cx-32s-29:~# sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"
0
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]]; do echo "entered here"; done
root@str2-7060cx-32s-29:~#
Fix this logic by checking that the value of the flag is "1".
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") -eq 1 ]]; do echo "entered here"; done
entered here
entered here
entered here
This gap in logic was highlighted when another fix was merged: #14933
The issue being fixed here caused warmboot-finalizer to not wait until config-db is initialized.
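A minimal Python model of the two shell checks, to make the difference concrete. `get_flag` is a hypothetical stand-in for `sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"`, and the timeout is an assumption added for the sketch; the shell loops above wait indefinitely.

```python
import time


def buggy_is_initialized(output):
    # Approximates bash `[[ $(...) ]]`: true for ANY non-empty output, even "0".
    return output != ""


def is_initialized(output):
    # Approximates the fixed check `[[ $(...) -eq 1 ]]` for the values the
    # flag actually takes ("0", "1" or empty).
    return output.strip() == "1"


def wait_for_config_db(get_flag, timeout=5.0, interval=0.1):
    """Poll `get_flag` until it returns "1"; give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_initialized(get_flag()):
            return True
        time.sleep(interval)
    return False


for value in ["1", "0", ""]:
    print(repr(value), buggy_is_initialized(value), is_initialized(value))
# '1' True True
# '0' True False
# '' False False

flags = iter(["0", "0", "1"])
print(wait_for_config_db(lambda: next(flags), timeout=1.0, interval=0.01))  # True
```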
2. Set and unset CONFIG_DB_INITIALIZED for the warm-reboot case.
Currently, during warm shutdown, the value of CONFIG_DB_INITIALIZED is stored in the redis DB backup. It is restored when the dump is loaded during warm recovery.
So the value of CONFIG_DB_INITIALIZED does not depend on CONFIG_DB's state; it remains whatever it was before the reboot.
Fix this by setting CONFIG_DB_INITIALIZED to 0 when the DB is loaded, and setting it to 1 after db_migrator is done.
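A minimal sketch of the warm-recovery ordering described above, using a plain dict as a hypothetical stand-in for CONFIG_DB and a callable in place of db_migrator. It shows that anything observing the flag while the migrator runs sees "0", even if the backup carried "1".

```python
def warm_restore(db, backup, run_db_migrator):
    """Hypothetical warm-recovery sequence; `db` is a dict standing in for CONFIG_DB."""
    db.clear()
    db.update(backup)                  # the backup may carry CONFIG_DB_INITIALIZED = "1"
    db["CONFIG_DB_INITIALIZED"] = "0"  # don't trust the value saved before the reboot
    run_db_migrator(db)                # waiters on the flag keep waiting here
    db["CONFIG_DB_INITIALIZED"] = "1"  # CONFIG_DB is now ready


seen_during_migration = []
config_db = {}
warm_restore(config_db, {"CONFIG_DB_INITIALIZED": "1"},
             lambda d: seen_during_migration.append(d["CONFIG_DB_INITIALIZED"]))
print(seen_during_migration, config_db["CONFIG_DB_INITIALIZED"])  # ['0'] 1
```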
Work item tracking
Microsoft ADO (number only):
How I did it
How to verify it
#### Why I did it
src/linkmgrd
```
* 40113fd - (HEAD -> 202305, origin/202305) [active-standby] Fix extra toggle observed in `config reload` (#216) (2 days ago) [Longxiang Lyu]
* b6d40fc - Add ADO to the PR template (#215) (2 days ago) [Longxiang Lyu]
* fe41ad2 - [active-standby] Write `unhealthy` is default route `N/A` (#214) (2 days ago) [Longxiang Lyu]
* 8ff265c - [link prober] Increase pause/restart probe log verbosity (#213) (2 days ago) [Longxiang Lyu]
```
#### How I did it
#### How to verify it
#### Description for the changelog