* [Mellanox] Update HW-MGMT package to new version V.7.0030.1010
Signed-off-by: Kebo Liu <kebol@mellanox.com>
* Update hw-mgmt version to 7.0030.1011
Signed-off-by: Kebo Liu <kebol@nvidia.com>
---------
Signed-off-by: Kebo Liu <kebol@mellanox.com>
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Why I did it
SONiC service determine-reboot-cause might run before driver creating reset cause files. In that case, the reset cause will be "Unknown". This PR introduces a wait mechanism to wait for reset cause sysfs files ready.
How I did it
/run/hw-management/config/reset_attr_ready is the file to indicate all reset cause files are ready. In chassis.get_reboot_cause function, it waits /run/hw-management/config/reset_attr_ready for up to 45 seconds.
How to verify it
Manual test on master/202211/202205
Why I did it
When SUPERVISOR_PROC_EXIT_LISTENER_SCRIPT changed, almost all dockers need to be built again.
But currently it will be loaded by cache.
Work item tracking
Microsoft ADO (number only): 25123348
How I did it
Add $(DOCKER)_FILES into dependencies.
How to verify it
* [swss] Chassis db clean up optimization and bug fixes
This commit includes the following changes:
- Fix for regression failure due to error in finding CHASSIS_APP_DB in
pizzabox (#PR 16451)
- After attempting to delete the system neighbor entries from
chassis db, before starting clearing the system interface entries,
wait for sometime only if some system neighbors were deleted.
If there are no system neighbors entries deleted for the asic coming up,
no need to wait.
- Similar changes for system lag delete. Before deleting the
system lag, wait for some time only if some system lag memebers were
deleted. If there are no system lag members deleted no need to wait.
- Flush the SYSTEM_NEIGH_TABLE from the local STATE_DB. While asic
is coming up, when system neigh entries are deleted from chassis ap
db (as part of chassis db clean up), there is no orchs/process running to
process the delete messages from chassis redis. Because of this, stale system
neigh are entries present in the local STATE_DB. The stale entries result in
creation of orphan (no corresponding data path/asic db entry) kernel neigh
entries during STATE_DB:SYSTEM_NEIGH_TABLE entries processing by nbrmgr (after
the swss serive came up). This is avoided by flushing the SYSTEM_NEIGH_TABLE from
the local STATE_DB when sevice comes up.
Signed-off-by: vedganes <veda.ganesan@nokia.com>
* [swss] Chassis db clean up bug fixes review comment fix - 1
Debug logs added for deletion of other tables (SYSTEM_INTERFACE and SYSTEM_LAG_TABLE)
Signed-off-by: vedganes <veda.ganesan@nokia.com>
---------
Signed-off-by: vedganes <veda.ganesan@nokia.com>
(cherry picked from commit b13b41fc22)
* Revert "[SNMP][IPv6]: Fix to use link local IPv6 address as snmp agentAddress (#16013)"
This reverts commit 803c71c86a.
* Revert "[SNMP][IPv6]: Fix SNMP IPv6 reachability issue in certain scenarios (#15487)"
This reverts commit 9864dfeaa1.
SAI bug Fixes
- When creating an ACL rule with SAI_ACL_ENTRY_ATTR_FIELD_SRC_IP/SAI_ACL_ENTRY_ATTR_FIELD_DST_IP enabled, and then disabling the field by setting enable=false, a match on L3_type=IPv4 will remain programmed for the rule Issue resolved after the fix
- Allow the max scale of virtual routers to be configure for SPC-1, SPC-2, SPC-3 which is 255 when fastboot enable and 511 when fastboot disable
- Remove default hash key of SRC_MAC, DST_MAC and ETH_TYPE
SDK/FW bug fixes
- When preforming fast boot from an old SDK version (currently installed) to a newer one (target version), and the system was initially loaded with a new SDK version (past version), and the system has not been wiped, under specific conditions, the fast boot would use the past version's data and may fail.
Bmc is a valid neighbor type in minigraph, however it was missing from the YANG model definition. Usually, the Bmc type device can be neighbor of BmcMgmtToRRouter. This PR is to introduce this type.
Why I did it
For some devices whose log folder size is larger than 200M, for example, 256M, the LOG_FILE_ROTATE_SIZE_KB should be 16M. and
THRESHOLD_KB=$((USABLE_SPACE_KB - (NUM_LOGS_TO_ROTATE * LOG_FILE_ROTATE_SIZE_KB * 2)))
= $(( (VAR_LOG_SIZE_KB * 90 / 100) - RESERVED_SPACE_KB)) - (NUM_LOGS_TO_ROTATE * LOG_FILE_ROTATE_SIZE_KB * 2)))
= $(( (256M * 90 / 100) - 4096)) - (8 * 16M * 2)))
the result would be a negative value
Work item tracking
Microsoft ADO (number only):
24524827
How I did it
Add a case for 400M, if the log folder size is between 200M and 400M, set the log file size to 2M
How to verify it
Do cmd "sudo logrotate -f /etc/logrotate.conf" on DUT which val/log folder size is 256M, and check the syslog.
Why I did it
This is a fix for PR [kernel] Change grub cmdline to set c-states to 0 for "Intel" CPUs by shlomibitton · Pull Request #6051 · sonic-net/sonic-buildimage (github.com)
The original PR will disable intel idle driver but it cannot limit the max c-state to 1 due to system will fall back to acpi idle driver.
Currently intel_idle.max_cstate=0 is already present, which will disable intel idle driver. With the added option, common idle driver will be disabled as well, so there will not be idle management. This is to prevent a bug that can be triggered by idle instruction on intel platform.
How I did it
Add the option to installer file beside intel_idle.max_cstate=0
How I did it
Update Yang definition of IN_PORTS and OUT_PORTS to string.
Since we cannot split the string with comma (,) and validate each substring is a valid SONiC port name. The only restriction for them is must be a string.
How to verify it
Verified by building sonic_yang_models-1.0-py3-none-any.whl. While building the target package, unit tests were run and passed.
Build a SONiC image based on 202205 branch and installed on physical DUT. Re try the steps in [Yang] Incorrect definition of IN_PORTS and OUT_PORTS in sonic-acl.yang #16190 and can see below success response:
Why I did it
Dell S6100 Platform components needs to be updated.
How I did it
Modified platform.json to fix the issue.
How to verify it
Run sonic-mgmt component test and check whether it passes.
Why I did it
Few commands in multiasic platforms when run with the "sudo ip netns exec asic0 " option was taking like 15 mins to get the o/p. This behavior of sudo getting hung was seen by just doing this
jujoseph@svcstr-server-2:~ sudo ip netns exec asic0 bash
jujoseph@svcstr-server-2:~ sudo ls
deally sudo is not needed as we have /bin/ip netns identify present in /etc/sudoers file. Hence removing it
Why I did it
For security and consistency consideration, change the docker image from alpine to Debian in Makefile
Work item tracking
Microsoft ADO (number only): 23077660
How I did it
change the docker image from alpine to Debian in Makefile
Why I did it
Downgrade the symcrypt version, use the SymCrypt version v103.0.1 for certification.
Work item tracking
Microsoft ADO (number only): 24222567
How I did it
How to verify it
- Why I did it
The recent change #15685 (comment) removed the db migration for non first reboots.
This is problematic for many deployments which doesn't rely on ZTP and push a custom config_db.json
Port to older branches after #15685 is ported back
- How I did it
Re-introduce the logic to run the db_migrator on non-first boots
- How to verify it
Verified reboot and warm-reboot cases
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
How I did it
Fix the regex for L4 port range in openconfig_acl.py.
How to verify it
Build image and install on Arista-720DT DUT, then try the repro steps in #16189 and confirmed the ACL rule be setup correctly:
#### Why I did it
src/sonic-sairedis
```
* 2ebbd48 - (HEAD -> 202211, origin/202211) [syncd] Add pre match logic for acl entry (#1240) (11 hours ago) [Kamil Cudnik]
* 1db8726 - Use SAI_STATUS_ITEM_NOT_FOUND when key not found (#1224) (11 hours ago) [Lawrence Lee]
* 9e4071b - [CI]: Fix collect log error in azp template. (#1282) (4 days ago) [Nazarii Hnydyn]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-linux-kernel
```
* 10d7946 - (HEAD -> 202211, origin/202211) PATCH] net: allow user to set metric on default route learned via Router Advertisement (#326) (8 hours ago) [abdosi]
```
#### How I did it
#### How to verify it
#### Description for the changelog
DEPENDS:
[202211][ppi]: Implement port bulk comparison logic (#2564) sonic-swss#2821
HLD: sonic-net/SONiC#1084
Why I did it
Enabled port late create on SN5600 switch boots up with no ports
Work item tracking
N/A
How I did it
Updated SAI xml config file
How to verify it
Run sonic-mgmt tests fastboot
- Why I did it
Update Mellanox MFT tool to version 4.25.0-62
- How I did it
Update the MFT tool make file
- How to verify it
Run full sonic-mgmt regression.
Signed-off-by: Kebo Liu <kebol@nvidia.com>
#### Why I did it
src/sonic-platform-common
```
* 05cf5c1 - (HEAD -> 202211, origin/202211) Change Y cable simulator log level from error to warning due to false alarm (11 hours ago) [ShiyanWangMS]
* 35ea290 - Update CMIS api's rendering max-duration (#375) (11 hours ago) [rajann]
* 33bd498 - Retrieve FW version using CDB command for CMIS transceivers + handle single bank FW versioning (#372) (11 hours ago) [mihirpat1]
* 2434362 - Render Media lane and Media assignment options info from Application Code (#368) (11 hours ago) [rajann]
* 862674b - Modify sfputil show fwversion to include build version for active/inactive FW version fields (#367) (11 hours ago) [mihirpat1]
* 8edfece - Adding electrical for 800G and 100G (#365) (11 hours ago) [mihirpat1]
* 8a1debf - SFF-8472: Fix tx_disable_channel to avoid write to read-only bit (#364) (11 hours ago) [mihirpat1]
* 223a231 - Update host electrical interface for 2x400G breakout cable (#363) (11 hours ago) [mihirpat1]
* baabd8f - fix get module hardware minor revision (#361) (11 hours ago) [Qingxiao Ren]
* 2ebabf5 - Prevent VDM dictionary related KeyError when a transceiver module is pulled while a bulk get method is interrogating said module (#360) (11 hours ago) [snider-nokia]
* 1498ed6 - [CMIS] Add API to get module power up duration (#354) (11 hours ago) [ChiouRung Haung]
* 1cae718 - Modify get_host_lane_assignment_option to return value based on application id (#352) (11 hours ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
* Fix CONFIG_DB_INITIALIZED flag check logic and set/reset flag for warm-reboot
* Fix db-cli usage
* Handle same image warm-reboot and generalize handling of INIT flag
* Cover boot from ONIE case: set config init flag when minigraph, config_db are missing
* Handle case: first boot of SONiC
* Check for config init flag
* Simplify logic, and do not call db_migrator for same image reboot
Backport of #15961
Why I did it
Added the fwtrace config files in order to be able to call mlxstrace utility during show techsupport dump.
Work item tracking
Microsoft ADO (number only):
How I did it
Added fwtrace config files. Added path to these files to sai.profile for each mlnx device.
How to verify it
Execute the show techsupport command and check if mlxstrace output is in system dump.
This is to backport #16096
Why I did it
SONiC changes:
Support Spectrum4 ASIC FW binary building.
Support new SDK sx-obj-desc lib building since new SAI need it.
Remove SX_SCEW debian package from Mellanox SDK build since we are no longer using it (we use libxml2 instead).
Update SAI, SDK, FW to version 4.6.1020/2012.1020/SAIBuild2211.25.1.0
SDK/FW bug fixes
In SPC-1 platforms: Fastboot mode is not operational for Split port with Force mode in 50G speed
SFP modules are kept in disabled state after set LPM (low power mode) on/off for at least 3 minutes.
When preforming fast boot from an old SDK version (currently installed) to a newer one (target version), and the system was initially loaded with a new SDK version (past version), and the system has not been wiped, under specific conditions, the fast boot would use the past version's data and may fail.
SDK/FW Features
On SN2700 all ports can support y cable by credo
SAI bug Fixes
When creating an ACL rule with SAI_ACL_ENTRY_ATTR_FIELD_SRC_IP/SAI_ACL_ENTRY_ATTR_FIELD_DST_IP enabled, and then disabling the field by setting enable=false, a match on L3_type=IPv4 will remain programmed for the rule Issue resolved after the fix
Allow the max scale of virtual routers to be configure for SPC-1, SPC-2, SPC-3 when fastboot enable
Remove default hash key of SRC_MAC, DST_MAC and ETH_TYPE
SAI features
Port init profile
Dual ToR Active-Standby | Additional MAC support
Work item tracking
Microsoft ADO (number only):
How I did it
Update SDK/FW/SAI make files
How to verify it
Run full sonic-mgmt regression on Mellanox platform
#### Why I did it
src/sonic-utilities
```
* d69432d1 - (HEAD -> 202211, origin/202211) [202211][db_migrator] Add migration of FLEX_COUNTER_DELAY_STATUS during 1911->2211 upgrade + fast-reboot. Add UT. (#2838) (34 hours ago) [Vadym Hlushko]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-swss
```
* 859bd678 - (HEAD -> 202211, origin/202211) Fix error in peer response time when headroom is calculated for 800G (#2860) (2 days ago) [Stephen Sun]
* 5f294cf1 - [Dynamic Buffer][Mellanox] Skip PGs in pending deleting set while checking accumulative headroom of a port (#2871) (2 days ago) [Stephen Sun]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-platform-common
```
* 2da4286 - (HEAD -> 202211, origin/202211) Add new SSD type support (#390) (14 hours ago) [Junchao-Mellanox]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-utilities
```
* 46b32daa - (HEAD -> 202211, origin/202211) [kdump] Fix API to read the current running image (#2217) (14 hours ago) [rajendra-dendukuri]
```
#### How I did it
#### How to verify it
#### Description for the changelog