Why I did it
Networking devices need to be responsive. Such responsiveness is harmed when the CPU change state.
There is a latency penalty when a CPU is idle (e.g C2) and need to exit this state to come back to C1 state.
To prevent this from happening the CPU should be forced to remain in C1 state.
How I did it
Generalize the cstate forcing to C1 to all Arista products.
This is done by adding processor.max_cstate=1 to the kernel cmdline for all CPUs.
Additionally Intel CPUs also need intel_idle.max_cstate=0 to fallback to the acpi_idle driver.
How to verify it
Check that processor.max_cstate=1 is present on the cmdline for AMD CPUs
Check that both processor.max_cstate=1 and intel_idle.max_cstate=0 are present on the cmdline for Intel CPUs
Openssh in Debian Bullseye has been updated to 1:8.4p1-5+deb11u2 to fix CVE-2023-38408.
Since we're building openssh with some patches, we need to update our version as well.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
#### Why I did it
src/sonic-linux-kernel
```
* 9534615 - (HEAD -> 202211, origin/202211) arm64: ac5: Fix watchdog timeleft (#334) (5 days ago) [pavannaregundi]
* 70c4df8 - [marvell-arm64]: Add support for 98DX35xx and 98CX85xx platform (#311) (6 days ago) [pavannaregundi]
* aab079e - [Mellanox] Upstream kernel patches with HW-MGMT 7.0030.1011 (#327) (4 weeks ago) [Kebo Liu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/linkmgrd
```
* abb22d2 - (HEAD -> 202211, origin/202211) [warmboot] config all interfaces back to `auto` if reconciliation times out (#220) (7 days ago) [Jing Zhang]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-swss
```
* 9647b81f - (HEAD -> 202211, origin/202211) [muxorch] Reorder the neighbor disable operations (#2917) (12 hours ago) [Longxiang Lyu]
* 30cea968 - Support type7 encoded CAK key for macsec in config_db (#2892) (5 days ago) [judyjoseph]
* 8d76a4e7 - [202211][ppi]: General code cleanup: remove unused methods. (#2868) (3 weeks ago) [Nazarii Hnydyn]
```
#### How I did it
#### How to verify it
#### Description for the changelog
* [Mellanox] Update HW-MGMT package to new version V.7.0030.1010
Signed-off-by: Kebo Liu <kebol@mellanox.com>
* Update hw-mgmt version to 7.0030.1011
Signed-off-by: Kebo Liu <kebol@nvidia.com>
---------
Signed-off-by: Kebo Liu <kebol@mellanox.com>
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Why I did it
SONiC service determine-reboot-cause might run before driver creating reset cause files. In that case, the reset cause will be "Unknown". This PR introduces a wait mechanism to wait for reset cause sysfs files ready.
How I did it
/run/hw-management/config/reset_attr_ready is the file to indicate all reset cause files are ready. In chassis.get_reboot_cause function, it waits /run/hw-management/config/reset_attr_ready for up to 45 seconds.
How to verify it
Manual test on master/202211/202205
Why I did it
When SUPERVISOR_PROC_EXIT_LISTENER_SCRIPT changed, almost all dockers need to be built again.
But currently it will be loaded by cache.
Work item tracking
Microsoft ADO (number only): 25123348
How I did it
Add $(DOCKER)_FILES into dependencies.
How to verify it
* [swss] Chassis db clean up optimization and bug fixes
This commit includes the following changes:
- Fix for regression failure due to error in finding CHASSIS_APP_DB in
pizzabox (#PR 16451)
- After attempting to delete the system neighbor entries from
chassis db, before starting clearing the system interface entries,
wait for sometime only if some system neighbors were deleted.
If there are no system neighbors entries deleted for the asic coming up,
no need to wait.
- Similar changes for system lag delete. Before deleting the
system lag, wait for some time only if some system lag memebers were
deleted. If there are no system lag members deleted no need to wait.
- Flush the SYSTEM_NEIGH_TABLE from the local STATE_DB. While asic
is coming up, when system neigh entries are deleted from chassis ap
db (as part of chassis db clean up), there is no orchs/process running to
process the delete messages from chassis redis. Because of this, stale system
neigh are entries present in the local STATE_DB. The stale entries result in
creation of orphan (no corresponding data path/asic db entry) kernel neigh
entries during STATE_DB:SYSTEM_NEIGH_TABLE entries processing by nbrmgr (after
the swss serive came up). This is avoided by flushing the SYSTEM_NEIGH_TABLE from
the local STATE_DB when sevice comes up.
Signed-off-by: vedganes <veda.ganesan@nokia.com>
* [swss] Chassis db clean up bug fixes review comment fix - 1
Debug logs added for deletion of other tables (SYSTEM_INTERFACE and SYSTEM_LAG_TABLE)
Signed-off-by: vedganes <veda.ganesan@nokia.com>
---------
Signed-off-by: vedganes <veda.ganesan@nokia.com>
(cherry picked from commit b13b41fc22)
* Revert "[SNMP][IPv6]: Fix to use link local IPv6 address as snmp agentAddress (#16013)"
This reverts commit 803c71c86a.
* Revert "[SNMP][IPv6]: Fix SNMP IPv6 reachability issue in certain scenarios (#15487)"
This reverts commit 9864dfeaa1.
SAI bug Fixes
- When creating an ACL rule with SAI_ACL_ENTRY_ATTR_FIELD_SRC_IP/SAI_ACL_ENTRY_ATTR_FIELD_DST_IP enabled, and then disabling the field by setting enable=false, a match on L3_type=IPv4 will remain programmed for the rule Issue resolved after the fix
- Allow the max scale of virtual routers to be configure for SPC-1, SPC-2, SPC-3 which is 255 when fastboot enable and 511 when fastboot disable
- Remove default hash key of SRC_MAC, DST_MAC and ETH_TYPE
SDK/FW bug fixes
- When preforming fast boot from an old SDK version (currently installed) to a newer one (target version), and the system was initially loaded with a new SDK version (past version), and the system has not been wiped, under specific conditions, the fast boot would use the past version's data and may fail.
Bmc is a valid neighbor type in minigraph, however it was missing from the YANG model definition. Usually, the Bmc type device can be neighbor of BmcMgmtToRRouter. This PR is to introduce this type.
Why I did it
For some devices whose log folder size is larger than 200M, for example, 256M, the LOG_FILE_ROTATE_SIZE_KB should be 16M. and
THRESHOLD_KB=$((USABLE_SPACE_KB - (NUM_LOGS_TO_ROTATE * LOG_FILE_ROTATE_SIZE_KB * 2)))
= $(( (VAR_LOG_SIZE_KB * 90 / 100) - RESERVED_SPACE_KB)) - (NUM_LOGS_TO_ROTATE * LOG_FILE_ROTATE_SIZE_KB * 2)))
= $(( (256M * 90 / 100) - 4096)) - (8 * 16M * 2)))
the result would be a negative value
Work item tracking
Microsoft ADO (number only):
24524827
How I did it
Add a case for 400M, if the log folder size is between 200M and 400M, set the log file size to 2M
How to verify it
Do cmd "sudo logrotate -f /etc/logrotate.conf" on DUT which val/log folder size is 256M, and check the syslog.
Why I did it
This is a fix for PR [kernel] Change grub cmdline to set c-states to 0 for "Intel" CPUs by shlomibitton · Pull Request #6051 · sonic-net/sonic-buildimage (github.com)
The original PR will disable intel idle driver but it cannot limit the max c-state to 1 due to system will fall back to acpi idle driver.
Currently intel_idle.max_cstate=0 is already present, which will disable intel idle driver. With the added option, common idle driver will be disabled as well, so there will not be idle management. This is to prevent a bug that can be triggered by idle instruction on intel platform.
How I did it
Add the option to installer file beside intel_idle.max_cstate=0
How I did it
Update Yang definition of IN_PORTS and OUT_PORTS to string.
Since we cannot split the string with comma (,) and validate each substring is a valid SONiC port name. The only restriction for them is must be a string.
How to verify it
Verified by building sonic_yang_models-1.0-py3-none-any.whl. While building the target package, unit tests were run and passed.
Build a SONiC image based on 202205 branch and installed on physical DUT. Re try the steps in [Yang] Incorrect definition of IN_PORTS and OUT_PORTS in sonic-acl.yang #16190 and can see below success response:
Why I did it
Dell S6100 Platform components needs to be updated.
How I did it
Modified platform.json to fix the issue.
How to verify it
Run sonic-mgmt component test and check whether it passes.
Why I did it
Few commands in multiasic platforms when run with the "sudo ip netns exec asic0 " option was taking like 15 mins to get the o/p. This behavior of sudo getting hung was seen by just doing this
jujoseph@svcstr-server-2:~ sudo ip netns exec asic0 bash
jujoseph@svcstr-server-2:~ sudo ls
deally sudo is not needed as we have /bin/ip netns identify present in /etc/sudoers file. Hence removing it
Why I did it
For security and consistency consideration, change the docker image from alpine to Debian in Makefile
Work item tracking
Microsoft ADO (number only): 23077660
How I did it
change the docker image from alpine to Debian in Makefile
Why I did it
Downgrade the symcrypt version, use the SymCrypt version v103.0.1 for certification.
Work item tracking
Microsoft ADO (number only): 24222567
How I did it
How to verify it
- Why I did it
The recent change #15685 (comment) removed the db migration for non first reboots.
This is problematic for many deployments which doesn't rely on ZTP and push a custom config_db.json
Port to older branches after #15685 is ported back
- How I did it
Re-introduce the logic to run the db_migrator on non-first boots
- How to verify it
Verified reboot and warm-reboot cases
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
How I did it
Fix the regex for L4 port range in openconfig_acl.py.
How to verify it
Build image and install on Arista-720DT DUT, then try the repro steps in #16189 and confirmed the ACL rule be setup correctly:
#### Why I did it
src/sonic-sairedis
```
* 2ebbd48 - (HEAD -> 202211, origin/202211) [syncd] Add pre match logic for acl entry (#1240) (11 hours ago) [Kamil Cudnik]
* 1db8726 - Use SAI_STATUS_ITEM_NOT_FOUND when key not found (#1224) (11 hours ago) [Lawrence Lee]
* 9e4071b - [CI]: Fix collect log error in azp template. (#1282) (4 days ago) [Nazarii Hnydyn]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-linux-kernel
```
* 10d7946 - (HEAD -> 202211, origin/202211) PATCH] net: allow user to set metric on default route learned via Router Advertisement (#326) (8 hours ago) [abdosi]
```
#### How I did it
#### How to verify it
#### Description for the changelog
DEPENDS:
[202211][ppi]: Implement port bulk comparison logic (#2564) sonic-swss#2821
HLD: sonic-net/SONiC#1084
Why I did it
Enabled port late create on SN5600 switch boots up with no ports
Work item tracking
N/A
How I did it
Updated SAI xml config file
How to verify it
Run sonic-mgmt tests fastboot
- Why I did it
Update Mellanox MFT tool to version 4.25.0-62
- How I did it
Update the MFT tool make file
- How to verify it
Run full sonic-mgmt regression.
Signed-off-by: Kebo Liu <kebol@nvidia.com>
#### Why I did it
src/sonic-platform-common
```
* 05cf5c1 - (HEAD -> 202211, origin/202211) Change Y cable simulator log level from error to warning due to false alarm (11 hours ago) [ShiyanWangMS]
* 35ea290 - Update CMIS api's rendering max-duration (#375) (11 hours ago) [rajann]
* 33bd498 - Retrieve FW version using CDB command for CMIS transceivers + handle single bank FW versioning (#372) (11 hours ago) [mihirpat1]
* 2434362 - Render Media lane and Media assignment options info from Application Code (#368) (11 hours ago) [rajann]
* 862674b - Modify sfputil show fwversion to include build version for active/inactive FW version fields (#367) (11 hours ago) [mihirpat1]
* 8edfece - Adding electrical for 800G and 100G (#365) (11 hours ago) [mihirpat1]
* 8a1debf - SFF-8472: Fix tx_disable_channel to avoid write to read-only bit (#364) (11 hours ago) [mihirpat1]
* 223a231 - Update host electrical interface for 2x400G breakout cable (#363) (11 hours ago) [mihirpat1]
* baabd8f - fix get module hardware minor revision (#361) (11 hours ago) [Qingxiao Ren]
* 2ebabf5 - Prevent VDM dictionary related KeyError when a transceiver module is pulled while a bulk get method is interrogating said module (#360) (11 hours ago) [snider-nokia]
* 1498ed6 - [CMIS] Add API to get module power up duration (#354) (11 hours ago) [ChiouRung Haung]
* 1cae718 - Modify get_host_lane_assignment_option to return value based on application id (#352) (11 hours ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
* Fix CONFIG_DB_INITIALIZED flag check logic and set/reset flag for warm-reboot
* Fix db-cli usage
* Handle same image warm-reboot and generalize handling of INIT flag
* Cover boot from ONIE case: set config init flag when minigraph, config_db are missing
* Handle case: first boot of SONiC
* Check for config init flag
* Simplify logic, and do not call db_migrator for same image reboot