Fix zebra leaking memory with fib suppress enabled. Porting the fix from
FRRouting/frr#14983
While running test_stress_route.py, systems with less memory started to log low-memory warnings. Further investigation found a memory leak in zebra, which has already been fixed in the FRR community.
#### Why I did it
src/sonic-linux-kernel
```
* 342f6c3 - (HEAD -> 202311, origin/202311) [kconfig] Set default SATA Link Power Management policy (#363) (4 hours ago) [Volodymyr Samotiy]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-platform-common
```
* 83a8c7a - (HEAD -> 202311, origin/202311) Fix issue: QSFP module with id 0x0d can be parsed using 8636 (#412) (4 hours ago) [Stephen Sun]
```
#### How I did it
#### How to verify it
#### Description for the changelog
- Why I did it
To fix BIOS firmware update after fresh image installation from ONiE
- How I did it
Initialized empty GRUB environment file after ONiE installation
- How to verify it
Install image from ONiE
Run BIOS firmware upgrade
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
- Why I did it
Improve boot performance, which matters most for fast boot and warm boot
- How I did it
Use cached variable.
- How to verify it
Boot the system. Simply do "systemd-analyze blame" and look at service start time.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
- Why I did it
If a PSU is not present, error logs can appear while restarting psud or thermalctld:
Jan 8 17:15:52.689616 sonic ERR pmon#psud: Thermal sysfs /run/hw-management/thermal/psu2_temp1_max does not exist
Jan 8 17:15:57.747723 sonic ERR pmon#thermalctld: Thermal sysfs /run/hw-management/thermal/psu2_temp1 does not exist
- How I did it
If a PSU is not present, do not check the PSU temperature sysfs.
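The guard described above could look roughly like the following. This is a minimal sketch, not the platform code from the PR: the function name `read_psu_temperature` and its parameters are hypothetical and only illustrate skipping the sysfs read when the PSU is absent.

```python
def read_psu_temperature(psu_present, sysfs_path, default=None):
    """Return the value stored in sysfs_path, or default when the PSU is absent.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if not psu_present:
        # PSU is not present: skip the sysfs node entirely so that no
        # "Thermal sysfs ... does not exist" error is logged.
        return default
    with open(sysfs_path) as f:
        return int(f.read().strip())
```

With this shape, callers in psud or thermalctld would get `default` for an empty PSU slot instead of triggering a failed file read.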
### Why I did it
Fix the issue detected by the [TestStaticMgmtPortIP::test_dynamic_dns_not_working_when_static_ip_configured](https://github.com/sonic-net/sonic-mgmt/blob/master/tests/dns/static_dns/test_static_dns.py#L105C9-L105C63) test.
### How I did it
Query MGMT interface configuration. Do not apply dynamic DNS configuration when MGMT interface has static IP address.
#### How to verify it
Run `tests/dns/static_dns/test_static_dns.py` sonic-mgmt tests.
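The decision described in "How I did it" can be sketched as below. This is an illustration under assumptions, not the code from the PR: it assumes the MGMT interface table is available as a dict whose keys look like `eth0|10.0.0.2/24` when a static address is configured, and both function names are hypothetical.

```python
def has_static_mgmt_ip(mgmt_interface_table):
    """Return True if any MGMT interface entry carries a static IP.

    Assumption: keys of the form "<ifname>|<ip/prefix>" denote static addresses.
    """
    return any("|" in key for key in mgmt_interface_table)


def should_apply_dynamic_dns(mgmt_interface_table):
    """Dynamic (DHCP-provided) DNS is applied only without a static MGMT IP."""
    return not has_static_mgmt_ip(mgmt_interface_table)
```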
#### Why I did it
* Improved switch init time
### How I did it
* Replaced: `sonic-cfggen` -> `sonic-db-cli`
* Aggregated template list for `sonic-cfggen`
#### How to verify it
1. Run `warm-reboot`
#### Why I did it
src/sonic-utilities
```
* 9c1d489c - (HEAD -> 202311, origin/202311) Fix database initialization for db_migrator (#3100) (10 hours ago) [ganglv]
* e9ae14d2 - Support golden config in db migrator (#3076) (16 hours ago) [ganglv]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Fix the fsck check, which is not working. Potentially fixes #16938
Modified the fsck script to run the ext4 fsck on the disk where SONiC resides
Microsoft ADO: 26098631
Why I did it
Align the keywords to make qos configuration take effect
Work item tracking
Microsoft ADO (number only):
How I did it
Change the keyword to ComputeAI
How to verify it
reload minigraph and check the qos configuration
* Reserve tcp port for telemetry and gnmi
* Use ip_local_port_range instead
* Fix sysctl config
Co-authored-by: ganglv <88995770+ganglyu@users.noreply.github.com>
#### Why I did it
src/sonic-utilities
```
* e70b0546 - (HEAD -> 202311, origin/202311) [202311] Revert bgp suppress fib pending (#3109) (9 hours ago) [Stepan Blyshchak]
```
#### How I did it
#### How to verify it
#### Description for the changelog
src/sonic-snmpagent
* 03e8bcd - (HEAD -> 202311, origin/202311) Fix SNMP dropping some of the queue counter when create_only_config_db_buffers is set to true (#303) (10 hours ago) [DavidZagury]
* fix hw_reset low polarity (reverse values)
* seek to the beginning of the sysfs fd before reading, to resolve the power_good
sysfs returning empty when a cable is plugged out
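The seek fix can be illustrated with a small sketch. It is not the platform code itself: `read_sysfs_value` is a hypothetical helper, and a regular file stands in for the sysfs node. The point is that a file object kept open across reads returns an empty string on the second `read()` unless the offset is rewound first.

```python
def read_sysfs_value(fd_obj):
    """Read the current value from an already-open sysfs file object.

    Hypothetical helper: rewinds to offset 0 first, because a previous
    read() leaves the offset at end-of-file and the next read() would
    otherwise return an empty string.
    """
    fd_obj.seek(0)
    return fd_obj.read().strip()
```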
src/sonic-utilities
* 7a242eeb - (HEAD -> 202311, origin/202311) [202311] Support reading/writing module EEPROM data by page and offset (#3008) (#3073) (2 days ago) [Junchao-Mellanox]
* cb0fd428 - [202311] Collect module EEPROM data in dump (#3009) (#3124) (3 days ago) [Junchao-Mellanox]
- Why I did it
To add new SKU Mellanox-SN4700-O8V48 with following requirements:
- How I did it
Create new SKU files based on the below definition:
* Port Mapping: 1-12 2x200G, 13-20 1x400G, 21-32 2x200G
T0 topology: 48x200G Downlinks 8x400G uplinks.
Length of downlink: 5m
Length of uplink: 40m
* Auto-negotiation enable/disable: Yes
* FEC mode: RS
* Shared headroom: Enabled
* Shared headroom pool factor: 2
* Warmboot enabled: yes
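As a quick arithmetic check of the port mapping above, the split counts can be summed per speed. The tuple layout and the function name are illustrative choices, not part of the SKU files.

```python
# (first_port, last_port, split, speed_gbps), taken from the port mapping above
PORT_MAPPING = [
    (1, 12, 2, 200),   # ports 1-12: 2x200G
    (13, 20, 1, 400),  # ports 13-20: 1x400G
    (21, 32, 2, 200),  # ports 21-32: 2x200G
]


def count_logical_ports(mapping):
    """Return {speed_gbps: number_of_logical_ports} for the mapping."""
    counts = {}
    for first, last, split, speed in mapping:
        counts[speed] = counts.get(speed, 0) + (last - first + 1) * split
    return counts


print(count_logical_ports(PORT_MAPPING))  # {200: 48, 400: 8}
```

The result, 48 ports at 200G and 8 ports at 400G, matches the T0 topology of 48x200G downlinks and 8x400G uplinks.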
- How to verify it
Build SONiC with the new SKU, confirm that init finishes and all ports come up, then run the QoS test suite from sonic-mgmt
Co-authored-by: DavidZagury <32644413+DavidZagury@users.noreply.github.com>
- Why I did it
1. Thermal updater should wait more time for module to be initialized
2. sfp should get temperature threshold from EEPROM because SDK sysfs is not yet supported
3. Rename sfp function to fix typo
4. sfp.get_presence should return False if module is under initialization
- How I did it
1. Thermal updater should wait more time for module to be initialized
2. sfp should get temperature threshold from EEPROM because SDK sysfs is not yet supported
3. Rename sfp function to fix typo
4. sfp.get_presence should return False if module is under initialization
- How to verify it
Manual test
Unit test
- Why I did it
The watchdog-control service always disarms the watchdog during the system startup stage. The watchdog may not be fully initialized when the watchdog-control service accesses it. This PR adds a wait to make sure the watchdog has been fully initialized.
- How I did it
Add a wait to make sure the watchdog has been fully initialized.
- How to verify it
Manual test
sonic regression
Currently, whenever isc-dhcp-relay forwards a packet upstream,
internally, it will try to send it on a "fallback" interface. My
understanding is that this isn't meant to be a real interface, but
instead is basically saying to use Linux's regular routing stack to
route the packet appropriately (rather than having isc-dhcp-relay
specify specifically which interface to use).
The problem is that on systems with a weak CPU, a large number of
interfaces, and many upstream servers specified, this can introduce a
noticeable delay in packets getting sent. The delay comes from trying to
get the ifindex of the fallback interface. In one test case, it got to
the point that only 2 packets could be processed per second. Because of
this, dhcrelay will easily get backlogged and likely get to a point
where packets get dropped in the kernel.
Fix this by adding a check saying if we're using the fallback interface,
then don't try to get the ifindex of this interface. We're never going
to have an interface named this in SONiC.
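The idea of the check can be sketched in Python, although the actual change is in the C sources of isc-dhcp-relay. Everything named here is an assumption for illustration: the constant `FALLBACK_IFNAME`, the function `resolve_ifindex`, and the choice to return `None` when no index is looked up.

```python
import socket

# Assumed name of the pseudo-interface; illustrative only.
FALLBACK_IFNAME = "fallback"


def resolve_ifindex(ifname, lookup=socket.if_nametoindex):
    """Return the ifindex for ifname, skipping the lookup for the fallback.

    The fallback interface is not a real interface, so the (potentially slow)
    name-to-index lookup is skipped and the kernel routing stack is left to
    choose the egress interface.
    """
    if ifname == FALLBACK_IFNAME:
        return None
    return lookup(ifname)
```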
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
Closes #17345
This W/A was proposed by Nvidia FRR team before the long term solution is ready.
Why I did it
A W/A to fix default route installation during LAG member flap
Work item tracking
N/A
How I did it
Disabled FRR next hop group support
How to verify it
Do LAG member flap
- Why I did it
For CMIS host management module, we need a different implementation for sfp.reset. This PR is to implement it
- How I did it
For SW control modules, do the reset through hw_reset
For FW control modules, do the reset the original way
- How to verify it
Manual test
sonic-mgmt platform test
- Why I did it
New implementation of Nvidia platform_wait due to:
1. sysfs deprecated by hw-mgmt
2. new dependencies to SDK
3. For CMIS host management mode
- How I did it
Wait for hw-management to be ready
Wait for the SDK sysfs nodes to be ready
- How to verify it
manual test
unit test
sonic-mgmt regression
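A wait of this kind is commonly written as a polling loop with a timeout. The sketch below assumes readiness can be judged by the existence of a set of file paths; the function name `wait_for_paths`, the timeout, and the polling interval are hypothetical and do not come from the PR.

```python
import os
import time


def wait_for_paths(paths, timeout=60.0, interval=1.0):
    """Poll until every path exists; return False if the timeout expires.

    Hypothetical helper: the real platform_wait may use different readiness
    criteria, timeouts, and logging.
    """
    deadline = time.monotonic() + timeout
    while True:
        missing = [p for p in paths if not os.path.exists(p)]
        if not missing:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Using `time.monotonic()` rather than wall-clock time keeps the timeout stable if the system clock is adjusted during boot.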
#### Why I did it
src/sonic-linux-kernel
```
* 46db038 - (HEAD -> 202311, origin/202311) Intgerate HW-MGMT 7.0030.2008 Changes (#361) (#372) (9 hours ago) [Kebo Liu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/dhcpmon
```
* 2443073 - (HEAD -> 202311, origin/202311) [counter] Clear counter table when dhcpmon init (#14) (#16) (2 days ago) [Yaqiang Zhu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-utilities
```
* 72b6c04c - (HEAD -> 202311, origin/202311) Support disable/enable syslog rate limit feature (#3072) (2 days ago) [Junchao-Mellanox]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Enable Yang model for BGP_BBR config entry.
{
    "BGP_BBR": {
        "all": {
            "status": "enabled"/"disabled"
        }
    }
}
Work item tracking
Microsoft ADO (number only): 25988660
How I did it
Add yang model and ut for BGP_BBR.
How to verify it
Use GCU cmd to change bbr status.
Create following json patch: disable_bbr.json-patch
[
    {
        "op": "replace",
        "path": "/BGP_BBR/all/status",
        "value": "disabled"
    }
]
Run `sudo config apply-patch ./disable_bbr.json-patch` on the DUT. The command succeeds.
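To make the effect of the patch concrete, the sketch below applies a JSON Patch "replace" operation to an in-memory config dict. It is a simplified illustration, not the GCU implementation: `apply_patch` is a hypothetical function, it supports only "replace", and it ignores JSON Pointer escape sequences.

```python
import copy


def apply_patch(config, patch):
    """Apply a list of JSON Patch 'replace' operations and return a new dict.

    Simplified sketch: only 'replace' is handled and path segments are split
    on '/' without unescaping.
    """
    result = copy.deepcopy(config)
    for op in patch:
        if op["op"] != "replace":
            raise ValueError("unsupported op: " + op["op"])
        keys = [k for k in op["path"].split("/") if k]
        node = result
        for key in keys[:-1]:
            node = node[key]
        if keys[-1] not in node:
            # 'replace' requires the target to exist already
            raise KeyError(op["path"])
        node[keys[-1]] = op["value"]
    return result


config = {"BGP_BBR": {"all": {"status": "enabled"}}}
patch = [{"op": "replace", "path": "/BGP_BBR/all/status", "value": "disabled"}]
print(apply_patch(config, patch))  # {'BGP_BBR': {'all': {'status': 'disabled'}}}
```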
- Why I did it
Optimize syslog rate limit feature for fast and warm boot
- How I did it
Optimize redis start time
Don't render rsyslog.conf in container startup script
Disable containercfgd by default. There is a new CLI to enable it (in another PR)
- How to verify it
Manual test
Regression test
These changes, in conjunction with NDK version >= 22.9.17, address the thermal logging issues discussed at Nokia-ION/ndk#27. The changes in this PR do not require NDK version >= 22.9.17, but the thermal logging enhancements will not be available without it. Pairing with NDK >= 22.9.17 is therefore preferred and recommended.
Why I did it
To address thermal logging deficiencies.
Work item tracking
Microsoft ADO (number only): 26365734
How I did it
The following changes are included:
Threshold configuration values are provided in the associated device data .json files. There is also a change included to better handle the condition where an SFP module read fails.
Modify the module.py reboot to support reboot linecard from Supervisor
- Modify reboot to call _reboot_imm for single IMM card reboot
- Add log to the ndk_cmd to log the operation of "reboot-linecard" and "shutdown/startup the sfm"
Add new nokia_cmd set command and modify show ndk-status output
- Add a new function reboot_imm() to nokia_common.py to support reboot a single IMM slot from CPM
- Added new command: nokia_cmd set reboot-linecard <slot> [force] for CPM
- Append a new column "RebootStatus" at the end of output of "nokia_cmd show ndk-status"
- Provide ability for IMM to disable all transceiver module TX at reboot time
- Remove defunct xcvr-resync service
Why I did it
When the Supervisor card is rebooted through the PMON API, it takes about 90 seconds to trigger the shutdown in the down path. By that time the linecards are already up. This delays linecard database initialization, which tries to PING/PONG the database-chassis. To address this issue, we modified the NDK to use the system call with "sudo reboot" when the request comes from the PMON API on the Supervisor. This applies to NDK version 22.9.20 and greater. The new NDK requires this modification of platform_reboot to work.
Work item tracking
Microsoft ADO (number only): 26365734
How I did it
Modify platform_reboot on the Supervisor not to reboot all IMMs, since that is already done in the function reboot() in module.py. Also handle reboot-cause.txt on the Supervisor when the reboot is requested from the PMON API.
Modify the Nokia platform-specific platform_reboot on the linecard to disable all SFPs.
This PR works with NDK version 22.9.20 and above
Signed-off-by: mlok <marty.lok@nokia.com>
- Why I did it
Fix the xcvrd crash caused by the error "cannot import name 'initialize_sfp_thermal'":
Nov 27 09:47:16.388639 sonic ERR pmon#xcvrd: Exception occured at CmisManagerTask thread due to ImportError("cannot import name 'initialize_sfp_thermal' from partially initialized module 'sonic_platform.thermal' (most likely due to a circular import) (/usr/local/lib/python3.9/dist-packages/sonic_platform/thermal.py)")
- How I did it
Add lock for creating SFP object
- How to verify it
Unit test
Manual Test
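Guarding object creation with a lock could be sketched as follows. This is an illustration under assumptions, not the platform code: `SfpRegistry` and `factory` are hypothetical names, and the real fix may place the lock elsewhere. The lock ensures that only one thread at a time runs the creation path (and any imports it triggers).

```python
import threading


class SfpRegistry:
    """Create each SFP object at most once, even with concurrent callers."""

    def __init__(self, factory):
        self._factory = factory          # callable that builds one SFP object
        self._lock = threading.Lock()
        self._sfps = {}

    def get(self, index):
        # Serialize creation so two threads never initialize the same
        # object (or import the same module) at the same time.
        with self._lock:
            if index not in self._sfps:
                self._sfps[index] = self._factory(index)
            return self._sfps[index]
```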
- Why I did it
When a module is fully under software control, the driver cannot get the module temperature or temperature threshold from firmware. In this case, SONiC needs to get them from EEPROM. This PR creates a thermal updater thread that updates the module temperature and temperature threshold while software control is enabled.
- How I did it
Query ASIC temperature from SDK sysfs and update hw-management-tc periodically
Query Module temperature from EEPROM and update hw-management-tc periodically
- How to verify it
Manual test
New Unit tests
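A periodic updater thread of the kind described above is often built around `threading.Event`, so that it can be stopped promptly. The sketch below is generic and hypothetical: `PeriodicUpdater` and the `update` callback are illustrative names, and the actual thermal updater queries SDK sysfs and EEPROM and writes to hw-management-tc, which is not modeled here.

```python
import threading


class PeriodicUpdater:
    """Call update() repeatedly on a background thread until stopped."""

    def __init__(self, interval, update):
        self._interval = interval        # seconds between updates
        self._update = update            # callable doing one update pass
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        while not self._stop.is_set():
            self._update()
            # wait() returns early when stop() is called, unlike time.sleep()
            self._stop.wait(self._interval)
```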
- Why I did it
Enable CMIS host management for Mellanox devices which are expected to support the feature
- How I did it
Add a new thread in a new file and change the platform code in chassis.py, which calls this thread from get_change_event().
The thread in the new file handles a per-port state machine.
Static detection runs first, once the thread is up (during the switch bootup sequence), until the final decision on whether the module is FW controlled or SW controlled.
After that, dynamic detection takes over and listens for changes on the per-port sysfs fds,
so that it can detect cable plug-in and plug-out events.
- How to verify it
Enhanced unit tests
run sonic mgmt on Nvidia SN4700 with CMIS host management enabled
Co-authored-by: dbarashinvd <105214075+dbarashinvd@users.noreply.github.com>