* [docker_image_ctl.j2]: swss docker initialization improvements
This commit attempts to address the following:
* Make sure swss container is indeed up and running before running any commands
on it. In case where swss container is not fully up when swss.sh attempts to
create swss:/ready file using "docker exec swss$DEV touch", the command can
fail silently and can cause swssconfig to wait forever leading to missing IP
decap configuration among other things. Add a wait so that docker commands
are run only after swss container status is "Running"
* Add a log when swss:/ready file is created or if the file creation fails so
that it becomes easier to debug such scenarios in the future
* [docker_image_ctl.j2]: Use swss$DEV to accommodate multi ASIC platforms as well
Signed-off-by: Prabhat Aravind <paravind@microsoft.com>
* [image_config]: Update DHCP rate-limit for mgmt TOR devices
Change DHCP rate limit(queue4,group3) in SONiC copp configuration to 300 PPS
for mgmt TORs while keeping the rate limit at 100 PPS for other topologies.
Why I did it:
Some mgmt TORs based on Marvell ASIC do not support 100 PPS CIR, so that led
to these devices silently dropping DHCP packets.
Microsoft ADO: **25820076**
How to verify it:
Send DHCP broadcast packets to an M0 DUT and verify that they are trapped to
CPU at 300 PPS. On non-mgmt devices, the packets should be trapped at CIR of
100 PPS. Also ran sonic-mgmt dhcp_relay test and confirmed that it passes.
Signed-off-by: Prabhat Aravind <paravind@microsoft.com>
PROXY variables are not available to sudo users during docker build
This patch fixes below error during builds using proxy:
Step 57/63 : RUN sudo apt-get install python3-m2crypto
---> Running in ebfa797ebcf8
Reading package lists...
Building dependency tree...
Reading state information...
Suggested packages:
python-m2crypto-doc
The following NEW packages will be installed:
python3-m2crypto
0 upgraded, 1 newly installed, 0 to remove and 5 not upgraded.
Need to get 169 kB of archives.
After this operation, 725 kB of additional disk space will be used.
Ign:1 http://deb.debian.org/debian bookworm/main armhf python3-m2crypto armhf 0.38.0-4+b1
Ign:1 http://deb.debian.org/debian bookworm/main armhf python3-m2crypto armhf 0.38.0-4+b1
Ign:1 http://deb.debian.org/debian bookworm/main armhf python3-m2crypto armhf 0.38.0-4+b1
Err:1 http://deb.debian.org/debian bookworm/main armhf python3-m2crypto armhf 0.38.0-4+b1
Could not connect to debian.map.fastlydns.net:80 (146.75.78.132), connection timed out Unable to connect to deb.debian.org:http:
E: Failed to fetch http://deb.debian.org/debian/pool/main/m/m2crypto/python3-m2crypto_0.38.0-4%2bb1_armhf.deb Could not connect to debian.map.fastlydns.net:80 (146.75.78.132), connection timed out Unable to connect to deb.debian.org:http:
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
### Why I did it
According to the design, the database instances of DPU will be kept in the NPU host.
### How I did it
Declare a new field, `has_per_dpu_scope`, in the config_db for database feature.
#### How to verify it
Check Azp
### Why I did it
Revise DPU's database_global.json schema to achieve a more general design
### How I did it
1. Remove databse_type
2. Add a new field databse_name
- Why I did it
watchdog-control service always disarm watchdog during system startup stage. It could be the case that watchdog is not fully initialized while the watchdog-control service is accessing it. This PR adds a wait to make sure watchdog has been fully initialized.
- How I did it
adds a wait to make sure watchdog has been fully initialized.
- How to verify it
Manual test
sonic regression
- Why I did it
Enhance the feature to support disabling password hardening as Linux support.
-1: expiration will never occur
0: expiration will expired immediately
Opened bug:
#17427
- How I did it
Added the -1 value to be supported in hostcfgd and this value will propagate to the relevant Linux files
- How to verify it
Pls see the details in the bug description that link attached above
- Why I did it
Fix kdump-tools to not overwrite MODULES conf to dep. Problem is seen if the build is failed and the build is retriggered immediately as part of retry mechanism
This command is failing during the second run
+ for kernel_release in $(ls $FILESYSTEM_ROOT/lib/modules/)
+ sudo LANG=C chroot ./fsroot-mellanox /etc/kernel/postinst.d/kdump-tools 6.1.0-11-2-amd64
+ clean_sys
https://github.com/sonic-net/sonic-buildimage/blob/master/files/build_templates/sonic_debian_extension.j2#L311
Community Issue: https://www.mail-archive.com/kernel-packages@lists.launchpad.net/msg515013.html
- How I did it
Add a patch to revert the override
- How to verify it
vkarri@482a053c44f4:/sonic$ sudo unsquashfs -d ./fsroot-mellanox target/sonic-mellanox.bin__mellanox__rfs.squashfs
Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
#### Why I did it
src/dhcprelay
```
* 5ae186f - (HEAD -> master, origin/master, origin/HEAD) [counter] Clear counter table when init (#45) (10 hours ago) [Yaqiang Zhu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
This change of removing hwsku.json is to correct the port index for
sfp ports (Ethernet0, Ethernet1, Ethernet2, Ethernet3) by using
port_config.ini, which should be '1, 2, 3, 4'. We could not do it
with hwsku.json, as it is defined as '5, 5, 5, 5' by platform.json
for the breakout_mode 1x40G[10G].
- Why I did it
Optimize syslog rate limit feature for fast and warm boot
- How I did it
Optimize redis start time
Don't render rsyslog.conf in container startup script
Disable containercfgd by default. There is a new CLI to enable it (in another PR)
- How to verify it
Manual test
Regression test
- Why I did it
Add the YANG model according to Smart Switch IP address assignment HDL.
- How I did it
Implement new YANG model containers.
- How to verify it
Run YANG model unit tests. The changes add new unit tests to cover new functionality.
Fix the fsck check which is not working. Potentially fixes#16938
Modified fsck script to run on the ext4.fsck on the appropriate disk where SONiC resides
Microsoft ADO: 26098631
#### Why I did it
src/sonic-platform-common
```
* c82ae54 - (HEAD -> master, origin/master, origin/HEAD) Implementing set_optoe_write_timeout API (#422) (8 hours ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-sairedis
```
* e849160 - (HEAD -> master, origin/master, origin/HEAD) [vslib] add support for ACL table available entry/counter attributes (#1333) (9 hours ago) [Yakiv Huryk]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-swss
```
* 5f367ebb - (HEAD -> master, origin/master, origin/HEAD) [dash] reduce the memory used by DASH ACL rules (#2984) (8 hours ago) [Yakiv Huryk]
```
#### How I did it
#### How to verify it
#### Description for the changelog
- Why I did it
For CMIS host management module, we need a different implementation for sfp.reset. This PR is to implement it
- How I did it
For SW control modules, do reset from hw_reset
For FW control modules, do reset as the original way
- How to verify it
Manual test
sonic-mgmt platform test
### Why I did it
Github issue: https://github.com/sonic-net/sonic-buildimage/issues/16356. The YANG definition breaks GCU feature.
We can either update sonic_yang and GCU's search algorithm to enable the same key count case or simply update YANG model to solve the issue.
The pros for update YANG model are it could solve the issue directly and we don't need to handle the complicate search algorithm in sonic_yang and GCU. This is the only YANG model that has this issue.
### How I did it
Combine two list into one. The previous YANG validation unit tests are still applicable.
#### How to verify it
Unit test and E2E test
Why I did it
Fan tolerance checking is done through new APIs, is_under_speed and is_over_speed, which populate corresponding fields into the database. speed_tolerance is no longer used and was removed, but system-health was not updated and indicates failures:
ADO: 25279165
root@sonic/# show system-health summary
System status summary
System status LED red_blink
Services:
Status: OK
Hardware:
Status: Not OK
Reasons: Failed to get speed tolerance for fantray5.fan1
Failed to get speed tolerance for fantray5.fan0
Failed to get speed tolerance for fantray4.fan1
Failed to get speed tolerance for fantray4.fan0
Failed to get speed tolerance for fantray3.fan1
Failed to get speed tolerance for fantray3.fan0
Failed to get speed tolerance for fantray2.fan1
Failed to get speed tolerance for fantray2.fan0
Failed to get speed tolerance for fantray1.fan1
Failed to get speed tolerance for fantray1.fan0
Failed to get speed tolerance for fantray0.fan1
Failed to get speed tolerance for fantray0.fan0
Failed to get speed tolerance for PSU1.fan0
Failed to get speed tolerance for PSU0.fan0
How I did it
Updated hardware_checker.py in system-health to consume new is_under_speed and is_over_speed database entries instead of speed_tolerance and hard-coded calculations.
How to verify it
root@sonic:/# show system-health summary
System status summary
System status LED green
Services:
Status: OK
Hardware:
Status: OK
Why I did it
Enable sonic-restapi build in two platform to avoid build break on restapi target.
Work item tracking
Microsoft ADO (number only): 26048426
How I did it
How to verify it
#### Why I did it
src/sonic-swss
```
* ff524e6d - (HEAD -> master, origin/master, origin/HEAD) [dash] add a retry for an ACL rule creation if a tag is not created yet (#2972) (7 hours ago) [Yakiv Huryk]
* 620db3da - [ci] Allow partially success build artifact in PR checker pipeline. #2986 (3 days ago) [Liu Shilong]
* d357e6f1 - [copporch] Add safeguard during policer attribute update (#2977) (4 days ago) [Vivek]
* cb460394 - [fpmsyncd][WR] Relax the static schema constraint for ROUTE_TABLE (#2981) (5 days ago) [Vivek]
* a1ce21f6 - Change base directory referenced in coverage.xml (#2976) (6 days ago) [Lawrence Lee]
* 920959cf - [Dash] [UT] Add ZMQ test case for dash (#2967) (6 days ago) [Hua Liu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Fix zebra leaking memory with fib suppress enabled. Porting the fix from
FRRouting/frr#14983
While running test_stress_route.py, systems with lower memory started to throw low memory logs. On further investigation, a memory leak has been found in zebra which was fixed in the FRR community.
#### Why I did it
src/sonic-gnmi
```
* 88e82d4 - (HEAD -> master, origin/master, origin/HEAD) Replace PFC_WD_TABLE with PFC_WD (#173) (8 days ago) [Zain Budhwani]
```
#### How I did it
#### How to verify it
#### Description for the changelog