Commit Graph

8354 Commits

Author SHA1 Message Date
Junchao-Mellanox
7d388cd0e6
[Mellanox] wait until hw-management watchdog files ready (#17618)
- Why I did it
The watchdog-control service always disarms the watchdog during the system startup stage. The watchdog may not be fully initialized yet when the watchdog-control service accesses it. This PR adds a wait to make sure the watchdog has been fully initialized.

- How I did it
Added a wait until the hw-management watchdog files are ready, so that the watchdog is fully initialized before it is accessed.
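A minimal sketch of such a wait, assuming "ready" simply means that the watchdog files exist. The helper name `wait_for_files`, its timeout and its polling interval are hypothetical and are not taken from the actual patch.

```python
import time
from pathlib import Path


def wait_for_files(paths, timeout=10.0, interval=0.5):
    """Poll until every path in `paths` exists.

    Returns True once all files are present, or False if `timeout`
    seconds pass first. (Hypothetical helper, not the PR's code.)
    """
    deadline = time.monotonic() + timeout
    pending = [Path(p) for p in paths]
    while True:
        # Keep only the files that are still missing.
        pending = [p for p in pending if not p.exists()]
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The watchdog-control service would call this with the relevant hw-management watchdog paths (not listed in the commit) before it disarms the watchdog.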

- How to verify it
Manual test
sonic regression
2023-12-26 18:27:18 +02:00
davidpil2002
80f2f6bce1
password-hardening: Add support to disable expiration date like in Linux (PAM) (#17426)
- Why I did it
Enhance the feature to support disabling password expiration, as Linux (PAM) does:
-1: expiration never occurs
0: the password expires immediately
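The value semantics above can be illustrated with a small hypothetical helper. This is not hostcfgd code. It only shows how -1, 0 and a positive day count map to an expiry date. For comparison, Linux `chage -M -1` removes password validity checking.

```python
from datetime import date, timedelta
from typing import Optional


def password_expiry(last_change: date, expiration_days: int) -> Optional[date]:
    """Return the expiry date, or None if the password never expires.

    -1    -> expiration never occurs
     0    -> expires immediately (on the day of the last change)
     N>0  -> expires N days after the last change
    """
    if expiration_days == -1:
        return None
    if expiration_days < -1:
        raise ValueError("expiration days must be -1 or a non-negative integer")
    return last_change + timedelta(days=expiration_days)


def is_expired(last_change: date, expiration_days: int, today: date) -> bool:
    expiry = password_expiry(last_change, expiration_days)
    return expiry is not None and today >= expiry
```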

Opened bug:
#17427

- How I did it
Added the -1 value to be supported in hostcfgd and this value will propagate to the relevant Linux files

- How to verify it
Please see the details in the bug description linked above.
2023-12-25 11:14:17 +02:00
Vivek
18dd948e60
Fix kdump-tools to not overwrite MODULES conf to dep (#17490)
- Why I did it
Fix kdump-tools so that it does not overwrite the MODULES setting to dep. The problem is seen when a build fails and is retriggered immediately by the retry mechanism.
The following command fails during the second run:

+ for kernel_release in $(ls $FILESYSTEM_ROOT/lib/modules/)
+ sudo LANG=C chroot ./fsroot-mellanox /etc/kernel/postinst.d/kdump-tools 6.1.0-11-2-amd64
+ clean_sys
https://github.com/sonic-net/sonic-buildimage/blob/master/files/build_templates/sonic_debian_extension.j2#L311

Community Issue: https://www.mail-archive.com/kernel-packages@lists.launchpad.net/msg515013.html

- How I did it
Add a patch to revert the override

- How to verify it
vkarri@482a053c44f4:/sonic$ sudo unsquashfs -d ./fsroot-mellanox target/sonic-mellanox.bin__mellanox__rfs.squashfs

Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
2023-12-25 11:13:19 +02:00
bktsim
c0bc1d9753
[Arista] Remove aggregate port config files for multi-asic devices (#16923)
An aggregate port_config.ini file for Arista multi-asic devices was originally introduced by mistake. This PR removes these unnecessary files.
2023-12-22 17:10:41 -08:00
Xichen96
08666100fc
[dhcp_server] add config dhcp server del (#17603)
* add config dhcp server del
2023-12-22 09:07:24 -08:00
mssonicbld
d4a78665ee
[submodule] Update submodule sonic-sairedis to the latest HEAD automatically (#17596) 2023-12-22 06:00:06 +08:00
mssonicbld
d892626253
[submodule] Update submodule sonic-linux-kernel to the latest HEAD automatically (#17591) 2023-12-21 15:47:39 +08:00
mssonicbld
d7a422e57a
[submodule] Update submodule linkmgrd to the latest HEAD automatically (#17588) 2023-12-21 15:26:40 +08:00
Xichen96
13a16cf87f
[dhcp_server] add config dhcp_server add (#17489)
* dhcp_server add
* add test dup gw nm
2023-12-20 09:09:46 -08:00
Xichen96
1e92ba24ec
[dhcp_server] add show dhcp server info (#17468)
* add show dhcp server info
2023-12-20 09:07:32 -08:00
mssonicbld
86fb9eaf06
[submodule] Update submodule dhcprelay to the latest HEAD automatically (#17572)
#### Why I did it
src/dhcprelay
```
* 5ae186f - (HEAD -> master, origin/master, origin/HEAD) [counter] Clear counter table when init (#45) (10 hours ago) [Yaqiang Zhu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-12-20 16:32:37 +08:00
byu343
2559d7e541
[Arista] Use port_config.ini for Arista-7050QX-32S-S4Q31 (#17253)
This change of removing hwsku.json is to correct the port index for
sfp ports (Ethernet0, Ethernet1, Ethernet2, Ethernet3) by using
port_config.ini, which should be '1, 2, 3, 4'. We could not do it
with hwsku.json, as it is defined as '5, 5, 5, 5' by platform.json
for the breakout_mode 1x40G[10G].
2023-12-20 15:29:43 +08:00
Junchao-Mellanox
f3f2972512
Optimize syslog rate limit feature for fast and warm boot (#17458)
- Why I did it
Optimize syslog rate limit feature for fast and warm boot

- How I did it
Optimize redis start time
Don't render rsyslog.conf in container startup script
Disable containercfgd by default. There is a new CLI to enable it (in another PR)

- How to verify it
Manual test
Regression test
2023-12-20 09:12:03 +02:00
Oleksandr Ivantsiv
885f1629dd
[yang][smartswitch] Add YANG model for MID_PLANE_BRIDGE and DPU tables. (#17311)
- Why I did it
Add the YANG model according to Smart Switch IP address assignment HDL.

- How I did it
Implement new YANG model containers.

- How to verify it
Run YANG model unit tests. The changes add new unit tests to cover new functionality.
2023-12-20 09:05:11 +02:00
Prince George
30ff77350f
Fix the fsck script that does filesystem repair (#17424)
Fix the fsck check, which was not working. Potentially fixes #16938.
Modified the fsck script to run the ext4 filesystem check (fsck.ext4) on the appropriate disk, where SONiC resides.
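The commit does not say how the script finds the SONiC disk. Below is a hedged sketch that assumes the disk is identified by the `root=` kernel parameter (as found in `/proc/cmdline`). Both helper names are hypothetical. The sketch only builds the command and does not run it.

```python
import shlex


def root_device_from_cmdline(cmdline: str):
    """Return the value of the root= kernel parameter, or None if absent."""
    for token in shlex.split(cmdline):
        if token.startswith("root="):
            return token[len("root="):]
    return None


def build_fsck_command(device: str):
    # -p asks fsck.ext4 (e2fsck) to repair ("preen") automatically without prompting.
    return ["fsck.ext4", "-p", device]
```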

Microsoft ADO: 26098631
2023-12-19 17:51:49 -08:00
mssonicbld
050420f444
[submodule] Update submodule sonic-platform-common to the latest HEAD automatically (#17559)
#### Why I did it
src/sonic-platform-common
```
* c82ae54 - (HEAD -> master, origin/master, origin/HEAD) Implementing set_optoe_write_timeout API (#422) (8 hours ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-12-19 16:33:58 +08:00
mssonicbld
f804a6ec5a
[submodule] Update submodule sonic-sairedis to the latest HEAD automatically (#17560)
#### Why I did it
src/sonic-sairedis
```
* e849160 - (HEAD -> master, origin/master, origin/HEAD) [vslib] add support for ACL table available entry/counter attributes (#1333) (9 hours ago) [Yakiv Huryk]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-12-19 16:33:54 +08:00
mssonicbld
ec92a6d421
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#17561)
#### Why I did it
src/sonic-swss
```
* 5f367ebb - (HEAD -> master, origin/master, origin/HEAD) [dash] reduce the memory used by DASH ACL rules (#2984) (8 hours ago) [Yakiv Huryk]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-12-19 16:33:50 +08:00
mssonicbld
2ba3c90ad4
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#17516) 2023-12-17 15:46:25 +08:00
Junchao-Mellanox
d8a1ffbace
[Mellanox] implement sfp.reset for CMIS management (#16862)
- Why I did it
For CMIS host-managed modules, sfp.reset needs a different implementation. This PR implements it.

- How I did it
For SW control modules, do reset from hw_reset
For FW control modules, do reset as the original way
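The dispatch described above can be sketched as follows. The class and its method names are hypothetical stand-ins, not the Mellanox `sonic_platform` API. Only the SW/FW split comes from the commit.

```python
class SfpResetSketch:
    """Hypothetical model of the reset dispatch; records which path was taken."""

    def __init__(self, sw_control: bool):
        self.sw_control = sw_control
        self.calls = []

    def hw_reset(self):
        self.calls.append("hw_reset")
        return True

    def legacy_reset(self):
        # Stands in for the original (firmware-driven) reset path.
        self.calls.append("legacy_reset")
        return True

    def reset(self):
        # SW-controlled (CMIS host-managed) modules are reset via hw_reset;
        # FW-controlled modules keep the original reset behaviour.
        if self.sw_control:
            return self.hw_reset()
        return self.legacy_reset()
```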

- How to verify it
Manual test
sonic-mgmt platform test
2023-12-17 08:02:47 +02:00
mssonicbld
37a9c25cfb
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#17535) 2023-12-16 15:51:42 +08:00
mssonicbld
e0dc6def82
[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically (#17533) 2023-12-16 15:49:20 +08:00
mssonicbld
9664c201f6
[submodule] Update submodule sonic-platform-common to the latest HEAD automatically (#17532) 2023-12-16 15:34:43 +08:00
mssonicbld
ff1c8e0c24
[submodule] Update submodule sonic-sairedis to the latest HEAD automatically (#17534) 2023-12-16 15:30:37 +08:00
jingwenxie
ad90ad9fcd
Update TELEMETRY_CLIENT YANG model (#16861)
### Why I did it
Github issue: https://github.com/sonic-net/sonic-buildimage/issues/16356. The YANG definition breaks the GCU feature.

We can either update sonic_yang and GCU's search algorithm to support lists with the same key count, or simply update the YANG model to solve the issue.

The advantages of updating the YANG model are that it solves the issue directly and that we don't need to handle a complicated search algorithm in sonic_yang and GCU. This is the only YANG model that has this issue.

### How I did it
Combined the two lists into one. The previous YANG validation unit tests still apply.
#### How to verify it
Unit test and E2E test
2023-12-15 17:04:55 -08:00
Yaqiang Zhu
728df4e89d
[dhcp_relay] Optimize j2 file in dhcp_relay container (#17506) 2023-12-15 15:47:40 -08:00
spilkey-cisco
69ad1ed41a
Fix system-health hardware_checker to consume fan tolerance details (#16689)
Why I did it

Fan tolerance checking is done through new APIs, is_under_speed and is_over_speed, which populate corresponding fields into the database. speed_tolerance is no longer used and was removed, but system-health was not updated and indicates failures:

ADO: 25279165

root@sonic:/# show system-health summary
System status summary

  System status LED  red_blink
  Services:
    Status: OK
  Hardware:
    Status: Not OK
    Reasons: Failed to get speed tolerance for fantray5.fan1
	     Failed to get speed tolerance for fantray5.fan0
	     Failed to get speed tolerance for fantray4.fan1
	     Failed to get speed tolerance for fantray4.fan0
	     Failed to get speed tolerance for fantray3.fan1
	     Failed to get speed tolerance for fantray3.fan0
	     Failed to get speed tolerance for fantray2.fan1
	     Failed to get speed tolerance for fantray2.fan0
	     Failed to get speed tolerance for fantray1.fan1
	     Failed to get speed tolerance for fantray1.fan0
	     Failed to get speed tolerance for fantray0.fan1
	     Failed to get speed tolerance for fantray0.fan0
	     Failed to get speed tolerance for PSU1.fan0
	     Failed to get speed tolerance for PSU0.fan0

How I did it
Updated hardware_checker.py in system-health to consume new is_under_speed and is_over_speed database entries instead of speed_tolerance and hard-coded calculations.
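A minimal sketch of the new check, assuming each fan's entry is a dict whose `is_under_speed`/`is_over_speed` fields hold the strings "True"/"False", as they would be read back from a Redis hash. The function name and reason strings are hypothetical. A missing field is treated as "not failing".

```python
def fan_speed_reasons(fans):
    """Return one failure reason per fan flagged as under- or over-speed.

    `fans` maps a fan name (e.g. "fantray0.fan0") to its state fields.
    """
    reasons = []
    for name, info in sorted(fans.items()):
        if info.get("is_under_speed") == "True":
            reasons.append(f"{name} speed is lower than expected")
        if info.get("is_over_speed") == "True":
            reasons.append(f"{name} speed is higher than expected")
    return reasons
```

Unlike the removed speed_tolerance path, no tolerance arithmetic is needed here. The checker just trusts the flags that the fan APIs already populated.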

How to verify it
root@sonic:/# show system-health summary
System status summary

  System status LED  green
  Services:
    Status: OK
  Hardware:
    Status: OK
2023-12-15 15:33:20 -08:00
Liu Shilong
979516633d
[ci] Support pensando platform build in pipeline. (#17512)
Why I did it
Update pipeline file to support pensando's build.

Work item tracking
Microsoft ADO (number only): 26087700
How I did it
How to verify it
2023-12-15 16:52:51 +08:00
Liu Shilong
2532661cd9
[ci] Enable sonic-restapi build in PR validation. (#17397)
Why I did it
Enable the sonic-restapi build on two platforms to avoid build breaks on the restapi target.

Work item tracking
Microsoft ADO (number only): 26048426
How I did it
How to verify it
2023-12-15 16:25:26 +08:00
mssonicbld
73f6e5895a
[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically (#17499) 2023-12-15 15:31:28 +08:00
mssonicbld
be99434991
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#17501)
#### Why I did it
src/sonic-swss
```
* ff524e6d - (HEAD -> master, origin/master, origin/HEAD) [dash] add a retry for an ACL rule creation if a tag is not created yet (#2972) (7 hours ago) [Yakiv Huryk]
* 620db3da - [ci] Allow partially success build artifact in PR checker pipeline. #2986 (3 days ago) [Liu Shilong]
* d357e6f1 - [copporch] Add safeguard during policer attribute update (#2977) (4 days ago) [Vivek]
* cb460394 - [fpmsyncd][WR] Relax the static schema constraint for ROUTE_TABLE (#2981) (5 days ago) [Vivek]
* a1ce21f6 - Change base directory referenced in coverage.xml (#2976) (6 days ago) [Lawrence Lee]
* 920959cf - [Dash] [UT] Add ZMQ test case for dash (#2967) (6 days ago) [Hua Liu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-12-15 06:32:51 +08:00
Sudharsan Dhamal Gopalarathnam
f3f507826b
[FRR] Fix zebra memory leak when bgp fib suppress pending is enabled (#17484)
Fix zebra leaking memory with fib suppress enabled. Porting the fix from
FRRouting/frr#14983

While running test_stress_route.py, systems with lower memory started to throw low memory logs. On further investigation, a memory leak has been found in zebra which was fixed in the FRR community.
2023-12-14 09:13:20 -08:00
Ze Gan
dac2ba6e1b
[Azp]: Add dash-api dependencies on building Azp ubuntu20.04 (#17507)
Signed-off-by: Ze Gan <ganze718@gmail.com>
2023-12-14 08:59:16 -08:00
mssonicbld
d6f6bbfc5d
[submodule] Update submodule sonic-gnmi to the latest HEAD automatically (#17436)
#### Why I did it
src/sonic-gnmi
```
* 88e82d4 - (HEAD -> master, origin/master, origin/HEAD) Replace PFC_WD_TABLE with PFC_WD (#173) (8 days ago) [Zain Budhwani]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-12-14 18:35:44 +08:00
Junchao-Mellanox
c1cb292310
[Mellanox] implement platform wait in python code (#17398)
- Why I did it
New implementation of Nvidia platform_wait due to:
1. sysfs deprecated by hw-mgmt
2. new dependencies to SDK
3. For CMIS host management mode

- How I did it
Wait until hw-management is ready
Wait until the SDK sysfs nodes are ready
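The two-stage wait can be sketched as ordered readiness checks that share one deadline. The stage names and the predicates are hypothetical placeholders for the real hw-management and SDK sysfs checks.

```python
import time


def wait_stages(stages, timeout=60.0, interval=1.0):
    """Run readiness checks in order under one shared deadline.

    `stages` is a list of (name, predicate) pairs. Returns (True, None) when
    every stage becomes ready, or (False, name) for the first stage that did
    not become ready before the deadline.
    """
    deadline = time.monotonic() + timeout
    for name, ready in stages:
        while not ready():
            if time.monotonic() >= deadline:
                return False, name
            time.sleep(interval)
    return True, None
```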

- How to verify it
manual test
unit test
sonic-mgmt regression
2023-12-14 12:04:24 +02:00
Junchao-Mellanox
f373a16e95
[Mellanox] Fix race condition while creating SFP (#17441)
- Why I did it
Fix an issue where xcvrd crashes because it cannot import name 'initialize_sfp_thermal':

Nov 27 09:47:16.388639 sonic ERR pmon#xcvrd: Exception occured at CmisManagerTask thread due to ImportError("cannot import name 'initialize_sfp_thermal' from partially initialized module 'sonic_platform.thermal' (most likely due to a circular import) (/usr/local/lib/python3.9/dist-packages/sonic_platform/thermal.py)")

- How I did it
Add a lock around SFP object creation
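A sketch of serializing lazy SFP creation with a lock, so that concurrent threads (such as the CmisManagerTask thread) cannot start building the same object, and importing its modules, at the same time. `SfpRegistry` and its `factory` argument are hypothetical and are not the Mellanox chassis code.

```python
import threading


class SfpRegistry:
    """Hypothetical lazy SFP cache; each index is created exactly once."""

    def __init__(self, factory):
        self._factory = factory
        self._sfps = {}
        self._lock = threading.Lock()

    def get(self, index):
        # Fast path: no locking once the object already exists.
        sfp = self._sfps.get(index)
        if sfp is not None:
            return sfp
        with self._lock:
            # Re-check under the lock: another thread may have created it meanwhile.
            sfp = self._sfps.get(index)
            if sfp is None:
                sfp = self._factory(index)
                self._sfps[index] = sfp
            return sfp
```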

- How to verify it
Unit test
Manual Test
2023-12-14 12:01:11 +02:00
mssonicbld
da3e7cbbba
[submodule] Update submodule linkmgrd to the latest HEAD automatically (#17476)
#### Why I did it
src/linkmgrd
```
* 79c3872 - (HEAD -> master, origin/master, origin/HEAD) [active-standby] Fix `show mux status` inconsistency introduced by orchagent rollback  (#225) (24 hours ago) [Jing Zhang]
* ba913c0 - [warmboot] use config_db connector to update mux mode config instead of CLI (#223) (2 days ago) [Jing Zhang]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-12-14 16:34:10 +08:00
mssonicbld
e59ac879e6
[submodule] Update submodule sonic-host-services to the latest HEAD automatically (#17497) 2023-12-14 16:15:27 +08:00
mssonicbld
953d3dc175
[submodule] Update submodule sonic-dash-api to the latest HEAD automatically (#17503) 2023-12-14 15:47:12 +08:00
mssonicbld
c7e7dffb6e
[submodule] Update submodule sonic-sairedis to the latest HEAD automatically (#17500) 2023-12-14 15:42:54 +08:00
mssonicbld
67c0543127
[submodule] Update submodule sonic-linux-kernel to the latest HEAD automatically (#17498) 2023-12-14 15:42:08 +08:00
mssonicbld
fa5829bca8
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#17502) 2023-12-14 15:38:45 +08:00
Ze Gan
b21f33b8b1
[Azp]: Fix azp on building ubuntu20.04 and sonic-mgmt (#17439)
The Azp build failed for ubuntu20.04 and sonic-mgmt due to the sonic-dash-api update.

Signed-off-by: Ze Gan <ganze718@gmail.com>
2023-12-13 22:49:04 -08:00
Nazarii Hnydyn
6d043a25bd
[installer] Create a blank grubenv if doesn't exist. (#17414)
- Why I did it
To fix the BIOS firmware update after a fresh image installation from ONIE

- How I did it
Initialize an empty GRUB environment file after ONIE installation
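On a live system, `grub-editenv <file> create` is the usual way to create a blank environment file. The sketch below writes the same fixed-size block directly: the "# GRUB Environment Block" header padded with '#' to 1 KiB. The helper name and the target path are hypothetical. The installer's actual path is not given in the commit.

```python
from pathlib import Path

GRUBENV_SIZE = 1024  # a GRUB environment block is a fixed 1 KiB
GRUBENV_HEADER = b"# GRUB Environment Block\n"


def ensure_grubenv(path):
    """Create a blank GRUB environment block at `path` if none exists.

    Returns True if a file was created, False if one already existed.
    """
    p = Path(path)
    if p.exists():
        return False
    p.parent.mkdir(parents=True, exist_ok=True)
    # Pad with '#' up to the fixed block size.
    p.write_bytes(GRUBENV_HEADER + b"#" * (GRUBENV_SIZE - len(GRUBENV_HEADER)))
    return True
```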

- How to verify it
Install the image from ONIE
Run BIOS firmware upgrade

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2023-12-14 08:41:12 +02:00
Junhua Zhai
53be9de743
Fix syncd_request_shutdown coredump in config reload on KVM sonic (#17486)
The issue is related to #16812. The syncd process does not run in the gbsyncd container on KVM SONiC with the default hwsku.

Microsoft ADO : 26151608

How I did it
If syncd is not running in the gbsyncd container, there is no need to trigger a graceful shutdown of syncd.
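The guard can be sketched as a check before the shutdown request. How the running processes are discovered is left out, and `maybe_request_shutdown` and its `request_shutdown` callback are hypothetical names.

```python
def maybe_request_shutdown(running_processes, request_shutdown):
    """Request a graceful syncd shutdown only if syncd is actually running.

    `running_processes` is an iterable of process names seen in the container;
    `request_shutdown` is a callable that performs the actual request.
    """
    if "syncd" not in set(running_processes):
        # Nothing to shut down, e.g. gbsyncd on KVM with the default hwsku.
        return False
    request_shutdown()
    return True
```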

How to verify it
No syncd_request_shutdown coredump occurs during config reload on KVM SONiC
2023-12-13 17:37:44 -08:00
zitingguo-ms
6a9ec987b5
change branch name (#17267)
Why I did it
Upgrade xgs SAI to 10.1 version.

Work item tracking
Microsoft ADO (number only): 25931321
How I did it
Upgrade xgs SAI version in sai.mk file.

How to verify it
Run full qualification on 7050cx3/7260cx3:

7050cx3:
https://dev.azure.com/mssonic/internal/_build/results?buildId=425450&view=results
https://dev.azure.com/mssonic/internal/_build/results?buildId=425449&view=results
7260cx3: https://elastictest.org/scheduler/testplan/656f2b2b617fb27e41557494?leftSideViewMode=detail&prop=status&order=ascending
2023-12-14 09:37:35 +08:00
Junchao-Mellanox
1b84f3daa5
[Mellanox] update asic and module temperature in a thread for CMIS management (#16955)
- Why I did it
When a module is fully under software control, the driver cannot get the module temperature/temperature threshold from firmware. In this case, SONiC needs to get the temperature/temperature threshold from EEPROM. In this PR, a thermal updater thread is created to update the module temperature/temperature threshold while software control is enabled.

- How I did it
Query ASIC temperature from SDK sysfs and update hw-management-tc periodically
Query Module temperature from EEPROM and update hw-management-tc periodically
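A minimal sketch of a periodic updater thread with a clean stop. The `read_temps` and `publish` callables are hypothetical placeholders for the SDK sysfs/EEPROM reads and the hw-management-tc updates. The real updater's interval is not stated in the commit.

```python
import threading


class ThermalUpdater:
    """Hypothetical periodic updater: every `interval` seconds, call
    publish(read_temps()) until stop() is called."""

    def __init__(self, read_temps, publish, interval=60.0):
        self._read_temps = read_temps
        self._publish = publish
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Update once immediately, then once per interval until stopped.
        while True:
            self._publish(self._read_temps())
            # Event.wait returns True as soon as stop() sets the event.
            if self._stop.wait(self._interval):
                break

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
```

Waiting on an `Event` instead of `time.sleep` lets `stop()` return promptly rather than after a full interval.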

- How to verify it
Manual test
New Unit tests
2023-12-13 14:19:44 +02:00
Junchao-Mellanox
0d62cf0e92
[Mellanox] Remove EEPROM write limitation if it is software control (#17030)
- Why I did it
When a module is under software control (CMIS host management enabled), its EEPROM should be controlled by software and there should be no limitation on write operations.

- How I did it
Remove EEPROM write limitation if a module is under software control
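The condition can be sketched as follows. The commit does not describe the existing limitation's rules, so the `allowed_ranges` list of (page, start, end) tuples used for firmware-controlled modules is purely hypothetical.

```python
def is_write_allowed(sw_control, page, offset, allowed_ranges):
    """Hypothetical EEPROM write check.

    Under software control every write is allowed. Otherwise only offsets
    inside one of the (page, start, end) ranges, end exclusive, are writable.
    """
    if sw_control:
        return True
    return any(p == page and start <= offset < end for p, start, end in allowed_ranges)
```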

- How to verify it
Manual test
UT
2023-12-13 14:16:40 +02:00
Sudharsan Dhamal Gopalarathnam
dd39dd0e03
[Mellanox] Update SAI to 2311.26.0.28, SDK/FW to 4.6.2134/2012.2134 (#17481)
- Why I did it
Update SAI version to SAIBuild2311.26.0.28

Fixed issues
1. Traffic with unicast destination ip and multicast destination mac wasn't properly dropped
2. When working with SAI_DEFAULT_SWITCHING_MODE_STORE_FORWARD key/value enabled, trying to add a LAG member to a LAG which is created after warm boot initial configuration phase ended, will fail.
3. Optional feature of Port IP counters (SAI_PORT_STAT_IP*) , enabled by SAI XML per-port-ip-counter-enabled config node, wasn't initialized properly.
4. Creating BFD session for non default VRF fails (SAI_BFD_SESSION_ATTR_VIRTUAL_ROUTER != SAI_SWITCH_ATTR_DEFAULT_VIRTUAL_ROUTER_ID).
5. The default value for port FEC during switch init for Spectrum3 was initialized as 'auto', which was not aligned with the SAI header default 'none'. Note: if a setup has an invalid configuration and previously relied on 'auto', the user may now need to provide an explicit valid value for SAI_PORT_ATTR_FEC_MODE.

Update SDK/FW version to 4.6.2134/2012.2134
Fixed issues:
1. Updated SN3700C to enable limit to 100G speed.
2. Recovering from low power mode might end with the port down.

- How I did it
Updating the versions in makefile

- How to verify it
Confirm issues fixed and run sonic-mgmt tests
2023-12-13 12:48:49 +02:00
Zain Budhwani
f82980784d
Change leaf value of used_cnt of sonic-events-swss:chk_crm_threshold (#17430)
### Why I did it

The current YANG model of sonic-events-swss:chk_crm_threshold uses type uint8 for leaf used_cnt. Its range (0 to 255) is too small for used_cnt, whose values can greatly exceed it. Updated the leaf type of used_cnt and free_cnt to match the definition in orchagent.

Changed to uint32, as defined here: https://github.com/sonic-net/sonic-swss/blob/master/orchagent/crmorch.h#L99
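The range problem can be shown with the YANG built-in unsigned integer bounds: uint8 is 0..255 and uint32 is 0..4294967295. The helper below is a hypothetical illustration, not part of the sonic-events YANG tooling.

```python
# Upper bounds of the YANG built-in unsigned integer types.
UINT_MAX = {"uint8": 2**8 - 1, "uint16": 2**16 - 1, "uint32": 2**32 - 1}


def fits_yang_uint(value, yang_type):
    """Return True if `value` lies within the range of the given YANG unsigned type."""
    return 0 <= value <= UINT_MAX[yang_type]
```

A CRM counter such as used_cnt = 300 is already rejected by uint8 but is well within uint32.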

##### Work item tracking
- Microsoft ADO **(number only)**:26091912

#### How I did it

Update leaf value

#### How to verify it

UT and sonic-mgmt PR checker
2023-12-12 11:35:46 -08:00