8640 Commits

Author SHA1 Message Date
Prince George
30ff77350f
Fix the fsck script that does filesystem repair (#17424)
Fix the fsck check which is not working. Potentially fixes #16938
Modified the fsck script to run ext4.fsck on the disk where SONiC resides.

Microsoft ADO: 26098631
2023-12-19 17:51:49 -08:00
mssonicbld
050420f444
[submodule] Update submodule sonic-platform-common to the latest HEAD automatically (#17559)
#### Why I did it
src/sonic-platform-common
```
* c82ae54 - (HEAD -> master, origin/master, origin/HEAD) Implementing set_optoe_write_timeout API (#422) (8 hours ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-12-19 16:33:58 +08:00
mssonicbld
f804a6ec5a
[submodule] Update submodule sonic-sairedis to the latest HEAD automatically (#17560)
#### Why I did it
src/sonic-sairedis
```
* e849160 - (HEAD -> master, origin/master, origin/HEAD) [vslib] add support for ACL table available entry/counter attributes (#1333) (9 hours ago) [Yakiv Huryk]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-12-19 16:33:54 +08:00
mssonicbld
ec92a6d421
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#17561)
#### Why I did it
src/sonic-swss
```
* 5f367ebb - (HEAD -> master, origin/master, origin/HEAD) [dash] reduce the memory used by DASH ACL rules (#2984) (8 hours ago) [Yakiv Huryk]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-12-19 16:33:50 +08:00
mssonicbld
2ba3c90ad4
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#17516)
2023-12-17 15:46:25 +08:00
Junchao-Mellanox
d8a1ffbace
[Mellanox] implement sfp.reset for CMIS management (#16862)
- Why I did it
CMIS host management modules need a different implementation of sfp.reset. This PR implements it.

- How I did it
For SW control modules, reset via hw_reset.
For FW control modules, reset the original way.

- How to verify it
Manual test
sonic-mgmt platform test
2023-12-17 08:02:47 +02:00
mssonicbld
37a9c25cfb
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#17535)
2023-12-16 15:51:42 +08:00
mssonicbld
e0dc6def82
[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically (#17533)
2023-12-16 15:49:20 +08:00
mssonicbld
9664c201f6
[submodule] Update submodule sonic-platform-common to the latest HEAD automatically (#17532)
2023-12-16 15:34:43 +08:00
mssonicbld
ff1c8e0c24
[submodule] Update submodule sonic-sairedis to the latest HEAD automatically (#17534)
2023-12-16 15:30:37 +08:00
jingwenxie
ad90ad9fcd
Update TELEMETRY_CLIENT YANG model (#16861)
### Why I did it
Github issue: https://github.com/sonic-net/sonic-buildimage/issues/16356. The YANG definition breaks GCU feature.

We can either update sonic_yang and GCU's search algorithm to enable the same key count case or simply update YANG model to solve the issue.

The advantage of updating the YANG model is that it solves the issue directly, and we don't need to handle the complicated search algorithm in sonic_yang and GCU. This is the only YANG model that has this issue.

### How I did it
Combine the two lists into one. The previous YANG validation unit tests are still applicable.
#### How to verify it
Unit test and E2E test
2023-12-15 17:04:55 -08:00
Yaqiang Zhu
728df4e89d
[dhcp_relay] Optimize j2 file in dhcp_relay container (#17506)
2023-12-15 15:47:40 -08:00
spilkey-cisco
69ad1ed41a
Fix system-health hardware_checker to consume fan tolerance details (#16689)
Why I did it

Fan tolerance checking is done through new APIs, is_under_speed and is_over_speed, which populate corresponding fields into the database. speed_tolerance is no longer used and was removed, but system-health was not updated and indicates failures:

ADO: 25279165

root@sonic:/# show system-health summary
System status summary

  System status LED  red_blink
  Services:
    Status: OK
  Hardware:
    Status: Not OK
    Reasons: Failed to get speed tolerance for fantray5.fan1
	     Failed to get speed tolerance for fantray5.fan0
	     Failed to get speed tolerance for fantray4.fan1
	     Failed to get speed tolerance for fantray4.fan0
	     Failed to get speed tolerance for fantray3.fan1
	     Failed to get speed tolerance for fantray3.fan0
	     Failed to get speed tolerance for fantray2.fan1
	     Failed to get speed tolerance for fantray2.fan0
	     Failed to get speed tolerance for fantray1.fan1
	     Failed to get speed tolerance for fantray1.fan0
	     Failed to get speed tolerance for fantray0.fan1
	     Failed to get speed tolerance for fantray0.fan0
	     Failed to get speed tolerance for PSU1.fan0
	     Failed to get speed tolerance for PSU0.fan0

How I did it
Updated hardware_checker.py in system-health to consume new is_under_speed and is_over_speed database entries instead of speed_tolerance and hard-coded calculations.
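
As a rough illustration of the change described above (not the actual hardware_checker.py code), a checker that reads the `is_under_speed` and `is_over_speed` fields instead of computing a tolerance might look like the sketch below. The function name, the dictionary layout, and the string encoding of the booleans are assumptions made for this example.

```python
def check_fan_speed(name, fan_info):
    """Return a list of failure reasons for one fan entry.

    fan_info is a hypothetical dict of string fields standing in for a
    database row; the real schema may differ.
    """
    reasons = []
    for field, label in (("is_under_speed", "under speed"),
                         ("is_over_speed", "over speed")):
        value = fan_info.get(field)
        if value is None:
            # The field is missing, so the fan state cannot be judged.
            reasons.append(f"Failed to get {field} for {name}")
        elif value.strip().lower() == "true":
            reasons.append(f"{name} is {label}")
    return reasons
```

An empty list means the fan passed both checks; any non-empty list would be reported under the hardware status reasons.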

How to verify it
root@sonic:/# show system-health summary
System status summary

  System status LED  green
  Services:
    Status: OK
  Hardware:
    Status: OK
2023-12-15 15:33:20 -08:00
Liu Shilong
979516633d
[ci] Support pensando platform build in pipeline. (#17512)
Why I did it
Update pipeline file to support pensando's build.

Work item tracking
Microsoft ADO (number only): 26087700
How I did it
How to verify it
2023-12-15 16:52:51 +08:00
Liu Shilong
2532661cd9
[ci] Enable sonic-restapi build in PR validation. (#17397)
Why I did it
Enable the sonic-restapi build on two platforms to avoid build breaks on the restapi target.

Work item tracking
Microsoft ADO (number only): 26048426
How I did it
How to verify it
2023-12-15 16:25:26 +08:00
mssonicbld
73f6e5895a
[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically (#17499)
2023-12-15 15:31:28 +08:00
mssonicbld
be99434991
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#17501)
#### Why I did it
src/sonic-swss
```
* ff524e6d - (HEAD -> master, origin/master, origin/HEAD) [dash] add a retry for an ACL rule creation if a tag is not created yet (#2972) (7 hours ago) [Yakiv Huryk]
* 620db3da - [ci] Allow partially success build artifact in PR checker pipeline. #2986 (3 days ago) [Liu Shilong]
* d357e6f1 - [copporch] Add safeguard during policer attribute update (#2977) (4 days ago) [Vivek]
* cb460394 - [fpmsyncd][WR] Relax the static schema constraint for ROUTE_TABLE (#2981) (5 days ago) [Vivek]
* a1ce21f6 - Change base directory referenced in coverage.xml (#2976) (6 days ago) [Lawrence Lee]
* 920959cf - [Dash] [UT] Add ZMQ test case for dash (#2967) (6 days ago) [Hua Liu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-12-15 06:32:51 +08:00
Sudharsan Dhamal Gopalarathnam
f3f507826b
[FRR] Fix zebra memory leak when bgp fib suppress pending is enabled (#17484)
Fix zebra leaking memory with fib suppress enabled. Porting the fix from
FRRouting/frr#14983

While running test_stress_route.py, systems with lower memory started to throw low memory logs. On further investigation, a memory leak has been found in zebra which was fixed in the FRR community.
2023-12-14 09:13:20 -08:00
Ze Gan
dac2ba6e1b
[Azp]: Add dash-api dependencies on building Azp ubuntu20.04 (#17507)
Signed-off-by: Ze Gan <ganze718@gmail.com>
2023-12-14 08:59:16 -08:00
mssonicbld
d6f6bbfc5d
[submodule] Update submodule sonic-gnmi to the latest HEAD automatically (#17436)
#### Why I did it
src/sonic-gnmi
```
* 88e82d4 - (HEAD -> master, origin/master, origin/HEAD) Replace PFC_WD_TABLE with PFC_WD (#173) (8 days ago) [Zain Budhwani]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-12-14 18:35:44 +08:00
Junchao-Mellanox
c1cb292310
[Mellanox] implement platform wait in python code (#17398)
- Why I did it
New implementation of Nvidia platform_wait due to:
1. sysfs deprecated by hw-mgmt
2. new dependencies to SDK
3. For CMIS host management mode

- How I did it
wait hw-management ready
wait SDK sysfs nodes ready
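
The two wait steps above amount to polling until a set of paths exists. A minimal sketch of that pattern follows; the function name, timeout, and polling interval are assumptions for illustration, and the actual paths checked by the platform code are not given in this commit message.

```python
import os
import time


def wait_for_paths(paths, timeout=60.0, interval=0.5):
    """Poll until every path exists; return False if the timeout expires."""
    deadline = time.monotonic() + timeout
    pending = list(paths)
    while True:
        # Keep only the paths that have not appeared yet.
        pending = [p for p in pending if not os.path.exists(p)]
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Using `time.monotonic()` keeps the deadline stable even if the system clock is adjusted during boot.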

- How to verify it
manual test
unit test
sonic-mgmt regression
2023-12-14 12:04:24 +02:00
Junchao-Mellanox
f373a16e95
[Mellanox] Fix race condition while creating SFP (#17441)
- Why I did it
Fix an issue where xcvrd crashes because it cannot import name 'initialize_sfp_thermal':

Nov 27 09:47:16.388639 sonic ERR pmon#xcvrd: Exception occured at CmisManagerTask thread due to ImportError("cannot import name 'initialize_sfp_thermal' from partially initialized module 'sonic_platform.thermal' (most likely due to a circular import) (/usr/local/lib/python3.9/dist-packages/sonic_platform/thermal.py)")

- How I did it
Add lock for creating SFP object
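
A minimal sketch of guarding object creation with a lock, assuming a hypothetical `SfpFactory` wrapper and `create_fn` callback (neither name comes from the platform code):

```python
import threading


class SfpFactory:
    """Hypothetical holder that creates each SFP object at most once."""

    def __init__(self, create_fn):
        self._create_fn = create_fn
        self._lock = threading.Lock()
        self._sfps = {}

    def get(self, index):
        # Serialize creation so two threads cannot both initialize the
        # same object (or run the same import) at the same time.
        with self._lock:
            if index not in self._sfps:
                self._sfps[index] = self._create_fn(index)
            return self._sfps[index]
```

With the lock in place, concurrent callers asking for the same index get the same object, and `create_fn` runs only once per index.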

- How to verify it
Unit test
Manual Test
2023-12-14 12:01:11 +02:00
mssonicbld
da3e7cbbba
[submodule] Update submodule linkmgrd to the latest HEAD automatically (#17476)
#### Why I did it
src/linkmgrd
```
* 79c3872 - (HEAD -> master, origin/master, origin/HEAD) [active-standby] Fix `show mux status` inconsistency introduced by orchagent rollback  (#225) (24 hours ago) [Jing Zhang]
* ba913c0 - [warmboot] use config_db connector to update mux mode config instead of CLI (#223) (2 days ago) [Jing Zhang]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-12-14 16:34:10 +08:00
mssonicbld
e59ac879e6
[submodule] Update submodule sonic-host-services to the latest HEAD automatically (#17497)
2023-12-14 16:15:27 +08:00
mssonicbld
953d3dc175
[submodule] Update submodule sonic-dash-api to the latest HEAD automatically (#17503)
2023-12-14 15:47:12 +08:00
mssonicbld
c7e7dffb6e
[submodule] Update submodule sonic-sairedis to the latest HEAD automatically (#17500)
2023-12-14 15:42:54 +08:00
mssonicbld
67c0543127
[submodule] Update submodule sonic-linux-kernel to the latest HEAD automatically (#17498)
2023-12-14 15:42:08 +08:00
mssonicbld
fa5829bca8
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#17502)
2023-12-14 15:38:45 +08:00
Ze Gan
b21f33b8b1
[Azp]: Fix azp on building ubuntu20.04 and sonic-mgmt (#17439)
The Azp failed on ubuntu20.04 and sonic-mgmt building due to sonic-dash-api updating.

Signed-off-by: Ze Gan <ganze718@gmail.com>
2023-12-13 22:49:04 -08:00
Nazarii Hnydyn
6d043a25bd
[installer] Create a blank grubenv if doesn't exist. (#17414)
- Why I did it
To fix BIOS firmware update after fresh image installation from ONiE

- How I did it
Initialized empty GRUB environment file after ONiE installation

- How to verify it
Install image from ONiE
Run BIOS firmware upgrade

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2023-12-14 08:41:12 +02:00
Junhua Zhai
53be9de743
Fix syncd_request_shutdown coredump in config reload on KVM sonic (#17486)
The issue is related to #16812. Process syncd does not run in the container gbsyncd on kvm sonic with default hwsku.

Microsoft ADO : 26151608

How I did it
If syncd is not running in the gbsyncd container, there is no need to trigger a graceful shutdown of syncd.

How to verify it
No syncd_request_shutdown coredump occurs during config reload on KVM SONiC.
2023-12-13 17:37:44 -08:00
zitingguo-ms
6a9ec987b5
change branch name (#17267)
Why I did it
Upgrade xgs SAI to 10.1 version.

Work item tracking
Microsoft ADO (number only): 25931321
How I did it
Upgrade xgs SAI version in sai.mk file.

How to verify it
Run full qualification on 7050cx3/7260cx3:

7050cx3:
https://dev.azure.com/mssonic/internal/_build/results?buildId=425450&view=results
https://dev.azure.com/mssonic/internal/_build/results?buildId=425449&view=results
7260cx3: https://elastictest.org/scheduler/testplan/656f2b2b617fb27e41557494?leftSideViewMode=detail&prop=status&order=ascending
2023-12-14 09:37:35 +08:00
Junchao-Mellanox
1b84f3daa5
[Mellanox] update asic and module temperature in a thread for CMIS management (#16955)
- Why I did it
When a module is entirely under software control, the driver cannot get the module temperature or temperature threshold from firmware. In this case, SONiC needs to read them from the EEPROM. This PR creates a thermal updater thread that updates the module temperature and temperature threshold while software control is enabled.

- How I did it
Query ASIC temperature from SDK sysfs and update hw-management-tc periodically
Query Module temperature from EEPROM and update hw-management-tc periodically
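
The periodic query-and-update loop above could be structured as a stoppable background thread. The sketch below is an assumption-laden illustration: `PeriodicUpdater`, `read_fn`, and `write_fn` are hypothetical names, and the real thermal updater's interfaces to SDK sysfs, EEPROM, and hw-management-tc are not shown here.

```python
import threading


class PeriodicUpdater:
    """Call write_fn(read_fn()) every `interval` seconds until stopped."""

    def __init__(self, interval, read_fn, write_fn):
        self._interval = interval
        self._read_fn = read_fn
        self._write_fn = write_fn
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop_event.is_set():
            self._write_fn(self._read_fn())
            # wait() doubles as an interruptible sleep, so stop() returns
            # promptly instead of waiting out a full interval.
            self._stop_event.wait(self._interval)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop_event.set()
        self._thread.join()
```

One updater instance per data source (for example, one for ASIC temperature and one for module temperature) would keep the two polling loops independent.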

- How to verify it
Manual test
New Unit tests
2023-12-13 14:19:44 +02:00
Junchao-Mellanox
0d62cf0e92
[Mellanox] Remove EEPROM write limitation if it is software control (#17030)
- Why I did it
When module is under software control (CMIS host management enabled), EEPROM should be controlled by software and there should be no limitation for any write operation.

- How I did it
Remove EEPROM write limitation if a module is under software control

- How to verify it
Manual test
UT
2023-12-13 14:16:40 +02:00
Sudharsan Dhamal Gopalarathnam
dd39dd0e03
[Mellanox] Update SAI to 2311.26.0.28, SDK/FW to 4.6.2134/2012.2134 (#17481)
- Why I did it
Update SAI version to SAIBuild2311.26.0.28

Fixed issues
1. Traffic with unicast destination ip and multicast destination mac wasn't properly dropped
2. With the SAI_DEFAULT_SWITCHING_MODE_STORE_FORWARD key/value enabled, adding a LAG member to a LAG created after the warm boot initial configuration phase ended would fail.
3. Optional feature of Port IP counters (SAI_PORT_STAT_IP*) , enabled by SAI XML per-port-ip-counter-enabled config node, wasn't initialized properly.
4. Creating BFD session for non default VRF fails (SAI_BFD_SESSION_ATTR_VIRTUAL_ROUTER != SAI_SWITCH_ATTR_DEFAULT_VIRTUAL_ROUTER_ID).
5. The default value for port FEC during switch init for Spectrum3 was initialized as 'auto' and not aligned to the SAI header default 'none'. Note: if a setup has an invalid configuration and previously relied on 'auto', the user might now need to provide an explicit valid value for SAI_PORT_ATTR_FEC_MODE.

Update SDK/FW version to 4.6.2134/2012.2134
Fixed issues:
1. Updated SN3700C to enable limit to 100G speed.
2. Recovering from low power mode might end with the port down.

- How I did it
Updating the versions in makefile

- How to verify it
Confirm issues fixed and run sonic-mgmt tests
2023-12-13 12:48:49 +02:00
Zain Budhwani
f82980784d
Change leaf value of used_cnt of sonic-events-swss:chk_crm_threshold (#17430)
### Why I did it

The current YANG model of sonic-events-swss:chk_crm_threshold uses type uint8 for leaf used_cnt, whose range is too small to hold used_cnt values, which can greatly exceed it. This change updates the leaf type of used_cnt and free_cnt to match the existing definition.

Changed to uint32, as defined here: https://github.com/sonic-net/sonic-swss/blob/master/orchagent/crmorch.h#L99
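
To make the range difference concrete: YANG's uint8 holds 0 to 255, while uint32 holds 0 to 4294967295. The helper below is only an illustration of that arithmetic; the `fits` function and the table name are made up for this example.

```python
# Inclusive value ranges of two YANG built-in unsigned integer types.
YANG_UINT_RANGES = {
    "uint8": (0, 2**8 - 1),
    "uint32": (0, 2**32 - 1),
}


def fits(yang_type, value):
    """Return True if value is within the range of the given YANG type."""
    low, high = YANG_UINT_RANGES[yang_type]
    return low <= value <= high
```

A counter value of 300, for instance, is rejected by `fits("uint8", 300)` but accepted by `fits("uint32", 300)`.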

##### Work item tracking
- Microsoft ADO **(number only)**:26091912

#### How I did it

Update leaf value

#### How to verify it

UT and sonic-mgmt PR checker
2023-12-12 11:35:46 -08:00
Yevhen Fastiuk
5efb123ede
[NTP] Add NTP extended configuration (#15058)
hld [#1296](https://github.com/sonic-net/SONiC/pull/1296)
closes [#1254](https://github.com/sonic-net/SONiC/issues/1254)
depends-on [#60](https://github.com/sonic-net/sonic-host-services/pull/60), [#781](https://github.com/sonic-net/sonic-swss-common/pull/781), [#2835](https://github.com/sonic-net/sonic-utilities/pull/2835), [#10749](https://github.com/sonic-net/sonic-mgmt/pull/10749)

#### Why I did it
To cover the next AIs:
* Configure NTP global parameters
* Add/remove new NTP servers
* Change the configuration for NTP servers
* Show NTP status
* Show NTP configuration

### How I did it
* Add YANG model for a new configuration
* Extend configuration templates to support new knobs

### Description for the changelog
* Add ability to configure NTP global parameters such as authentication, dhcp, admin state
* Change the configuration for NTP servers
* Add an ability to show NTP configuration

#### Link to config_db schema for YANG module changes
[NTP configuration](https://github.com/sonic-net/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md#ntp-and-syslog-servers)
2023-12-11 13:31:35 -08:00
Junchao-Mellanox
b0bb3d40d3
[Mellanox] Implement low power mode for cmis host management (#17159)
- Why I did it
For CMIS host management mode, the previous sysfs cannot be used for the low power mode setting. This PR reuses the existing low power mode implementation in the sonic_xcvr package when CMIS host management mode is enabled.

- How I did it
Use sonic_xcvr low power mode implementation when CMIS host management mode is enabled.

- How to verify it
Manual test for CMIS host management mode
Regression test for old mode and backward compatible test
2023-12-11 10:42:01 +02:00
DavidZagury
ee598deced
[Mellanox][SKU] Adding Mellanox-SN4700-O8V48 SKU (#17425)
- Why I did it
To add new SKU Mellanox-SN4700-O8V48 with following requirements:

- How I did it
Create new SKU files based on the below definition:
* Port Mapping: 1-12 2x200G, 13-20 1x400G, 21-32 2x200G
   T0 topology: 48x200G Downlinks 8x400G uplinks.
   Length of downlink: 5m
   Length of uplink: 40m
* Auto-negotiation enable/disable: Yes
* FEC mode: RS
* Shared headroom: Enabled
* Shared headroom pool factor: 2
* Warmboot enabled: yes
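
The port mapping above can be checked against the stated topology with a short calculation. The tuple layout and function name below are assumptions for illustration; only the port ranges, split factors, and speeds come from the list.

```python
# (first_port, last_port, split_factor, speed) from the port mapping above.
PORT_MAPPING = [
    (1, 12, 2, "200G"),
    (13, 20, 1, "400G"),
    (21, 32, 2, "200G"),
]


def count_logical_ports(mapping):
    """Sum the logical ports per speed across all physical port ranges."""
    totals = {}
    for first, last, split, speed in mapping:
        physical = last - first + 1
        totals[speed] = totals.get(speed, 0) + physical * split
    return totals
```

Ports 1-12 and 21-32 contribute 24 physical ports split two ways, giving 48 ports at 200G, and ports 13-20 give 8 ports at 400G, which matches the 48x200G downlinks and 8x400G uplinks.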

- How to verify it
Build SONiC with the new SKU, confirm that init finishes and all ports come up, and run the QoS test suite from sonic-mgmt.
2023-12-10 16:18:11 +02:00
Nazarii Hnydyn
278a958517
[Mellanox] Disable MFT bash autocompletion (#17442)
A workaround for a delay of about 20 seconds on login caused by an MFT bash autocompletion bug.
It should be reverted once a formal solution is available in a future MFT release.

- Why I did it
To overcome SN2700 20 sec delay on login

- How I did it
Removed MFT bash autocompletion part

- How to verify it
1. Build a mellanox image
2. Verify no such links after system boot.

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2023-12-10 10:28:32 +02:00
Aravind-Subbaroyan
b222c7c240
Update cisco-8000.ini (#17428)
FCS/CRC Errors will only be reported as RX_ERR.
Fix to avoid the mac port related errors.
Fix for sharedResSize testcase failure in QoS-SAI
Fix the issue related to voltage in 'show platform psustatus'.
Support WRED drop for lossy queues.
Fixed an issue where lossy traffic was getting dropped.
Enhancement of SAI logging for errors and interrupts
2023-12-07 17:05:05 -08:00
Stepan Blyshchak
b61528bee9
Revert "[swss/syncd] remove dependency on interfaces-config.service (#13084) (#14341)" (#15094) (#17367)
This reverts commit 499f57a7f7.

Co-authored-by: Nazarii Hnydyn <nazariig@nvidia.com>
2023-12-07 15:20:39 -08:00
Xichen96
5992765d94
[dhcp_server] add show range cli (#17262)
* add show range

* add support for single ip
2023-12-07 14:50:38 -08:00
Ying Xie
2e072beb41
Revert "[pmon] update gRPC version to 1.57.0 (#16257)" (#17401)
This reverts commit 45a852233b.
2023-12-07 11:01:47 -08:00
Arun Saravanan Balachandran
80e743716c
[Dell] S6100 - Update EEPROM API serial_number_str to return service tag instead of serial number (#17440)
To modify EEPROM API serial_number_str to return service tag instead of serial number in Dell S6100.
Ref PR: #1239

How I did it
Update EEPROM API serial_number_str to return service tag instead of serial number.

How to verify it
Verify decode-syseeprom -s returns service tag in Dell S6100.
2023-12-07 10:08:42 -08:00
centecqianj
8ec4b53451
[Bookworm] Upgrade centec-arm64 platform to Bookworm. (#17411)
Why I did it
1. Upgrade centec-arm64 platform to Bookworm.
2. Fix the error when compiling docker-syncd-centec-rpc.gz on the centec platform.

How I did it
1. Modified platform driver to comply with bookworm kernel.
2. Upgrade SONiC package versions of the centec platform.

How to verify it
1. Compile the centec-arm64 platform to generate sonic-centec-arm64.bin.
2. Compile the centec platform to generate docker-syncd-centec-rpc.gz.

Signed-off-by: centecqianj <qianj@centec.com>
2023-12-07 08:42:13 -08:00
Oleksandr Ivantsiv
fef1346483
[smartswitch] Add support of a new 't1-smartswitch' topology to the sample config generator. (#17326)
- Why I did it
Add support for a new 't1-smartswitch' topology to the sample config generator. The topology is passed to the sonic-cfggen utility as a parameter to generate a sample configuration for Smart Switch:

sonic-cfggen  -k <SKU> --preset t1-smartswitch ...

- How I did it
Extend sample config generator to support new topology and read Smart Switch specific data from hwsku.json.

- How to verify it
Run unit tests. The changes are covered with the new unit tests.
2023-12-07 15:26:33 +02:00
Stepan Blyshchak
9555883e6f
[config-chassisdb] use cached variables (#17342)
- Why I did it
Improve boot performance, which matters most for fast boot and warm boot.

- How I did it
Use cached variable.

- How to verify it
Boot the system. Simply do "systemd-analyze blame" and look at service start time.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2023-12-07 15:24:21 +02:00
Stepan Blyshchak
6435df1056
[config-topology] use cached variables (#17343)
- Why I did it
Improve boot performance, which matters most for fast boot and warm boot.

- How I did it
Use cached variable.

- How to verify it
Boot the system. Simply do "systemd-analyze blame" and look at service start time.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2023-12-07 15:22:44 +02:00
dbarashinvd
000a2ef818
[Mellanox] Enable CMIS host management (#16846)
- Why I did it
Enable CMIS host management for Mellanox devices which are expected to support the feature

- How I did it
Added a new thread in a new file and changed the platform logic in chassis.py, which calls this thread from get_change_event().
The thread handles a per-port state machine.
First, static detection runs once the thread is up (during the switch bootup sequence) until it reaches a final decision on whether each module is under FW control or SW control.
After that, dynamic detection takes over, listening per port for changes on the sysfs fds so that cable plug-in and plug-out events can be detected.

- How to verify it
Enhanced unit tests
run sonic mgmt on Nvidia SN4700 with CMIS host management enabled
2023-12-07 14:54:56 +02:00