Commit Graph

7305 Commits

Author SHA1 Message Date
siqbal1986
b67dc19532 [Yang model] Add Yang models for VNET table. (#14873)
Created Yang model for VNET table.
https://github.com/sonic-net/sonic-buildimage/issues/14534

##### Work item tracking
- Microsoft ADO **(number only)**:
18215579
2023-06-17 16:32:23 +08:00
Vaibhav Hemant Dixit
b62231566b Revert "Fix for fast/cold-boot: call db_migrator only after old config is loaded (#14933)" (#15464)
This reverts commit 02b17839c3.

Reverts #14933

The earlier commit caused a race condition that particularly broke cross branch warm upgrade.

The issue happens when db_migrator is still migrating the DB and the finalizer is checking the DB for the list of components to reconcile.

If migration is not complete, the finalizer gets an empty list to wait for. Due to this, the finalizer concludes warmboot (deletes the system-wide warmboot flag) and causes all the services to do a cold restart.
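
The ordering problem described above can be sketched with a small model. This is an illustration only, not SONiC code: the dictionary keys, the component names and both functions are hypothetical stand-ins for the DB, db_migrator and the finalizer.

```python
# Hypothetical model of the race; none of these names come from SONiC.

def migrate(db):
    """Stand-in for db_migrator: the component list exists only once it finishes."""
    db["components"] = ["bgp", "swss", "teamd"]


def finalize_warmboot(db):
    """Stand-in for the finalizer: read the list to wait for, then clear the flag."""
    pending = list(db.get("components", []))
    # An empty list means there is nothing to wait for, so warmboot is
    # concluded immediately and the system-wide flag is removed.
    db["warmboot_flag"] = False
    return pending


# Racy order: the finalizer reads the DB before migration has completed.
racy_db = {"warmboot_flag": True}
waited_racy = finalize_warmboot(racy_db)
migrate(racy_db)

# Safe order: migration completes before the finalizer reads the DB.
safe_db = {"warmboot_flag": True}
migrate(safe_db)
waited_safe = finalize_warmboot(safe_db)

print(waited_racy, waited_safe)
```

In the racy order the finalizer waits on nothing, which matches the premature conclusion of warmboot described in this revert.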

ADO: 24274591
2023-06-17 14:32:23 +08:00
Saikrishna Arcot
8195e33120 Re-add 127.0.0.1/8 when bringing down the interfaces (#15080)
* Re-add 127.0.0.1/8 when bringing down the interfaces

With #5353, 127.0.0.1/16 was added to the lo interface, and then
127.0.0.1/8 was removed. However, when bringing down the lo interface,
like during a config reload, 127.0.0.1/16 gets removed, but 127.0.0.1/8
isn't added back to the interface. This means that there's a period of
time where 127.0.0.1 is not available at all, and services that need to
connect to 127.0.0.1 (such as for redis DB) will fail.

To fix this, when going down, add 127.0.0.1/8. Add this address before
the existing configuration gets removed, so that 127.0.0.1 is available
at all times.

Note that running `ifdown lo` doesn't actually bring down the loopback
interface; the interface always stays "physically" up.
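
A minimal sketch of why the ordering matters, assuming the interface's addresses can be modeled as a plain set of prefixes. The helper names are hypothetical and this is not the script changed by the commit.

```python
import ipaddress

LOOPBACK = ipaddress.ip_address("127.0.0.1")


def reachable(prefixes):
    """True if any configured prefix on the modeled interface covers 127.0.0.1."""
    return any(LOOPBACK in ipaddress.ip_interface(p).network for p in prefixes)


def bring_down(prefixes, readd_first):
    """Return the reachability of 127.0.0.1 after each step of the teardown."""
    addrs = set(prefixes)
    trace = []
    if readd_first:
        # The fix: add 127.0.0.1/8 before the existing configuration is removed.
        addrs.add("127.0.0.1/8")
        trace.append(reachable(addrs))
    addrs.discard("127.0.0.1/16")
    trace.append(reachable(addrs))
    return trace


print(bring_down({"127.0.0.1/16"}, readd_first=False))
print(bring_down({"127.0.0.1/16"}, readd_first=True))
```

Without the re-add, the model ends with no prefix covering 127.0.0.1; with it, the address stays covered at every step.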

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-06-16 14:30:34 +08:00
Samuel Angebault
c1a7677b63 [Arista] fix platform.json for a few devices (#15308)
Why I did it
sonic-mgmt is failing tests due to invalid test data in platform.json
Fwutil is upset about the chassis name in the platform_component.json of the 7060CX-32S

How I did it
Fixed the aforementioned issues
2023-06-16 09:55:02 +08:00
siqbal1986
baa5175819 Added VNET_MONITOR_TABLE,BFD_SESSION_TABLE,VNET_ROUTE_TUNNEL_TABLE to the list (#14992)
* The 3 tables in state DB need to be cleaned up after SWSS restart to have a consistent state.
2023-06-16 09:54:58 +08:00
pavannaregundi
b8cd8d8e06 [Marvell] Update armhf driver version (#15138)
Changes in MRVL_PRESTERA_DRIVER_1.4:
- Memory leak fixed by releasing pci device after retrieval.
- Fixes for 5.10 kernel porting.

Change-Id: I1d7ee4ec02ec17a29ddb8473725ab68ca399748b

Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
2023-06-16 09:54:53 +08:00
Ikki Zhu
ea2e849607 [celestica/e1031]: enable emc2305 fan controller timeout feature (#14401)
Why I did it
In a rare condition, the emc2305 holds the SMBus and causes the SMBus completion wait to time out.

How I did it
Enable the EMC2305 SMBus timeout feature; a 30 ms period of inactivity will reset the interface.

How to verify it
Use 'i2cget -y -f 23 0x4d 0x20 b' to read the EMC2305 configuration register and check that the DIS_TO bit is not set.
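
The check above could be automated along these lines. This is a sketch under one loud assumption: that DIS_TO is bit 6 (mask 0x40) of configuration register 0x20. The bit position is not stated in this commit and should be confirmed against the EMC2305 datasheet. The function name is hypothetical.

```python
# ASSUMPTION: DIS_TO is bit 6 of register 0x20; verify against the datasheet.
DIS_TO_MASK = 0x40


def smbus_timeout_enabled(i2cget_output: str) -> bool:
    """Interpret the hex byte printed by i2cget (for example '0x00').

    The timeout feature is treated as enabled when the DIS_TO bit is clear.
    """
    value = int(i2cget_output.strip(), 16)
    return (value & DIS_TO_MASK) == 0


print(smbus_timeout_enabled("0x00"))
print(smbus_timeout_enabled("0x40"))
```

The string passed in would be the output of the i2cget command quoted above.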

Signed-off-by: Eric Zhu <erzhu@celestica.com>
2023-06-16 09:54:47 +08:00
Marty Y. Lok
a44ee587dd [Nokia-IXR7250E][Devicedata] update the device data for Nokia IXR7250E platform (#15216)
Why I did it
Update the device data files to support 1024 LAGs for Nokia IXR7250E platform
fixes https://github.com/Nokia-ION/ndk/issues/15

How I did it
Set lag_id_end=1024 in the chassisdb.conf file and add trunk_group_max_members=16 to the BCM config file

How to verify it
Check that LAG IDs up to 1024 can be created with 16 port members

Signed-off-by: mlok <marty.lok@nokia.com>
2023-06-16 09:54:40 +08:00
mssonicbld
98bcc9e922
[yang] Change asn to start from 0 for bgp monitor (#15350) (#15483) 2023-06-16 03:57:06 +08:00
Liping Xu
deb94af61b allow docker_inram to kernel cmd list (#15374)
Why I did it
After docker_inram is enabled, the docker folder's default max size is 1.5G.
It's not big enough for some tests which need to install additional docker images or install extra packages.

Work item tracking
Microsoft ADO 24199761:
How I did it
add docker_inram into cmdline_allowlist

How to verify it
sudo sh -c 'echo "docker_inram_size=3000M" >> kernel-cmdline-append'
sudo reboot and check the docker folder size
2023-06-15 14:33:58 +08:00
Lior Avramov
d26850611f
[Mellanox] [202211] Remove iproute2 SDK patches from SONiC tree and consume them from SDK github (#15061)
Why I did it
SDK patches for iproute2 were added to SONiC tree as a temporary solution.
Now that SDK with the patches is available, I have removed the patches from SONiC tree and we consume them from SDK github during compilation.

How I did it
During build we download SDK iproute2 patches from SDK github (or from the URL provided by user if compiling SDK from sources) and apply them before compilation.

How to verify it
Compile and load on a switch, verify that the interfaces' network devices are created successfully.
Verify LLDP shows connections to neighbors.
Verify ping between 2 hosts over 2 router ports is successful.
2023-06-14 17:13:10 +08:00
mssonicbld
4098a90b90
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#15441)
#### Why I did it
src/sonic-swss
```
* bccb1cc - (HEAD -> 202211, origin/202211) [202211] [sflowmgrd] Infer sampling rate dynamically based on oper speed (#2805) (4 hours ago) [Vivek]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-06-13 14:56:44 +08:00
mssonicbld
b048280fe0
[submodule] Update submodule sonic-host-services to the latest HEAD automatically (#15417)
#### Why I did it
src/sonic-host-services
```
* cdc621b - (HEAD -> 202211, origin/202211) [202211][config reload] Config Reload Enhancement (#64) (2 days ago) [Sudharsan Dhamal Gopalarathnam]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-06-13 12:29:08 +08:00
mssonicbld
1517a4f7ec
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#15422)
#### Why I did it
src/sonic-utilities
```
* 1246bc81 - (HEAD -> 202211, origin/202211) [config reload]Config Reload Enhancement (#2693) (#2863) (2 days ago) [Sudharsan Dhamal Gopalarathnam]
* d69aae4d - [vlan][dhcp_relay] Clear dhcpv6 relay counter while deleting vlan (#2852) (2 days ago) [Yaqiang Zhu]
* 0f6bf8ac - [config]: Dynamically start and stop ndppd (#2814) (2 days ago) [Lawrence Lee]
* 48a63ff1 - Fix issue: out of range sflow polling interval is accepted and stored in config_db (#2847) (2 days ago) [Junchao-Mellanox]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-06-13 12:29:04 +08:00
Sudharsan Dhamal Gopalarathnam
78977ddbce
[202211][config reload]Config Reload Enhancement (#15334)
Backporting #13969

Why I did it
Implementing code changes for sonic-net/SONiC#1203

Work item tracking
Microsoft ADO (number only):
How I did it
Removed the timers and the delayed target, since the delayed services now start based on an event-driven approach.
Cleared the port table during config reload and cold reboot scenarios.
Modified the yang model and init_cfg.json to change has_timer to delayed

How to verify it
Added UT to verify
2023-06-12 13:22:16 +08:00
mssonicbld
3e2211b420 [submodule] Update submodule sonic-sairedis to the latest HEAD automatically 2023-06-10 16:32:46 +08:00
mssonicbld
9f721639b0 [submodule] Update submodule sonic-swss to the latest HEAD automatically 2023-06-10 16:32:41 +08:00
abdosi
4111c25557 updated internal route policy for chassis-packet (#15349)
What I did:

Workaround for the issue seen here : FRRouting/frr#13682
There appears to be a timing issue: when multiple recursive lookups are needed to resolve the nexthop of a route, the resolution may not happen correctly, leaving the route in an inactive state.

The issue is seen on chassis-packet because 2 levels of recursive lookup are needed for a given e-BGP learnt route:
- Level 1: resolve the e-BGP peer (connected route via bgp) over Loopback4096 (i-BGP peering)
- Level 2: resolve Loopback4096 over the backend port-channel next-hops

For VOQ chassis there is no e-BGP peer (connected route via bgp) resolution, as the route is added as a static route by orchagent over Ethernet-IB.

Also as part of this, remove the route-map policy from instance.conf.j2, as the same policy is defined in peer-group.j2.

Microsoft ADO: https://msazure.visualstudio.com/One/_workitems/edit/24198507

How I verify:
Functional Verification manually
Updated UT.
We will be adding a sanity check in sonic-mgmt to make sure none of the routes are in an inactive state.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2023-06-10 14:32:44 +08:00
Arvindsrinivasan Lakshmi Narasimhan
6a3a6c77f4 set the default value for the port fec to RS on J2 based LC (#15346)
Why I did it
Work item tracking
Microsoft ADO (24182162):
How I did it
Update config.bcm to set the default FEC to RS on the 100G linecard

How to verify it
Tests on chassis
2023-06-10 14:32:36 +08:00
DavidZagury
8de162d4af [Mellanox] Update SN5600 SAI XML file (#14947)
- Why I did it
Update SAI xml file to align with the default SKU

- How I did it
Update the SN5600 SAI xml file

- How to verify it
Install image on SN5600 device
2023-06-10 14:32:30 +08:00
Kebo Liu
3100425299 [Mellanox] Update SN5600 sensors.conf and pcie.yaml files (#14883)
- Why I did it
Update the sensors.conf and pcie.yaml according to the real hardware.

- How I did it
Update the sensors.conf and pcie.yaml

- How to verify it
run relevant sonic-mgmt test cases.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-06-10 14:32:26 +08:00
Junchao-Mellanox
b8ac86e14a [system-health] Add fan direction check for system health (#14509)
- Why I did it
Add fan direction check to system health, all fans should be in the same direction

- How I did it
Add fan direction check to system health, all fans should be in the same direction
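
A minimal sketch of such a check, assuming fan directions are available as a name-to-direction mapping and that the most common direction is taken as the expected one. Both the function name and the majority rule are assumptions for illustration; the commit may compare directions differently.

```python
from collections import Counter


def mismatched_fans(directions):
    """Return the names of fans whose direction differs from the most common one.

    `directions` maps a fan name to a direction string such as 'intake' or
    'exhaust'. An empty result means all fans agree.
    """
    if not directions:
        return []
    expected, _count = Counter(directions.values()).most_common(1)[0]
    return sorted(name for name, d in directions.items() if d != expected)


print(mismatched_fans({"fan1": "intake", "fan2": "intake", "fan3": "exhaust"}))
```

A health checker could report the system as unhealthy whenever the returned list is non-empty.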

- How to verify it
Manual test
Unit test
Added sonic-mgmt test case to verify
2023-06-10 14:32:21 +08:00
StormLiangMS
8aeb2ba715
Cherrypick to 202211 [Mellanox] Add patch commit-id mapping to description #15416
cherry pick #15052
2023-06-10 13:58:12 +08:00
Junchao-Mellanox
af7412d3a1 [Mellanox] add PSU fan direction support (#14508)
- Why I did it
Add PSU fan direction support

- How I did it
Implement fan.get_direction for PSU fan

- How to verify it
Manual test
Unit test
2023-06-10 12:32:26 +08:00
mssonicbld
c99e035232
Added change to add 'peerType' as element in NEIGH_STATE_TABLE. (#15265) (#15380) 2023-06-08 05:09:53 +08:00
mssonicbld
5f4b54a9cd
[ci/build]: Upgrade SONiC package versions (#15361) 2023-06-06 19:46:12 +08:00
mssonicbld
e4d8355976
[ci/build]: Upgrade SONiC package versions (#15329) 2023-06-04 18:12:12 +08:00
mssonicbld
4e9569ee3b
[ci/build]: Upgrade SONiC package versions (#15165) 2023-06-03 17:22:05 +08:00
mssonicbld
084564bdde
Fix for fast/cold-boot: call db_migrator only after old config is loaded (#14933) (#15317) 2023-06-03 09:16:42 +08:00
Ye Jianquan
dd989a64d7
[CI/CD] Refine pr test definition, remove old test jobs and testbedv2 flags (#15305) 2023-06-02 16:33:41 +08:00
Ye Jianquan
167704807e
[CI/CD] Migrate to SONiC Elastictest (#15273) 2023-06-02 10:38:55 +08:00
Sudharsan Dhamal Gopalarathnam
d93970bc2e
[Mellanox] Update hw-mgmt to 7.0020.4301 (#15260) (#15283)
Manual Cherrypick of #15260

Why I did it
Bug fix:

I2C bus is stuck - Unable to probe I2C bus 2-0048, which causes /var/run/hw-management/config/sfp_counter, module_counter to be zero and pmon docker unable to start.
Work item tracking
Microsoft ADO (number only):
How I did it
Update HW-MGMT package version in the make file
Update HW-MGMT submodule pointer

How to verify it
run full sonic-mgmt regression
2023-06-01 11:41:59 +08:00
Ye Jianquan
69d61047c4
[CI/CD] Refine PR test templates and test_plan.py to be ready to migrate to Elastictest (#15259) 2023-05-31 09:37:38 +08:00
Neetha John
b82145bc27 [qos] Update RDMA-CENTRIC lossy profile to use static threshold for Th devices (#14372)
Why I did it
For better accounting purposes, updating the ingress lossy traffic profile to use static threshold. This change is only intended for Th devices using RDMA-CENTRIC profiles

How I did it
Update the buffer templates for Th devices in RDMA-CENTRIC folder to use the correct threshold

How to verify it
Verified the changes manually on a Th device.
Existing unit tests render Th template from the RDMA-CENTRIC folder. Updated the expected output to use the correct threshold
2023-05-31 00:32:12 +08:00
lixiaoyuner
8867d2459f Clean up the old version container images (#14978)
Why I did it
Our k8s feature pulls new version container images for each upgrade, so the container images inside SONiC keep accumulating. Currently there is no way to clean up the old version container images, so the disk may fill up. We need to add logic to clean up the old version container images.

Work item tracking
Microsoft ADO (number only):
17979809
How I did it
Remove the old version container images, keeping only the feature's current version and last version images; the last version image is kept to support fallback.
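
The retention rule above can be sketched as follows. The function name and the tag strings are hypothetical, and the real cleanup logic may identify images differently.

```python
def images_to_remove(tags, current, last=None):
    """Return the image tags to delete for one feature.

    The current version is always kept; the last version, if any, is kept
    so that fallback remains possible.
    """
    keep = {current}
    if last is not None:
        keep.add(last)
    return [tag for tag in tags if tag not in keep]


print(images_to_remove(["v1", "v2", "v3", "v4"], current="v4", last="v3"))
```

For the tags shown, only "v1" and "v2" would be selected for removal.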

How to verify it
Check whether the old version images are removed
2023-05-30 20:50:15 +08:00
mssonicbld
7b6a7d8283 [submodule] Update submodule sonic-swss to the latest HEAD automatically 2023-05-30 16:32:45 +08:00
mssonicbld
24daa8ab40
[healthd] Use unix_socket_path instead of loopback ip (#14843) (#15249) 2023-05-29 22:40:31 +08:00
Jing Kan
2cf1370ba0 [YANG] Add MgmtLeafRouter to Device Neighbor Metadata element type list (#15202)
Why I did it
Introduce a new valid neighbor element type to YANG.

Work item tracking
Microsoft ADO (number only): 23994521
How I did it
Add MgmtLeafRouter to element network type list.

How to verify it
Passes UTs
2023-05-29 14:34:10 +08:00
mssonicbld
d598217bab [submodule] Update submodule sonic-swss to the latest HEAD automatically 2023-05-26 16:32:43 +08:00
mssonicbld
d8f2f7c034
[Mellanox] Use sysfs for sfp reset/LPM/presence (#14130) (#15215) 2023-05-26 02:25:21 +08:00
mssonicbld
2098634ab3
[Mellanox] Update SAI to 2211.24.0.21 and SDK/FW to 4.5.5142/2010_5144 (#15072) (#15214) 2023-05-26 02:20:30 +08:00
mssonicbld
46e72ede39 [submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically 2023-05-25 16:32:39 +08:00
Ye Jianquan
9764ec297e
Refine test job definition and assert logic (#14961)
Why I did it
Remove the 'kvmtest-t0' and 'kvmtest-t1-lag' test jobs: all the test jobs are already required (continueOnError: false), and only one of the classical and testbedV2 tests will be enabled, so there is no need for an extra 'or' compute test job.
Change agent pool to reduce cost and avoid congestion
2023-05-24 10:26:49 +08:00
Yaqiang Zhu
782c044a75 [minigraph] Add rack_mgmt_rack parse support in minigraph.py (#15064)
Why I did it
We need to store information of power shelf in config_db for SONiC MX switch. Current minigraph parser cannot parse rack_mgmt_map field.

Work item tracking
Microsoft ADO (number only): 22179645
How I did it
Add support for parsing rack_mgmt_map.
2023-05-23 14:33:24 +08:00
Yaqiang Zhu
8a48cab032
[202211][yang] Extend device_metadata yang model with rack_mgmt_map (#15141)
Why I did it
Manually cherry-pick and resolve conflicts of this PR: #15109
Extend device_metadata yang model.

Work item tracking
Microsoft ADO (number only): 22912178
How I did it
Add rack_mgmt_map field in yang model.

How to verify it
Build image.
2023-05-23 09:44:38 +08:00
mssonicbld
93d62f87a7
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#15172) 2023-05-21 14:52:18 +08:00
mssonicbld
09e2bc9964
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#15164) 2023-05-20 15:08:40 +08:00
Dror Prital
2e8b7d2ede Support pulling sonic-slave-docker image from path at REGISTRY_SERVER (#14907)
- Why I did it
To reduce sonic build time, there is an option to acquire the sonic slave docker image(s) from an artifact server (this reduces sonic make configure time).
The current implementation supports only the following convention:

<REGISTRY_SERVER>:<REGISTRY_PORT>/<SLAVE_BASE_IMAGE>:<SLAVE_BASE_TAG>

If the SLAVE_BASE_IMAGE is located under an internal path inside the server, the convention should be:

<REGISTRY_SERVER>:<REGISTRY_PORT><REGISTRY_SERVER_PATH>/<SLAVE_BASE_IMAGE>:<SLAVE_BASE_TAG>

REGISTRY_SERVER_PATH (which is set in rules/config) must start with "/".

If REGISTRY_SERVER_PATH is not set, the behavior remains the same as today.
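
The two conventions above can be expressed as one function. This is a sketch, not the build-system code: the function name, the example host and the example image name are hypothetical, and only the string layout comes from the description.

```python
def slave_image_ref(server, port, image, tag, server_path=""):
    """Build <server>:<port><server_path>/<image>:<tag>.

    An empty server_path reproduces the original convention.
    """
    if server_path and not server_path.startswith("/"):
        raise ValueError('REGISTRY_SERVER_PATH must start with "/"')
    return f"{server}:{port}{server_path}/{image}:{tag}"


# Hypothetical values for illustration only.
print(slave_image_ref("registry.example.com", 5000, "sonic-slave", "abc123"))
print(slave_image_ref("registry.example.com", 5000, "sonic-slave", "abc123",
                      server_path="/internal/sonic"))
```

Because the path carries its own leading "/", an unset path yields exactly the original form with no doubled separator.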

- How I did it
Add ability to set REGISTRY_SERVER_PATH and update the code for docker image tag and docker image pull accordingly

- How to verify it
Use a sonic slave docker image from an artifact server in which the image is kept in an internal folder, and make sure the build consumes it.
2023-05-18 14:33:33 +08:00
Vivek
e2876b0062 [Sys Mon] Fix the service entry delete in state_db because of timer job (#14702)
Why I did it
A systemd stop event on a service with timers can sometimes delete the state_db entry for the corresponding service.

Note: This won't be observed on the latest master label since the dependency on timer was removed with the recent config reload enhancement. However, it is better to have the fix since there might be some systemd services added to system health daemon in the future which may contain timers

root@qa-eth-vt01-4-3700c:/home/admin# systemctl stop snmp
root@qa-eth-vt01-4-3700c:/home/admin# show system-health sysready-status 
System is not ready - one or more services are not up

Service-Name            Service-Status    App-Ready-Status    Down-Reason
----------------------  ----------------  ------------------  -------------
<Truncated>
ssh                     OK                OK                  -
swss                    OK                OK                  -
syncd                   OK                OK                  -
sysstat                 OK                OK                  -
teamd                   OK                OK                  -
telemetry               OK                OK                  -
what-just-happened      OK                OK                  -
ztp                     OK                OK                  -
<Truncated>
Expected

Should see a Down entry for SNMP instead of the entry being deleted from the STATE_DB

root@qa-eth-vt01-4-3700c:/home/admin# show system-health sysready-status 
System is not ready - one or more services are not up

Service-Name            Service-Status    App-Ready-Status    Down-Reason
----------------------  ----------------  ------------------  -------------
<Truncated>
snmp                    Down              Down                Inactive
ssh                     OK                OK                  -
swss                    OK                OK                  -
syncd                   OK                OK                  -
sysstat                 OK                OK                  -
teamd                   OK                OK                  -
telemetry               OK                OK                  -
what-just-happened      OK                OK                  -
ztp                     OK                OK                  -
<Truncated>
How I did it
This happens because the timer is usually PartOf the service, so a stop on the service is propagated to the timer. Fixed the logic to handle this.

Apr 18 02:06:47.711252 r-lionfish-16 DEBUG healthd: Main process- received event:snmp.service from source:sysbus time:2023-04-17 23:06:47
Apr 18 02:06:47.711347 r-lionfish-16 INFO healthd: check_unit_status for [ snmp.service ] 
Apr 18 02:06:47.722363 r-lionfish-16 INFO healthd: snmp.service service state changed to [inactive/dead]

Apr 18 02:06:47.723230 r-lionfish-16 DEBUG healthd: Main process- received event:snmp.timer from source:sysbus time:2023-04-17 23:06:47
Apr 18 02:06:47.723328 r-lionfish-16 INFO healthd: check_unit_status for [ snmp.timer ] 
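
One way to handle the propagated timer event is sketched below. This is a hypothetical illustration, not the logic from this commit: the function, the state dictionary and the status strings are assumptions. The idea is that an event for a .timer unit must not touch the entry of the service it belongs to.

```python
def handle_unit_event(state, unit, active_state):
    """Update a modeled service-status table for one systemd unit event.

    `state` maps a service name to 'OK' or 'Down' (a stand-in for the
    state_db entries). Timer units are ignored, so a stop that propagates
    from snmp.service to snmp.timer cannot remove or overwrite 'snmp'.
    """
    name, _, kind = unit.rpartition(".")
    if kind == "timer":
        return state
    if kind == "service":
        state[name] = "OK" if active_state == "active" else "Down"
    return state


# Replays the two events visible in the log above.
state = {"snmp": "OK", "ssh": "OK"}
handle_unit_event(state, "snmp.service", "inactive")
handle_unit_event(state, "snmp.timer", "inactive")
print(state)
```

After both events the snmp entry is still present and marked Down, which is the expected outcome described earlier in this message.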

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
2023-05-18 09:47:01 +08:00
Anish Narsian
71ecd727ac [arp_update] Resolve neighbors from config_db (#15006)
* Resolve the NEIGH table entries present in CONFIG_DB. Without this change, arp/ndp entries that we wish to resolve and that are configured via CONFIG_DB are not resolved.
2023-05-18 09:46:56 +08:00