Commit Graph

7282 Commits

Author SHA1 Message Date
anamehra
e107549942 chassis-packet: resolve the missing static routes (#14593)
Why I did it
Fixes #14179
chassis-packet: missing arp entries for static routes causing high orchagent cpu usage

It is observed that some sonic-mgmt test case calls sonic-clear arp, which clears the static arp entries as well. Orchagent or arp_update process does not try to resolve the missing arp entries after clear.

How I did it
arp_update should resolve the missing arp/ndp static route
entries. Added code to check for missing entries and try ping if any
found to resolve it.

How to verify it
After boot or config reload, check ipv4 and ipv4 neigh entries to make sure all static route entries are present
manual validation:
Use sonic-clear arp and sonic-clear ndp to clear all neighbor entries
run arp_update
Check for neigh entries. All entries should be present.
Testing on T0 setup route/for test_static_route.py

The test set the STATIC_ROUTE entry in conifg db without ifname:
sonic-db-cli CONFIG_DB hmset 'STATIC_ROUTE|2.2.2.0/24' nexthop 192.168.0.18,192.168.0.25,192.168.0.23

"STATIC_ROUTE": {
    "2.2.2.0/24": {
        "nexthop": "192.168.0.18,192.168.0.25,192.168.0.23"
    }
},
Validate that the arp_update gets the proper ARP_UPDATE_VARDS using arp_update_vars.j2 template from config db and does not crash:

{ "switch_type": "", "interface": "", "pc_interface" : "PortChannel101 PortChannel102 PortChannel103 PortChannel104 ", "vlan_sub_interface": "", "vlan" : "Vlan1000", "static_route_nexthops": "192.168.0.18 192.168.0.25 192.168.0.23 ", "static_route_ifnames": "" }

validate route/test_static_route.py testcase pass.
2023-04-12 18:32:47 +08:00
mssonicbld
7942c92196 [submodule] Update submodule to the latest HEAD automatically 2023-04-12 16:33:48 +08:00
xumia
4ce33aad75
[Build] Optimize the version control for Debian packages (#14557) (#14610)
Why I did it
Optimize the version control for Debian packages.
Fix sonic-slave-buster/sources.list.amd64 not found display issue, need to generate the file before running the shell command to evaluate the sonic image tag.
When using the snapshot mirror, it is not necessary to update the version file based on the base image. It will reduce the version dependency issue, when an image is not run when freezing the version.

How I did it
Not to update the version file when snapshot mirror enabled.

How to verify it
2023-04-12 15:00:48 +08:00
mssonicbld
73766c2fa1
Finalize fast-reboot in warmboot finalizer (#14238) (#14608) 2023-04-11 22:54:56 +08:00
mssonicbld
cde1574801
[submodule] Update submodule to the latest HEAD automatically (#14577) 2023-04-10 14:24:52 +08:00
mssonicbld
4d0f1c1972
[ci/build]: Upgrade SONiC package versions (#14578) 2023-04-09 19:17:25 +08:00
mssonicbld
95f387cddf
Fix issue: wrong teamd link watch state after warm reboot (#14084) (#14575) 2023-04-09 00:59:15 +08:00
mssonicbld
fff0e7de89
[yang]Updating vxlan yang model to include IPv6 source in VxLAN tunnel (#14363) (#14576) 2023-04-09 00:33:25 +08:00
mssonicbld
05a9ce9628
[ci/build]: Upgrade SONiC package versions (#14572) 2023-04-08 19:08:35 +08:00
mssonicbld
18cd788c62 [submodule] Update submodule to the latest HEAD automatically 2023-04-07 16:33:06 +08:00
mssonicbld
a3951c2041
Increase wait_for_tunnel() timeout to 90s (#14279) (#14563) 2023-04-07 16:02:01 +08:00
mssonicbld
8fc020d693
[Build] Support to use the snapshot mirror for debian base image (#14474) (#14562) 2023-04-07 15:38:03 +08:00
Saikrishna Arcot
db8bcadd56
[submodule] Advance sonic-swss-common pointer (#14504)
Update sonic-swss-common submodule pointer to include the following:

* 6e4daf1 Revamp module build script to make it work for 5.15 on Ubuntu 20.04 (sonic-net/sonic-swss-common#720)
* 7f40cde Non recursive automake and Debian packaging changes (sonic-net/sonic-swss-common#700)

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-04-06 23:36:12 -07:00
mssonicbld
c031e5a5d1
[submodule] Update submodule to the latest HEAD automatically (#14348) 2023-04-06 15:30:52 +08:00
mssonicbld
483b9867e9
[ci/build]: Upgrade SONiC package versions (#14529) 2023-04-05 19:02:12 +08:00
Vivek
f27632153a
[202211] Advance sonic-dhcp-relay submodule (#14473)
67a3bdf show counters wrong cli output fixed (#36)
5b3eea1 Update package cache, and bail on the first error (#35)
1d221b0 dhcpv6 relay UT code coverage improve (#32)
514b084 dhcpv6 packet handling code refine (#30)

Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
2023-04-02 14:04:26 -07:00
mssonicbld
8863910bc8
[ci/build]: Upgrade SONiC package versions (#14492) 2023-04-02 19:28:22 +08:00
mssonicbld
f3b6860076
[ci/build]: Upgrade SONiC package versions (#14488) 2023-04-01 19:35:15 +08:00
mssonicbld
5b028dc60f
[ci/build]: Upgrade SONiC package versions (#14478) 2023-04-01 03:16:16 +08:00
mssonicbld
20c4aab017
Pin mmh3 package version in sonic-slave-stretch docker (#14463) (#14475) 2023-03-31 15:51:18 +08:00
Jemston Fernando
8bbc8eb8cf
[celestica]: Fix Belgite platform issues (#14036)
As part of platform hardening this commit fixes several platform issues
in various components like PSU, FAN, Temperature, LED.
Cherrypick PR#13389
2023-03-27 10:16:16 -07:00
Liu Shilong
e5dcae8a11
[ci] Fix build issue for vs vhdx image. #14424
Why I did it
sonic-slave-stretch build failed for mmh3 version update to 3.10 on Mar 24.

How I did it
Enable reproducible build for vhdx image.

How to verify it
2023-03-27 22:34:23 +08:00
mssonicbld
cc631fdf35
change static rt expiry timer max value (#14397) (#14419) 2023-03-26 23:31:29 +08:00
mssonicbld
4c4e9eec55
Fix the demo_part_size not initialized issue when creating partition (#14296) (#14394) 2023-03-24 01:20:18 +08:00
Hua Liu
dad37bf471
[202211] Update sonic-py-common, add missing dependency to redis-dump-load (#14360)
Update sonic-py-common, add missing dependency to redis-dump-load.
This is manually cherry-pick PR for https://github.com/sonic-net/sonic-buildimage/pull/14347
After 202211, the redis-dump-load been patched by sonic, so can't cherry-pick master branch PR to 202211 branch.

#### Why I did it
The script sonic_db_dump_load.py in sonic-py-common is depends on redis-dump-load, however the dependency is missing.

#### How I did it
Add redis-dump-load dependency.

#### How to verify it
Pass all E2E test case.

#### Description for the changelog
Update sonic-py-common, add missing dependency to redis-dump-load.
2023-03-23 09:39:06 -07:00
mssonicbld
fe1e2b16f7
[ci/build]: Upgrade SONiC package versions (#14382) 2023-03-22 19:59:24 +08:00
xumia
0a7037641c
[Security] Fix some of vulnerability issue relative python packages (#14269) (#14352)
Why I did it
Fix some of vulnerability issue relative python packages #14269
Pillow: [CVE-2021-27921]
Wheel: [CVE-2022-40898]
lxml: [CVE-2022-2309]

How I did it
How to verify it
2023-03-22 15:42:29 +08:00
Dev Ojha
24c53a5d34 [Buffer] Added cable length config to buffer config template for EdgeZoneAggregator (#14280)
Why I did it
SONiC currently does not identify 'EdgeZoneAggregator' neighbor. As a result, the buffer profile attached to those interfaces uses the default cable length which could cause ingress packet drops due to insufficient headroom. Hence, there is a need to update the buffer templates to identify such neighbors and assign the same cable length as used by the T1.

How I did it
Modified the buffer template to identify EdgeZoneAggregator as a neighbor device type and assign it the same cable length as a T1/leaf router.

How to verify it
Unit tests pass, and manually checked on a 7260 to see the changes take effect.

Signed-off-by: dojha <devojha@microsoft.com>
2023-03-20 22:36:33 +08:00
Samuel Angebault
f394121903 [Arista] Add missing platform_components.json (#14067)
Provide platform-components.json for Clearwater2 and Wolverine

These files are needed for fwutil platform sonic-mgmt tests to pass.

Fix PikeZ platform_components.json

Co-authored-by: Patrick MacArthur <pmacarthur@arista.com>
Co-authored-by: Andy Wong <andywong@arista.com>
2023-03-20 20:54:49 +08:00
Vivek
4dc61fcbc1 [lldpmgrd] Don't log error message for outdated event (#14178)
- Why I did it
Fixes #14236

When a redis event quickly gets outdated during port breakout, error logs like this are seen

Mar  8 01:43:26.011724 r-leopard-56 INFO ConfigMgmt: Write in DB: {'PORT': {'Ethernet64': {'admin_status': 'down'}, 'Ethernet68': {'admin_status': 'down'}}}
Mar  8 01:43:26.012565 r-leopard-56 INFO ConfigMgmt: Writing in Config DB
Mar  8 01:43:26.013468 r-leopard-56 INFO ConfigMgmt: Write in DB: {'PORT': {'Ethernet64': None, 'Ethernet68': None}, 'INTERFACE': None}
Mar  8 01:43:26.018095 r-leopard-56 NOTICE swss#portmgrd: :- doTask: Configure Ethernet64 admin status to down
Mar  8 01:43:26.018309 r-leopard-56 NOTICE swss#portmgrd: :- doTask: Delete Port: Ethernet64
Mar  8 01:43:26.018641 r-leopard-56 NOTICE lldp#lldpmgrd[32]: :- pops: Miss table key PORT_TABLE:Ethernet64, possibly outdated
Mar  8 01:43:26.018654 r-leopard-56 ERR lldp#lldpmgrd[32]: unknown operation ''

- How I did it
Only log the error when the op is not empty and not one of ("SET" & "DEL" )

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
2023-03-20 20:54:45 +08:00
Saikrishna Arcot
60067e76c2 Don't create the members@ array in config_db for PC when reading from minigraph (#13660)
Fixes #11873.

#### Why I did it

When loading from minigraph, for port channels, don't create the members@ array in config_db in the PORTCHANNEL table. This is no longer needed or used.

In addition, when adding a port channel member from the CLI, that member doesn't get added into the members@ array, resulting in a bit of inconsistency. This gets rid of that inconsistency.
2023-03-20 20:54:37 +08:00
mssonicbld
7b61e894ac
sonic-buildimage Remove unused SAT port from arista configs. (#14167) (#14333) 2023-03-19 23:08:48 +08:00
mssonicbld
bcf35fdee1
[yang]: Add Yang model support for adding Channel to PORT table (#14228) (#14338) 2023-03-19 23:03:23 +08:00
mssonicbld
b3109fefe5
[dhcp-relay] Add dhcp_relay show cli (#13614) (#14342) 2023-03-19 22:48:25 +08:00
Song Yuan
09a3f922fb Add QOS profiles for Arista SKUs (#13829) 2023-03-19 22:33:05 +08:00
kellyyeh
d45da2319f Update dhcpmon rx/tx packet filtering and fix server rx count (#13898)
Why I did it
Dhcpmon had incorrect RX count for server side packets. It does not raise any false alarms, but could miss catching server side packet count mismatch between snapshot and current counter.

Add debug mode which prints counter to syslog

How I did it
Due to dualtor inbound filter requirement, there are currently two filters, each for listening to rx / tx packets.
Originally, we opened up an rx/tx socket for each interface specified, which causes duplicate socket. Now we initialize the sockets only once. Both sockets are not binded to an interface, and we use vlan to interface mapping to filter packets. For inbound uplinks, we use a portchannel to interface mapping.

Previous dhcpmon counter before dual tor change:
[ Agg-Vlan1000- Current rx/tx] Discover: 1/ 4, Offer: 1/ 1, Request: 3/ 12, ACK: 1/ 1
[ eth0- Current rx/tx] Discover: 0/ 0, Offer: 0/ 0, Request: 0/ 0, ACK: 0/ 0
[ eth0- Current rx/tx] Discover: 0/ 0, Offer: 0/ 0, Request: 0/ 0, ACK: 0/ 0
[ PortChannel104- Current rx/tx] Discover: 0/ 1, Offer: 0/ 0, Request: 0/ 3, ACK: 0/ 0
[ PortChannel103- Current rx/tx] Discover: 0/ 1, Offer: 0/ 0, Request: 0/ 3, ACK: 0/ 0
[ PortChannel102- Current rx/tx] Discover: 0/ 2, Offer: 1/ 0, Request: 0/ 6, ACK: 1/ 0
[ PortChannel101- Current rx/tx] Discover: 0/ 0, Offer: 0/ 0, Request: 0/ 0, ACK: 0/ 0
[ Vlan1000- Current rx/tx] Discover: 1/ 0, Offer: 0/ 1, Request: 3/ 0, ACK: 0/ 1
[ Agg-Vlan1000- Current rx/tx] Discover: 1/ 4, Offer: 1/ 1, Request: 3/ 12, ACK: 1/ 1

Dhcpmon counter after this PR:
[ PortChannel104- Current rx/tx] Discover: 0/ 1, Offer: 0/ 0, Request: 0/ 3, ACK: 0/ 0
[ PortChannel103- Current rx/tx] Discover: 0/ 1, Offer: 0/ 0, Request: 0/ 3, ACK: 0/ 0
[ PortChannel102- Current rx/tx] Discover: 0/ 2, Offer: 1/ 0, Request: 0/ 6, ACK: 1/ 0
[ PortChannel101- Current rx/tx] Discover: 0/ 0, Offer: 0/ 0, Request: 0/ 0, ACK: 0/ 0
[ Vlan1000- Current rx/tx] Discover: 1/ 0, Offer: 0/ 1, Request: 3/ 0, ACK: 0/ 1
[ Agg-Vlan1000- Current rx/tx] Discover: 1/ 4, Offer: 1/ 1, Request: 3/ 12, ACK: 1/ 1

How to verify it
Ran dhcp relay test to send all four packets in singles and batches on both single ToR and dual ToR. Counter was as expected.
2023-03-19 22:33:00 +08:00
Arvindsrinivasan Lakshmi Narasimhan
1d57d1b6dc [chassis][voq] 400g to100g speed changes for chassis linecards (#13935)
On SONiC VoQ chassis, the speed changes are done from 400G to 100G needs to be supported on 400G linecards.
To enable this, along with speed change the port lanes need to be changed. This PR has the changes to update the port lanes when such speed change happens.

This PR is intended only for VoQ chassis linecards. These platforms today have 400g port with 8 serdes lines, and 100g will operate with 4 serdes lane. When the port speed changes from 400G to 100G the first 4 lanes will be used for 100G port.

Platforms which support 2x50g PAM4 or support 100G PAM4 serdes or other combinations are not handled in the PR.

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
2023-03-19 22:32:56 +08:00
Zain Budhwani
d8c9517280 Remove dialout as critical process (#14006)
#### Why I did it

Remove dialout as critical process as it is no longer used in prod. As part of future work, can remove dialout completely

#### How I did it

Remove from critical process list
2023-03-19 22:32:52 +08:00
Tejaswini Chadaga
37be88bef2 Fix VOQ_CHASSIS_V6_PEER route-map config (#14055)
* Fix typo in VOQ_CHASSIS_V6_PEER route-map config

* Updated UT files with the changed config
2023-03-19 22:32:47 +08:00
Junchao-Mellanox
bb41b55f1a [system-health] Make check interval more accurate (#14085)
- Why I did it

Healthd check system status every 60 seconds. However, running checker may take several seconds. Say checker takes X seconds, healthd takes (60 + X) seconds to finish one iteration. This implementation makes sonic-mgmt test case not so stable because the value X is hard to predict and different among different platforms. This PR introduces an interval
compensation mechanism to healthd main loop.

- How I did it

Introduces an interval compensation mechanism to healthd main loop: healthd should wait (60 - X) seconds for next iteration

- How to verify it

Manual test
Unit test
2023-03-19 22:32:43 +08:00
kellyyeh
6fc71c2f40 Update dhcpv6-relay yang model (#14144)
Why I did it
Add interface-id in dhcpv6-relay yang model

How I did it
Add interface-id option and corresponding UT. Updated configuration.md

How to verify it
kellyyeh@kellyyeh:~/sonic-buildimage/src/sonic-yang-models$ pyang -Vf tree -p /usr/local/share/yang/modules/ietf ./yang-models/sonic-dhcpv6-relay.yang
2023-03-19 22:32:39 +08:00
mssonicbld
499f57a7f7
[swss/syncd] remove dependency on interfaces-config.service (#13084) (#14341) 2023-03-19 22:32:37 +08:00
lixiaoyuner
e33af15d2d Install kubernetes-cni for kubelet (#14163)
Why I did it
Find a new bug on kubelet side. The kubernetes-cni plug-in was removed in #12997, the reason is that the plug-in will be auto installed when install kubeadm, and will report error if we don't remove the install code. But after removal, the version auto installed is different from what we installed before. This will affect the kubelet action in some scenarios we don't find before. Need to install it by another way.

How I did it
Install kubernetes-cni==0.8.7-00 before install kubeadm

How to verify it
Flannel binary will be installed under /opt/cni/bin/ folder
2023-03-19 22:32:35 +08:00
jhli-cisco
098678fd3f [sonci-slave]: update sonic-slave docker files to include cisco sdk dependencies (#14203)
cisco SDK dependencies needed
2023-03-19 22:32:29 +08:00
Neetha John
17bf0c85cb Update dynamic threshold for TD2 (#14224)
Why I did it
Update dynamic threshold to -1 to get optimal performance for RDMA traffic

How I did it
Modified pg_profile_lookup.ini to reflect the correct value

Signed-off-by: Neetha John <nejo@microsoft.com>
2023-03-19 22:32:26 +08:00
Neetha John
0aacc4531a [storage_backend] Add backend acl service (#14229)
Why I did it
This PR addresses the issue mentioned above by loading the acl config as a service on a storage backend device

How I did it
The new acl service is a oneshot service which will start after swss and does some retries to ensure that the SWITCH_CAPABILITY info is present before attempting to load the acl rules. The service is also bound to sonic targets which ensures that it gets restarted during minigraph reload and config reload

How to verify it
Build an image with the following changes and did the following tests

Verified that acl is loaded successfully on a storage backend device after a switch boot up
Verified that acl is loaded successfully on a storage backend ToR after minigraph load and config reload
Verified that acl is not loaded if the device is not a storage backend ToR or the device does not have a DATAACL table

Signed-off-by: Neetha John <nejo@microsoft.com>
2023-03-19 22:32:22 +08:00
mssonicbld
5c55eb8c40 [ci/build]: Upgrade SONiC package versions 2023-03-19 20:51:06 +08:00
Sudharsan Dhamal Gopalarathnam
156189dbad [Mellanox]Fix lpmode set when logical port is larger than 64 (#14138)
- Why I did it
In sfplpm API, the number of logical ports is hardcoded as 64. When a system contains more port than this, the SDK APIs would fail with a syslog as below

Mar 7 03:53:58.105980 r-leopard-58 ERR syncd#SDK: [MGMT_LIB.ERR] Slot [0] Module [0] has logport [0x00010069] in enabled state
Mar 7 03:53:58.105980 r-leopard-58 ERR syncd#SDK: [SDK_MGMT_LIB.ERR] Failed in __sdk_mgmt_phy_module_pwr_attr_set, error: Internal Error
Mar 7 03:53:58.106118 r-leopard-58 ERR pmon#-c: Error occurred when setting power mode for SFP module 0, slot 0, error code 1

- How I did it
Remove the hardcoded value of 64. Obtained the number of logical ports from SDK

- How to verify it
Manual testing
2023-03-19 20:50:58 +08:00
Junhua Zhai
29f3c4944a [gearbox] use credo sai v0.9.0 (#14149)
Update credo sai package to the latest v0.9.0.
2023-03-19 20:50:53 +08:00
Dror Prital
ba14f728de Update SDK/FW to version 4.5.4206/4.5.4204 (#14164)
- Why I did it
To include latest fixes:

Fix traffic loss on all routed traffic when moving from 4.4.3372/XX_2008_3388 to 4.5.4118-012/XX_2010_4120-010. Issue occurred after ISSU process in Spectrum 1 only, When upgrading from older version to a new one. Neighbor entries are overwritten.
Fix When using mirror session policer on SPC2/3, the actual CIR was 1.28 times more than the configured CIR value.
Fix Creation of router interface of type bridge may occasionally fail if create is performed immediately after delete.
Fix False errors during SDK deinitialization may be seen in the syslog

- How I did it
Updated SDK submodule and relevant makefiles with the required versions.

- How to verify it
Build an image and run tests from "sonic-mgmt".
2023-03-19 20:50:49 +08:00