Commit Graph

1096 Commits

Author SHA1 Message Date
Zain Budhwani
7c88542d6e
Disable eventd and rsyslog plugin in slim images (#17905) (#17972)
Disable eventd at buildtime for slim images

- Microsoft ADO **(number only)**:26386286

Add flags for disabling eventd and only copy rsyslog conf files when eventd is included and not slim image

Manual testing
2024-02-01 10:14:20 +08:00
Longxiang Lyu
06c27e5058 [dualtor] Disable zebra link-detect for vlan interfaces (#17784)
* [dualtor] Disable zebra link-detect for vlan interfaces

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
2024-01-30 16:32:42 +08:00
abdosi
a118a5ba43
[chassis] Added support of isolating given LC in Chassis with TSA mode (#16732) (#17860)
What I did:
Added support when TSA is done on Line Card make sure it's completely
isolated from all e-BGP peer devices from this LC or remote LC

Why I did:
Currently when TSA is executed on LC routes are withdrawn from it's connected e-BGP peers only. e-BGP peers on remote LC can/will (via i-BGP) still have route pointing/attracting traffic towards this isolated LC.

How I did:

When TSA is applied on LC all the routes that are advertised via i-BGP are set with community tag of no-export so that when remote LC received these routes it does not send over to it's connected e-BGP peers.

Also once we receive the route with no-export  over iBGP match on it and and set the local preference of that route to lower value (80) so that we remove that route from the forwarding database. Below scenario explains why we do this:

- LC1 advertise R1 to LC3
- LC2 advertise R1 to LC3
- On LC3 we have multi-path/ECMP over both LC1 and LC2
- On LC3 R1 received from LC1 is consider best route over R1 over received from LC2 and is send to LC3 e-BGP peers
- Now we do TSA on LC2
- LC3 will receive R1 from LC2 with community no-export and from LC1 same as earlier (no change)
- LC3 will still get traffic for R1 since it is still advertised to e-BGP peers (since R1 from LC1 is best route)
- LC3 will forward to both LC1 and LC2 (ecmp) and this causes issue as LC2 is in TSA mode and should not receive traffic

To fix above scenario we change the preference to lower value of R1 received from LC2 so that it is removed from Multi-path/ECMP group.

How I verfiy:

UT has been added to make sure Template generation is correct
Manual Verification of the functionality
sonic-mgmt test case will be updated accordingly.
Please note this PR is on top of this :#16714 which needs to be merged first.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2024-01-29 22:54:18 +08:00
abdosi
6d767e549d
[chassis] Support advertisement of Loopback0 of all LC's across all e-BGP peers in TSA mode (#16714) (#17837)
What I did:
In Chassis TSA mode Loopback0 Ip's of each LC's should be advertise through e-BGP peers of each remote LC's

How I did:

- Route-map policy to Advertise own/self Loopback IP to other internal iBGP peers with a community internal_community as define in constants.yml
- Route-map policy to match on above internal_community when route is received from internal iBGP peers and set a internal tag as define in constants.yml and also delete the internal_community so we don't send to any of e-BGP peers
- In TSA new route-map match on above internal tag and permit the route (Loopback0 IP's of remote LC's) and set the community to traffic_shift_community.
- In TSB delete the above new route-map.

How I verify:

Manual Verification

UT updated.
sonic-mgmt PR: sonic-net/sonic-mgmt#10239

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2024-01-19 12:53:26 +08:00
Nazarii Hnydyn
a3b5ea93eb
[sonic-cfggen]: Optimize template rendering and database access. (#17650)
Signed-off-by: Nazarii Hnydyn nazariig@nvidia.com

Why I did it
Improved switch init time
Work item tracking
N/A
How I did it
Replaced: sonic-cfggen -> sonic-db-cli
Aggregated template list for sonic-cfggen
How to verify it
Run warm-reboot
2024-01-10 08:56:20 +08:00
Stepan Blyshchak
3cb6343311
[202305] Revert bgp suppress fib pending (#17578)
DEPENDS ON: sonic-net/sonic-swss#2997 sonic-net/sonic-utilities#3093

What I did

Revert the feature.

Why I did it

Revert bgp suppress FIB functionality due to found FRR memory consumption issues and bugs.

How I verified it

Basic sanity check on t1-lag, regression in progress.
2024-01-02 08:59:17 +08:00
Junchao-Mellanox
a8ad19d4ac
[202305] Optimize syslog rate limit feature for fast and warm boot (#17478)
Backport PR #17458 due to conflict.

Why I did it
Optimize syslog rate limit feature for fast and warm boot

Work item tracking
Microsoft ADO (number only):
How I did it
Optimize redis start time
Don't render rsyslog.conf in container startup script
Disable containercfgd by default. There is a new CLI to enable it (in another PR)
How to verify it
Manual test
Regression test
2023-12-28 20:53:49 +08:00
mssonicbld
337f925058
[frr]: Force disable next hop group support. (#17344) (#17423) 2023-12-06 15:53:52 +08:00
StormLiangMS
fa7be88599
Revert "[pmon] update gRPC version to 1.57.0 (#16257) (#17219)" (#17391)
This reverts commit 066065f1cd.
2023-12-05 10:38:51 +08:00
StormLiangMS
2c28502ddd
Revert "Share docker image and use telemetry container for 202305 (#17255)" (#17356)
This reverts commit 2c7d53e5fb.
2023-11-30 20:41:38 +08:00
ganglv
2c7d53e5fb
Share docker image and use telemetry container for 202305 (#17255)
Why I did it
Need to share docker image for telemetry and gnmi, and only use telemetry container for 202305 branch

Work item tracking
Microsoft ADO (number only):
How I did it
Add a new docker image, base-gnmi, build sonic-gnmi and sonic-telemetry on this docker image.
Enable telemetry container.

How to verify it
Run end to end test for telemetry and gnmi.
2023-11-24 11:22:48 +08:00
vdahiya12
066065f1cd
[pmon] update gRPC version to 1.57.0 (#16257) (#17219)
* [pmon] update gRPC version to 1.57.0 (#16257)

Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>

* fix conflict

Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>

---------

Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
2023-11-23 21:03:07 +08:00
ganglv
733a902a70
Revert "[202305] Share image for gnmi and telemetry (#17137)" (#17261)
This reverts commit f2a495f7e5.
2023-11-22 23:51:34 +08:00
abdosi
785ab1f51f
[202305] PR to make BGP GTSM feature for packet-chassis (#17237)
* [chassis/multi-asic] Make sure iBGP session established as directly connected  (#16777)

What I did:
Make Sure for internal iBGP we are one-hop away (directly connected) by using Generic TTL security mechanism.

Why I did:
Without this change it's possible on packet chassis i-BGP can be established even if there no direct connection. Below is the example

- Let's say we have 3 LC's LC1/LC2/LC3 each having i-BGP session session with each other over Loopback4096
- Each LC's have static route towards other LC's Loopback4096 to establish i-BGP session
- LC1 learn default route 0.0.0.0/0 from it's e-BGP peers and send it over to LC2 and LC3 over i-BGP
- Now for some reason on LC2 static route towards LC3 is removed/not-present/some-issue we expect i-BGP session should go down between LC2 and LC3
- However i-BGP between LC2 and LC3 does not go down because of feature ip nht-resolve-via-default  where LC2 will use default route to reach Loopback4096 of LC3. As it's using default route BGP packets from LC2 towards LC3 will first route to LC1 and then go to LC3 from there.

Above scenario can result in packet mis-forwarding on data plane

How I fixed it:-

To make sure BGP packets between i-BGP peers are not going with extra routing hop enable using GTSM feature

neighbor PEER ttl-security hops NUMBER

This command enforces Generalized TTL Security Mechanism (GTSM), as specified in RFC 5082. With this command, only neighbors that are the specified number of hops away will be allowed to become neighbors. This command is mutually exclusive with ebgp-multihop.

We set hop count as 1 which makes FRR to reject BGP connection if we receive BGP packets if it's TTL < 255. Also setting this attribute make sure i-BGP frames are originated with IP TTL of 255.

How I verify:

Manual Verification of above scenario. See blow BGP packets receive with IP TTL 254 (additional routing hop) we are seeing FIN TCP flags as BGP is rejecting the connection

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Update peer-group.conf.j2

* Update result_all.conf

* Update result_base.conf

---------

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2023-11-22 15:37:19 +08:00
mssonicbld
f4df761ed8
[VOQ][saidump] Install rdbtools into the docker base related containers. (#16466) (#17222) 2023-11-19 14:38:05 +08:00
abdosi
6c03da95c2 [chassis/multi-asic] Enable Sending BGP Community over internal neighbors over iBGP Session (#16705)
What I did:
Enable Sending BGP Community over internal neighbors over iBGP Session

Microsoft ADO: 25268695

Why I did:
Without this change BGP community send by e-BGP Peers are not carry-forward to other e-BGP peers.


str2-xxxx-lc1-2# show bgp ipv6  20c0:a801::/64
BGP routing table entry for 20c0:a801::/64, version 52141
Paths: (1 available, best #1, table default)
  Not advertised to any peer
  65000 65500
    2603:10e2:400::6 from 2603:10e2:400::6 (3.3.3.6)
      Origin IGP, localpref 100, valid, internal, best (First path received)
      Last update: Tue Sep 26 16:08:26 2023
str2-xxxx-lc1-2# show ip bgp 192.168.35.128/25
BGP routing table entry for 192.168.35.128/25, version 52688
Paths: (1 available, best #1, table default)
  Not advertised to any peer
  65000 65502
    3.3.3.6 from 3.3.3.6 (3.3.3.6)
      Origin IGP, localpref 100, valid, internal, best (First path received)
      Last update: Tue Sep 26 15:45:51 2023

After the change

str2-xxxx-lc2-2(config)# router bgp 65100
str2-xxxx-lc2-2(config-router)# address-family ipv4
str2-xxxx-lc2-2(config-router-af)# neighbor INTERNAL_PEER_V4 send-community
str2-xxxx-lc2-2(config-router-af)# exit
str2-xxxx-lc2-2(config-router)# address-family ipv6
str2-xxxx-lc2-2(config-router-af)# neighbor INTERNAL_PEER_V6 send-community
str2-xxxx-lc1-2# show bgp ipv6  20c0:a801::/64
BGP routing table entry for 20c0:a801::/64, version 52400
Paths: (1 available, best #1, table default)
  Not advertised to any peer
  65000 65500
    2603:10e2:400::6 from 2603:10e2:400::6 (3.3.3.6)
      Origin IGP, localpref 100, valid, internal, best (First path received)
      **Community: 1111:1111**
      Last update: Tue Sep 26 16:10:19 2023
str2-xxxx-lc1-2# show ip bgp 192.168.35.128/25
BGP routing table entry for 192.168.35.128/25, version 52947
Paths: (1 available, best #1, table default)
  Not advertised to any peer
  65000 65502
    3.3.3.6 from 3.3.3.6 (3.3.3.6)
      Origin IGP, localpref 100, valid, internal, best (First path received)
      **Community: 1111:1111**
      Last update: Tue Sep 26 16:10:09 2023

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2023-11-18 12:34:04 +08:00
Vivek
60dc4d2e89
[202305] Fix v6relay dual tor if selection issue (#17186)
Why I did it
dhcp_relay dual tor tests are failing in 202305

How I did it
Backport #15864 to 202305

How to verify it
Ran sonic-mgmt dhcp_relay tests.
2023-11-16 21:22:15 +08:00
Lawrence Lee
dc33611029 [tph]: Detect LAG flaps from APPL_DB (#16879)
Why I did it
A race condition exists while the TPH is processing a netlink message - if a second netlink message arrives during processing it will be missed since TPH is not listening for other messages.
Another bug was found where TPH was unnecessarily restarting since it was checking admin status instead of operational status of portchannels.

How I did it
Subscribe to APPL_DB for updates on LAG operational state
Track currently sniffed interfaces

How to verify it
Send tunnel packets with destination IP of an unresolved neighbor, verify that ping commands are run
Shut down a portchannel interface, verify that sniffer does not restart
Send tunnel packets, verify ping commands are still run
Bring up portchannel interface, verify that sniffer restarts

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2023-11-16 20:49:42 +08:00
ganglv
f2a495f7e5
[202305] Share image for gnmi and telemetry (#17137)
Why I did it
Share docker image to support gnmi container and telemetry container
backport #16863

Work item tracking
Microsoft ADO 25423918:
How I did it
Create telemetry image from gnmi docker image.
Enable gnmi container and disable telemetry container by default.

How to verify it
Run end to end test.
2023-11-15 11:28:21 +08:00
jcaiMR
0465d7fdf5 [dhcp-relay]: dhcp/dhcpv6 per interface counter support (#16377)
Why I did it
Support DHCP/DHCPv6 per-interface counter, code change in sonic-build image.

Work item tracking
Microsoft ADO (17271822):

How I did it
- Introduce libjsoncpp-dev in dhcpmon and dhcprelay repo
- Show CLI changes after counter format change

How to verify it
- Manually run show command
- dhcpmon, dhcprelay integration tests
2023-10-21 14:32:29 +08:00
Bob Chu
7af177b7b3 [Telemetry] enable default service config if no config from DB (#16683)
#### Why I did it
Fix issue #16533 , telemetry service exit in master and 202305 branches due to no telemetry configs in redis DB.

#### How I did it
Enable default config if no TELEMETRY configs from redis DB.

#### How to verify it
After the fix, telemetry service would work with the following two scenarios:
1. With TELEMETRY config in redis DB, load service configs from DB.
2. No TELEMETRY config in redis DB, use default service configs.
2023-10-21 12:32:37 +08:00
Stepan Blyshchak
eb1451301f [frr] fix default zebra config not inserted into empty zebra.conf (#16747)
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2023-10-21 12:32:25 +08:00
Saikrishna Arcot
fb618b6e0b
[202305] Backport PRs to fix build (#16896, #16859, #16636) (#16934)
* Remove main deb installation for derived deb build (#16859)

* Don't install dependencies of derived debs

When "building" a derived deb package, don't install the dependencies of
the package into the container. It's not needed at this stage.

* Re-add openssh-client and openssh-sftp-server as derived debs

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

---------

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
(cherry picked from commit 9ae77bc2dd)

* Re-add missing dependency for derived debs. (#16896)

* Re-add missing dependency for derived debs.

My previous changed removed the whole dependency on the main deb
existing, not just the installation of the main deb. Fix this by
readding a dependency on the main deb being built/pulled from cache.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

---------

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
(cherry picked from commit 963d40a77b)

* [build] Fix build issue in docker-ptf-sai caused by setuptools_scm new release (#16636)

docker-ptf-sai build fails on setuptools_scm's new release on 09/20/2023.
Use old version instead.

(cherry picked from commit bfa05c8349)

---------

Co-authored-by: Liu Shilong <shilongliu@microsoft.com>
2023-10-20 15:39:54 +08:00
SuvarnaMeenakshi
2579b9506c
[202305][SNMP][IPv6]: Revert PRs to support SNMP over IPv6 (#16649)
* Revert "[SNMP][IPv6]: Fix to use link local IPv6 address as snmp agentAddress (#16013)"

This reverts commit ebe8c8c223.

* Revert "[SNMP][IPv6]: Fix SNMP IPv6 reachability issue in certain scenarios (#15487) (#15874)"

This reverts commit 83aa8b8180.
2023-10-09 09:47:44 +08:00
anamehra
3bca122b29 Chassis: fix pmon docker failure when DEVICE_METADATA is not available (#16527)
Signed-off-by: anamehra anamehra@cisco.com

Added a check for DEVICE_METADATA before accessing the data. This prevents the j2 failure when var is not available.
2023-09-21 22:33:07 +08:00
mssonicbld
1726eb3eb7
Update macsec CAK keys in profile for tests to change to type7 encoded format (#16388) (#16626)
* Change the CAK key length check in config plugin, macsec test profile changes

* Fix the format in add_profile api

The changes needed in various macsec unit tests and config plugin when we move to accept the type 7 encoded key format for macsec. This goes along with PR : sonic-net/sonic-swss#2892 raised earlier.

Co-authored-by: judyjoseph <53951155+judyjoseph@users.noreply.github.com>
2023-09-21 20:39:01 +08:00
Zhaohui Sun
2bc65aa7ba
[202305]Change orchagent pop batch size from 8192 to 1024 (#16127)
### Why I did it
Background running lua script may cause redis-server quite busy if batch size is 8192.
If handling time exceeded default 5s, the redis-server will not response to other process and will cause syncd crash.

```
Aug  9 07:46:29.512326 str-s6100-acs-5 INFO database#supervisord: redis 68:M 09 Aug 2023 07:46:29.511 # Lua slow script detected: still in execution after 5186 milliseconds. You can try killing the script using the SCRIPT KILL command. Script SHA1 is: 88270a7c5c90583e56425aca8af8a4b8c39fe757
Aug  9 07:46:29.523716 str-s6100-acs-5 ERR syncd#syncd: :- checkReplyType: Expected to get redis type 5 got type 6, err: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.
Aug  9 07:46:29.524818 str-s6100-acs-5 INFO syncd#supervisord: syncd terminate called after throwing an instance of '
Aug  9 07:46:29.525268 str-s6100-acs-5 ERR pmon#CCmisApi: :- checkReplyType: Expected to get redis type 5 got type 6, err: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.
Aug  9 07:46:29.526148 str-s6100-acs-5 INFO syncd#supervisord: syncd std::system_error'
Aug  9 07:46:29.528308 str-s6100-acs-5 ERR pmon#psud[32]: :- checkReplyType: Expected to get redis type 5 got type 6, err: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.
Aug  9 07:46:29.529048 str-s6100-acs-5 ERR lldp#python3: :- guard: RedisReply catches system_error: command: *2#015#012$3#015#012DEL#015#012$27#015#012LLDP_ENTRY_TABLE:Ethernet37#015#012, reason: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.: Input/output error
Aug  9 07:46:29.529720 str-s6100-acs-5 ERR snmp#python3: :- guard: RedisReply catches system_error: command: *2#015#012$7#015#012HGETALL#015#012$28#015#012COUNTERS:oid:0x100000000000a#015#012, reason: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.: Input/output error
```

88270a7c5c90583e56425aca8af8a4b8c39fe757 is /usr/share/swss/consumer_state_table_pops.lua
##### Work item tracking
- Microsoft ADO **24741990**:

#### How I did it
Change batch size from 8192 to 1024.

#### How to verify it
Run all test cases in sonic-mgmt to verify the system stability.

### Tested branch (Please provide the tested image version)

- [x] 20220531.36
2023-08-14 17:53:08 -07:00
abdosi
15a39ac806 Fix the Loopback0 IPv6 address of LC's in chassis not reachable from (#16026)
What I did:
Fix the Loopback0 IPv6 address of LC's in chassis not reachable from peer devices.

Why I did:
For Ipv6 Loopback0 address we only advertise /64 subnet to the peer devices. However, in case of chassis each LC will have it own /128 address of that /64 subnet . Since this /128 address does not get advertised peer devices can-not ping/reach the LC's loopback0.

How I fix:
Advertise /128 Loopback0 Ipv6 address only between i-BGP peers. This way even though /64 is advertised to e-BGP peer devices when packet reaches any of LC's it can reach the appropriate LC's.

How I verify:
Manual verification
UT added for same.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2023-08-15 04:32:34 +08:00
SuvarnaMeenakshi
ebe8c8c223 [SNMP][IPv6]: Fix to use link local IPv6 address as snmp agentAddress (#16013)
<!--
     Please make sure you've read and understood our contributing guidelines:
     https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md

     ** Make sure all your commits include a signature generated with `git commit -s` **

     If this is a bug fix, make sure your description includes "fixes #xxxx", or
     "closes #xxxx" or "resolves #xxxx"

     Please provide the following information:
-->

#### Why I did it
fixes: https://github.com/sonic-net/sonic-buildimage/issues/16001
Caused by: https://github.com/sonic-net/sonic-buildimage/pull/15487

The above PR introduced change to use Management and Loopback Ipv4 and ipv6 addresses as snmpagent address in snmpd.conf file.
With this change, if Link local IP address is configured as management or Loopback IPv6 address, then snmpd tries to open socket on that ipv6 address and fails with the below error:
```
Error opening specified endpoint "udp6:[fe80::5054:ff:fe6f:16f0]:161"
Server Exiting with code 1
```
From RFC4007, if we need to specify non-global ipv6 address without ambiguity, we need to use zone id along with the ipv6 address: <address>%<zone_id>
Reference: https://datatracker.ietf.org/doc/html/rfc4007

##### Work item tracking
- Microsoft ADO **(number only)**:

#### How I did it
Modify snmpd.conf file to use the %zone_id representation for ipv6 address.
#### How to verify it
In VS testbed, modify config_db to use link local ipv6 address as management address:
    "MGMT_INTERFACE": {
        "eth0|10.250.0.101/24": {
            "forced_mgmt_routes": [
                "172.17.0.1/24"
            ],
            "gwaddr": "10.250.0.1"
        },
        "eth0|fe80::5054:ff:fe6f:16f0/64": {
            "gwaddr": "fe80::1"
        }
    },

Execute config_reload after the above change.
snmpd comes up and check if snmpd is listening on ipv4 and ipv6 addresses:
```
admin@vlab-01:~$ sudo netstat -tulnp | grep 161
tcp        0      0 127.0.0.1:3161          0.0.0.0:*               LISTEN      274060/snmpd        
udp        0      0 10.1.0.32:161           0.0.0.0:*                           274060/snmpd        
udp        0      0 10.250.0.101:161        0.0.0.0:*                           274060/snmpd        
udp6       0      0 fc00:1::32:161          :::*                                274060/snmpd        
udp6       0      0 fe80::5054:ff:fe6f::161 :::*                                274060/snmpd      -- Link local 
 
admin@vlab-01:~$ sudo ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.250.0.101  netmask 255.255.255.0  broadcast 10.250.0.255
        inet6 fe80::5054:ff:fe6f:16f0  prefixlen 64  scopeid 0x20<link>
        ether 52:54:00:6f:16:f0  txqueuelen 1000  (Ethernet)
        RX packets 36384  bytes 22878123 (21.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 261265  bytes 46585948 (44.4 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

admin@vlab-01:~$ docker exec -it snmp snmpget -v2c -c public fe80::5054:ff:fe6f:16f0 1.3.6.1.2.1.1.1.0
iso.3.6.1.2.1.1.1.0 = STRING: "SONiC Software Version: SONiC.master.327516-04a6031b2 - HwSku: Force10-S6000 - Distribution: Debian 11.7 - Kernel: 5.10.0-18-2-amd64"
```
Logs from snmpd:
```
Turning on AgentX master support.
NET-SNMP version 5.9
Connection from UDP/IPv6: [fe80::5054:ff:fe6f:16f0%eth0]:44308
```
Ran test_snmp_loopback test to check if loopback ipv4 and ipv6 works:
```
./run_tests.sh -n vms-kvm-t0 -d vlab-01 -c snmp/test_snmp_loopback.py  -f vtestbed.yaml -i ../ansible/veos_vtb -e "--skip_sanity --disable_loganalyzer" -u
=== Running tests in groups ===
Running: pytest snmp/test_snmp_loopback.py --inventory ../ansible/veos_vtb --host-pattern vlab-01 --testbed vms-kvm-t0 --testbed_file vtestbed.yaml --log-cli-level warning --log-file-level debug --kube_master unset --showlocals --assert plain --show-capture no -rav --allow_recover --ignore=ptftests --ignore=acstests --ignore=saitests --ignore=scripts --ignore=k8s --ignore=sai_qualify --junit-xml=logs/tr.xml --log-file=logs/test.log --skip_sanity --disable_loganalyzer
..                                                                        

snmp/test_snmp_loopback.py::test_snmp_loopback[vlab-01] PASSED 
```
<!--
If PR needs to be backported, then the PR must be tested against the base branch and the earliest backport release branch and provide tested image version on these two branches. For example, if the PR is requested for master, 202211 and 202012, then the requester needs to provide test results on master and 202012.
-->

#### Which release branch to backport (provide reason below if selected)

<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [x] 202012
- [x] 202106
- [x] 202111
- [x] 202205
- [x] 202211
- [x] 202305

#### Tested branch (Please provide the tested image version)

<!--
- Please provide tested image version
- e.g.
- [x] 20201231.100
-->

- [ ] <!-- image version 1 -->
- [ ] <!-- image version 2 -->

#### Description for the changelog
<!--
Write a short (one line) summary that describes the changes in this
pull request for inclusion in the changelog:
-->

<!--
 Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.
-->

#### Link to config_db schema for YANG module changes
<!--
Provide a link to config_db schema for the table for which YANG model
is defined
Link should point to correct section on https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md
-->

#### A picture of a cute animal (not mandatory but encouraged)
2023-08-14 18:32:35 +08:00
mssonicbld
7bd67d4f37
Upgrade scapy in the PTF's python3 virtualenv to 2.5.0 (#15573) (#15875) 2023-07-19 20:05:40 +08:00
mssonicbld
83aa8b8180
[SNMP][IPv6]: Fix SNMP IPv6 reachability issue in certain scenarios (#15487) (#15874) 2023-07-19 20:04:57 +08:00
mssonicbld
74598e568a
Add health check probe for k8s upgrade containers. (#15223) (#15867)
#### Why I did it
After k8s upgrade a container, k8s can only know the container is running, don't know the service's status inside container. So we need a probe inside container, k8s will call the probe to check whether the container is really ready.
##### Work item tracking
- Microsoft ADO **(number only)**: 22453004
#### How I did it
Add a health check probe inside config engine container, the probe will check whether the start service exit normally or not if the start service exists and call the python script to do container self-related specific checks if the script is there. The python script should be implemented by feature owner if it's needed.

more details: [design doc](https://github.com/sonic-net/SONiC/blob/master/doc/kubernetes/health-check.md)
#### How to verify it
Check path /usr/bin/readiness_probe.sh inside container.

#### Which release branch to backport (provide reason below if selected)

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [x] 202205
- [x] 202211

#### Tested branch (Please provide the tested image version)
- [x] 20220531.28

Co-authored-by: lixiaoyuner <35456895+lixiaoyuner@users.noreply.github.com>
2023-07-19 16:11:13 +08:00
xumia
de2a650a8e
[Build] Fix the PyYang python package installation issue (#15892)
Why I did it
Fix the armhf build failure.
How to reproduce the issue:

docker run -it debain:bullseye bash
apt-get update && apt-get install -y python3-pip
pip3 install PyYAML==5.4.1
Error message:

Collecting PyYAML==5.4.1
  Downloading PyYAML-5.4.1.tar.gz (175 kB)
     |████████████████████████████████| 175 kB 12.3 MB/s 
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  ERROR: Command errored out with exit status 1:
   command: /usr/bin/python3 /tmp/tmp6xabslgb_in_process.py get_requires_for_build_wheel /tmp/tmp_er01ztl
....
      raise AttributeError(attr)
  AttributeError: cython_sources
  ----------------------------------------
WARNING: Discarding d63f2d7597/PyYAML-5.4.1.tar.gz (sha256)=607774cbba28732bfa802b54baa7484215f530991055bb562efbed5b2f20a45e (from https://pypi.org/simple/pyyaml/) (requires-python:>=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, !=3.5.*). Command errored out with exit status 1: /usr/bin/python3 /tmp/tmp6xabslgb_in_process.py get_requires_for_build_wheel /tmp/tmp_er01ztl Check the logs for full command output.
ERROR: Could not find a version that satisfies the requirement PyYAML==5.4.1
ERROR: No matching distribution found for PyYAML==5.4.1
root@fa2fa92edcfd:/# 
But if adding the option --no-build-isolation, then it is good, see fix.

install "PyYAML==5.4.1" --no-build-isolation
The same error can be found in the multiple builds.

Work item tracking
Microsoft ADO (number only): 24567457
How I did it
Add a build option --no-build-isolation.

Disable isolation when building a modern source distribution. Build dependencies specified by PEP 518 must be already installed if this option is used.
How to verify it
2023-07-19 09:26:49 +08:00
lixiaoyuner
c59f55f6a3
Move k8s script to docker-config-engine (#14788) (#15768)
Why I did it
To reduce the container's dependency from host system

Work item tracking
Microsoft ADO (number only):
17713469
How I did it
Move the k8s container startup script to config engine container, other than mount it from host.

How to verify it
Check file path(/usr/share/sonic/scripts/container_startup.py) inside config engine container.

Signed-off-by: Yun Li <yunli1@microsoft.com>
Co-authored-by: Qi Luo <qiluo-msft@users.noreply.github.com>
2023-07-17 23:21:01 +08:00
xumia
826f5a1d45 [Build] Fix the python module importlib.metadata not found issue (#15800)
Why I did it
It is to fix the docker-ptf-sai build failure.
https://dev.azure.com/mssonic/build/_build/results?buildId=311315&view=logs&j=cef3d8a9-152e-5193-620b-567dc18af272&t=cf595088-5c84-5cf1-9d7e-03331f31d795

2023-07-09T13:53:19.9025355Z �[91mTraceback (most recent call last):
2023-07-09T13:53:19.9025715Z   File "/root/ptf/.eggs/setuptools_scm-7.1.0-py3.7.egg/setuptools_scm/_entrypoints.py", line 74, in <module>
2023-07-09T13:53:19.9025933Z     from importlib.metadata import entry_points  # type: ignore
2023-07-09T13:53:19.9026167Z ModuleNotFoundError: No module named 'importlib.metadata'
Work item tracking
Microsoft ADO (number only): 24513583
How I did it
How to verify it
2023-07-13 20:57:24 +08:00
mssonicbld
2fc98cd8fc
[chassis][lldp] Fix the lldp error log in host instance which doesn't contain front panel ports (#14814) (#15603) 2023-06-29 21:46:32 +08:00
Longxiang Lyu
664675cad5 [mux] Integrate linkmgrd with swss logger (#15392)
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
2023-06-26 16:40:58 +08:00
Hua Liu
05f1a5a31e
Add watchdog mechanism to swss service and generate alert when swss have issue. (#15429)
Add watchdog mechanism to swss service and generate alert when swss have issue. 

**Work item tracking**
Microsoft ADO (number only): 16578912

**What I did**
Add orchagent watchdog to monitor and alert orchagent stuck issue.

**Why I did it**
Currently SONiC monit system only monit orchagent process exist or not. If orchagent process stuck and stop processing, current monit can't find and report it.

**How I verified it**
Pass all UT.

Manually test process_monitoring/test_critical_process_monitoring.py can pass.

Add new UT https://github.com/sonic-net/sonic-mgmt/pull/8306 to check watchdog works correctly.

Manually test, after pause orchagent with 'kill -STOP <pid>', check there are warning message exist in log:

Apr 28 23:36:41.504923 vlab-01 ERR swss#supervisor-proc-watchdog-listener: Process 'orchagent' is stuck in namespace 'host' (1.0 minutes).

**Details if related**
Heartbeat message PR: https://github.com/sonic-net/sonic-swss/pull/2737
UT PR: https://github.com/sonic-net/sonic-mgmt/pull/8306
2023-06-12 17:53:54 -07:00
Ye Jianquan
cec9d7b83a
Revert "Add watchdog mechanism to swss service and generate alert when swss have issue. (#14686)" (#15390)
This reverts commit 44427a2f6b.
Docker image not updated during PR validation and caused PR check failures.
Force merge this revert. After cache is updated after this PR is merged, issue should be fixed.
2023-06-09 09:10:35 +08:00
abdosi
6139c525d2
updated internal route policy for chassis-packet (#15349)
What I did:

Workaround for the issue seen here : FRRouting/frr#13682
It seems there is timing issue where there are multiple recursive lookup needed to resolve nexthop of the route it's possible that it does not happen correctly causing route to remain in inactive state

Issue is seen on chassis-packet as there 2 level of recursive lookup needed for a given e-BGP learnt route
- Level1 to resolve e-BGP peer (connected route via bgp ) over Loopback4096 (i-BGP peering)
- Level 2 Loopback4096 over backend port-channels next-hops

For VOQ chassis there is no e-BGP peer (connected route via bgp )  resolution as route is added as Static route by orchagent over Ethernet-IB.

Also as part of this remove route-map policy from instance.conf.j2 as same is define in peer-group.j2.

Microsoft ADO: https://msazure.visualstudio.com/One/_workitems/edit/24198507

How I verify:
Functional Verification manually
Updated UT.
We will be adding sanity check in sonic-mgmt to make sure none of route are in inactive state.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2023-06-07 09:17:44 -07:00
Hua Liu
44427a2f6b
Add watchdog mechanism to swss service and generate alert when swss have issue. (#14686)
This PR depends on https://github.com/sonic-net/sonic-swss/pull/2737 merge first.

**What I did**
Add orchagent watchdog to monitor and alert orchagent stuck issue.

**Why I did it**
Currently SONiC monit system only monit orchagent process exist or not. If orchagent process stuck and stop processing, current monit can't find and report it.

**How I verified it**
Pass all UT.
Add new UT https://github.com/sonic-net/sonic-mgmt/pull/8306 to check watchdog works correctly.
Manually test, after pause orchagent with 'kill -STOP <pid>', check there are warning message exist in log:

Apr 28 23:36:41.504923 vlab-01 ERR swss#supervisor-proc-watchdog-listener: Process 'orchagent' is stuck in namespace 'host' (1.0 minutes).

**Details if related**
Heartbeat message PR: https://github.com/sonic-net/sonic-swss/pull/2737
UT PR: https://github.com/sonic-net/sonic-mgmt/pull/8306
2023-06-05 22:21:17 -07:00
Mai Bui
1477f779de
modify commands using utilities_common.cli.run_command and advance sonic-utilities submodule on master (#15193)
Dependency:
sonic-net/sonic-utilities#2718

Why I did it
This PR sonic-net/sonic-utilities#2718 reduce shell=True usage in utilities_common.cli.run_command() function.

Work item tracking
Microsoft ADO (number only): 15022050
How I did it
Replace strings commands using utilities_common.cli.run_command() function to list of strings

due to circular dependency, advance sonic-utilities submodule
72ca4848 (HEAD -> master, upstream/master, upstream/HEAD) Add CLI configuration options for teamd retry count feature (sonic-net/sonic-utilities#2642)
359dfc0c [Clock] Implement clock CLI (sonic-net/sonic-utilities#2793)
b316fc27 Add transceiver status CLI to show output from TRANSCEIVER_STATUS table (sonic-net/sonic-utilities#2772)
dc59dbd2 Replace pickle by json (sonic-net/sonic-utilities#2849)
a66f41c4 [show] replace shell=True, replace xml by lxml, replace exit by sys.exit (sonic-net/sonic-utilities#2666)
57500572 [utilities_common] replace shell=True (sonic-net/sonic-utilities#2718)
6e0ee3e7 [CRM][DASH] Extend CRM utility to support DASH resources. (sonic-net/sonic-utilities#2800)
b2c29b0b [config] Generate sysinfo in single asic (sonic-net/sonic-utilities#2856)
2023-06-05 17:08:13 +08:00
Tejaswini Chadaga
8058550c09
[bgp]: Add sudo check for TSA/B/C command execution (#15288)
TSA/B/C scripts invoke commands that require root permissions. If the user does not have sudo permissions, the scripts today execute until the command and throw a backtrace with error at the specific command. Added a check to ensure the operations check for root permissions upfront.
2023-06-03 23:47:26 +02:00
Baorong Liu
acb423b255
[staticroutebfd]fix an issue on deleting a non-bfd static route (#15269)
* [static_route][staticroutebfd]fix an issue on deleting a non-bfd static route

Fix an issue for deleting a non-bfd static route also remove the staticroutebfd from critical_processes list and make it auto restart in the case of crash.
2023-06-02 11:46:56 -07:00
qiwang4
359b80e012
[master]staticroutebfd process implementation (#13789)
* [BFD] staticroutebfd implementation
* To enable the BFD for static route

HLD: sonic-net/SONiC#1216
2023-05-26 16:32:05 -07:00
Sachin Holla
ba6aba2b92
[mgmt-framework] Fix rest-server startup script (#14979)
This script was using 'null' as default value for all optional fields
of REST_SERVER table -- due to incorrect use of 'jq -r' command.
Server was not coming up when REST_SERVER entry exists but some fields
were not given (which is a valid configuration).
Fixed the jq query expression to return empty string for non existing
fields.

Signed-off-by: Sachin Holla <sachin.holla@broadcom.com>
2023-05-22 17:42:38 -07:00
Tejaswini Chadaga
4e60f0d563
Template change for BGP monitors on T2 (#14844)
Why I did it
To support BGPMon sessions from each T2 linecard ASIC

Work item tracking
Microsoft ADO (number only): 17873174
How I did it
Added change in BGPMon configuration to use Loopback4096 as source interface, since this has a unique IP per ASIC.

How to verify it
Tested by manually setting up BGPMon session on T2 LC and verified that Loopback4096 could be used as source
2023-05-09 13:40:00 -07:00
abdosi
9b8b4e6e4d
[bgp/TSA]: Fixed the internal peer route-map policy (#14804)
What I did:
In FRR command update source <interface-name> is not at address-family level. Because of this
internal peer route-map for ipv6 were getting applied to ipv4 address family. As a result
TSA over iBGP for Ipv6 was not getting applied.

How I verify:

Manual Verification of TSA over both ipv4 and ipv6 after fix works fine.
Updated UT for this.

Added sonic-mgmt test gap: sonic-net/sonic-mgmt#8170

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2023-05-05 13:55:05 -07:00
Zain Budhwani
4974b5c49c
Add idle conn duration config to telemetry.sh (#14903)
Why I did it
Supports new field in sonic-net/sonic-gnmi@258b887

Work item tracking
Microsoft ADO (number only): 13468195
How I did it
Add new field in telemetry.sh

How to verify it
Pipeline
2023-05-04 16:47:02 -07:00
Tejaswini Chadaga
ca224863cb
Changes to support TSA from supervisor (#14691)
Why I did it
Support for SONIC chassis isolation using TSA and un-isolation using TSB from supervisor module

Work item tracking
Microsoft ADO (number only): 17826134
How I did it
When TSA is run on the supervisor, it triggers TSA on each of the linecards using the secure rexec infrastructure introduced in sonic-net/sonic-utilities#2701. User password is requested to allow secure login to linecards through ssh, before execution of TSA/TSB on the linecards

TSA of the chassis withdraws routes from all the external BGP neighbors on each linecard, in order to isolate the entire chassis. No route withdrawal is done from the internal BGP sessions between the linecards to prevent transient drops during internal route deletion. With these changes, complete isolation of a single linecard using TSA will not be possible (a separate CLI/script option will be introduced at a later time to achieve this)

Changes also include no-stats option with TSC for quick retrieval of the current system isolation state

This PR also reverts changes in #11403

How to verify it
These changes have a dependency on sonic-net/sonic-utilities#2701 for testing

Run TSA from supervisor module and ensure transition to Maintenance mode on each linecard
Verify that all routes are withdrawn from eBGP neighbors on all linecards
Run TSB from supervisor module and ensure transition to Normal mode on each linecard
Verify that all routes are re-advertised from eBGP neighbors on all linecards
Run TSC no-stats from supervisor and verify that just the system maintenance state is returned from all linecards
2023-04-28 16:28:06 +08:00