Commit Graph

1021 Commits

Author SHA1 Message Date
abdosi
c9111122e4 [chassis/multi-asic] Make sure iBGP session established as directly connected (#16777)
What I did:
Make Sure for internal iBGP we are one-hop away (directly connected) by using Generic TTL security mechanism.

Why I did:
Without this change it's possible on packet chassis i-BGP can be established even if there no direct connection. Below is the example

- Let's say we have 3 LC's LC1/LC2/LC3 each having i-BGP session session with each other over Loopback4096
- Each LC's have static route towards other LC's Loopback4096 to establish i-BGP session
- LC1 learn default route 0.0.0.0/0 from it's e-BGP peers and send it over to LC2 and LC3 over i-BGP
- Now for some reason on LC2 static route towards LC3 is removed/not-present/some-issue we expect i-BGP session should go down between LC2 and LC3
- However i-BGP between LC2 and LC3 does not go down because of feature ip nht-resolve-via-default  where LC2 will use default route to reach Loopback4096 of LC3. As it's using default route BGP packets from LC2 towards LC3 will first route to LC1 and then go to LC3 from there.

Above scenario can result in packet mis-forwarding on data plane

How I fixed it:-

To make sure BGP packets between i-BGP peers are not going with extra routing hop enable using GTSM feature

neighbor PEER ttl-security hops NUMBER

This command enforces Generalized TTL Security Mechanism (GTSM), as specified in RFC 5082. With this command, only neighbors that are the specified number of hops away will be allowed to become neighbors. This command is mutually exclusive with ebgp-multihop.

We set hop count as 1 which makes FRR to reject BGP connection if we receive BGP packets if it's TTL < 255. Also setting this attribute make sure i-BGP frames are originated with IP TTL of 255.

How I verify:

Manual Verification of above scenario. See blow BGP packets receive with IP TTL 254 (additional routing hop) we are seeing FIN TCP flags as BGP is rejecting the connection

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2023-10-25 12:32:27 +08:00
SuvarnaMeenakshi
427a3325d1
[202205][SNMP][IPv6]: Revert PRs to support SNMP over IPv6 (#16650)
* Revert "[SNMP][IPv6]: Fix to use link local IPv6 address as snmp agentAddress (#16013) (#16102)"

This reverts commit 628e1ad981.

* Revert "[SNMP][IPv6]: Fix SNMP IPv6 reachability issue in certain scenarios (#15487) (#15826)"

This reverts commit 7cfb71bc18.
2023-10-11 12:02:33 -07:00
mssonicbld
4a75f1be0a
[chassis/multi-asic] Enable Sending BGP Community over internal neighbors over iBGP Session (#16705) (#16710) 2023-09-27 11:01:45 +08:00
anamehra
561c71de43 Chassis: fix pmon docker failure when DEVICE_METADATA is not available (#16527)
Signed-off-by: anamehra anamehra@cisco.com

Added a check for DEVICE_METADATA before accessing the data. This prevents the j2 failure when var is not available.
2023-09-14 09:29:06 +08:00
mssonicbld
32f23dd786
Update macsec CAK keys in profile for tests to change to type7 encoded format (#16388) (#16499) 2023-09-09 06:23:49 +08:00
Zhaohui Sun
a098b92591
[202205]Change orchagent pop batch size from 8192 to 1024 (#16126)
### Why I did it
Background running lua script may cause redis-server quite busy if batch size is 8192.
If handling time exceeded default 5s, the redis-server will not response to other process and will cause syncd crash.

```
Aug  9 07:46:29.512326 str-s6100-acs-5 INFO database#supervisord: redis 68:M 09 Aug 2023 07:46:29.511 # Lua slow script detected: still in execution after 5186 milliseconds. You can try killing the script using the SCRIPT KILL command. Script SHA1 is: 88270a7c5c90583e56425aca8af8a4b8c39fe757
Aug  9 07:46:29.523716 str-s6100-acs-5 ERR syncd#syncd: :- checkReplyType: Expected to get redis type 5 got type 6, err: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.
Aug  9 07:46:29.524818 str-s6100-acs-5 INFO syncd#supervisord: syncd terminate called after throwing an instance of '
Aug  9 07:46:29.525268 str-s6100-acs-5 ERR pmon#CCmisApi: :- checkReplyType: Expected to get redis type 5 got type 6, err: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.
Aug  9 07:46:29.526148 str-s6100-acs-5 INFO syncd#supervisord: syncd std::system_error'
Aug  9 07:46:29.528308 str-s6100-acs-5 ERR pmon#psud[32]: :- checkReplyType: Expected to get redis type 5 got type 6, err: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.
Aug  9 07:46:29.529048 str-s6100-acs-5 ERR lldp#python3: :- guard: RedisReply catches system_error: command: *2#015#012$3#015#012DEL#015#012$27#015#012LLDP_ENTRY_TABLE:Ethernet37#015#012, reason: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.: Input/output error
Aug  9 07:46:29.529720 str-s6100-acs-5 ERR snmp#python3: :- guard: RedisReply catches system_error: command: *2#015#012$7#015#012HGETALL#015#012$28#015#012COUNTERS:oid:0x100000000000a#015#012, reason: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.: Input/output error
```

88270a7c5c90583e56425aca8af8a4b8c39fe757 is /usr/share/swss/consumer_state_table_pops.lua

##### Work item tracking
- Microsoft ADO **24741990**:

#### How I did it
Change batch size from 8192 to 1024.

#### How to verify it
Run all test cases in sonic-mgmt to verify the system stability.

### Tested branch (Please provide the tested image version)

- [x] 20220531.36
2023-08-14 17:54:39 -07:00
mssonicbld
628e1ad981
[SNMP][IPv6]: Fix to use link local IPv6 address as snmp agentAddress (#16013) (#16102)
<!--
     Please make sure you've read and understood our contributing guidelines:
     https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md

     ** Make sure all your commits include a signature generated with `git commit -s` **

     If this is a bug fix, make sure your description includes "fixes #xxxx", or
     "closes #xxxx" or "resolves #xxxx"

     Please provide the following information:
-->

#### Why I did it
fixes: https://github.com/sonic-net/sonic-buildimage/issues/16001
Caused by: https://github.com/sonic-net/sonic-buildimage/pull/15487

The above PR introduced change to use Management and Loopback Ipv4 and ipv6 addresses as snmpagent address in snmpd.conf file.
With this change, if Link local IP address is configured as management or Loopback IPv6 address, then snmpd tries to open socket on that ipv6 address and fails with the below error:
```
Error opening specified endpoint "udp6:[fe80::5054:ff:fe6f:16f0]:161"
Server Exiting with code 1
```
From RFC4007, if we need to specify non-global ipv6 address without ambiguity, we need to use zone id along with the ipv6 address: <address>%<zone_id>
Reference: https://datatracker.ietf.org/doc/html/rfc4007

##### Work item tracking
- Microsoft ADO **(number only)**:

#### How I did it
Modify snmpd.conf file to use the %zone_id representation for ipv6 address.
#### How to verify it
In VS testbed, modify config_db to use link local ipv6 address as management address:
    "MGMT_INTERFACE": {
        "eth0|10.250.0.101/24": {
            "forced_mgmt_routes": [
                "172.17.0.1/24"
            ],
            "gwaddr": "10.250.0.1"
        },
        "eth0|fe80::5054:ff:fe6f:16f0/64": {
            "gwaddr": "fe80::1"
        }
    },

Execute config_reload after the above change.
snmpd comes up and check if snmpd is listening on ipv4 and ipv6 addresses:
```
admin@vlab-01:~$ sudo netstat -tulnp | grep 161
tcp        0      0 127.0.0.1:3161          0.0.0.0:*               LISTEN      274060/snmpd        
udp        0      0 10.1.0.32:161           0.0.0.0:*                           274060/snmpd        
udp        0      0 10.250.0.101:161        0.0.0.0:*                           274060/snmpd        
udp6       0      0 fc00:1::32:161          :::*                                274060/snmpd        
udp6       0      0 fe80::5054:ff:fe6f::161 :::*                                274060/snmpd      -- Link local 
 
admin@vlab-01:~$ sudo ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.250.0.101  netmask 255.255.255.0  broadcast 10.250.0.255
        inet6 fe80::5054:ff:fe6f:16f0  prefixlen 64  scopeid 0x20<link>
        ether 52:54:00:6f:16:f0  txqueuelen 1000  (Ethernet)
        RX packets 36384  bytes 22878123 (21.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 261265  bytes 46585948 (44.4 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

admin@vlab-01:~$ docker exec -it snmp snmpget -v2c -c public fe80::5054:ff:fe6f:16f0 1.3.6.1.2.1.1.1.0
iso.3.6.1.2.1.1.1.0 = STRING: "SONiC Software Version: SONiC.master.327516-04a6031b2 - HwSku: Force10-S6000 - Distribution: Debian 11.7 - Kernel: 5.10.0-18-2-amd64"
```
Logs from snmpd:
```
Turning on AgentX master support.
NET-SNMP version 5.9
Connection from UDP/IPv6: [fe80::5054:ff:fe6f:16f0%eth0]:44308
```
Ran test_snmp_loopback test to check if loopback ipv4 and ipv6 works:
```
./run_tests.sh -n vms-kvm-t0 -d vlab-01 -c snmp/test_snmp_loopback.py  -f vtestbed.yaml -i ../ansible/veos_vtb -e "--skip_sanity --disable_loganalyzer" -u
=== Running tests in groups ===
Running: pytest snmp/test_snmp_loopback.py --inventory ../ansible/veos_vtb --host-pattern vlab-01 --testbed vms-kvm-t0 --testbed_file vtestbed.yaml --log-cli-level warning --log-file-level debug --kube_master unset --showlocals --assert plain --show-capture no -rav --allow_recover --ignore=ptftests --ignore=acstests --ignore=saitests --ignore=scripts --ignore=k8s --ignore=sai_qualify --junit-xml=logs/tr.xml --log-file=logs/test.log --skip_sanity --disable_loganalyzer
..                                                                        

snmp/test_snmp_loopback.py::test_snmp_loopback[vlab-01] PASSED 
```
<!--
If PR needs to be backported, then the PR must be tested against the base branch and the earliest backport release branch and provide tested image version on these two branches. For example, if the PR is requested for master, 202211 and 202012, then the requester needs to provide test results on master and 202012.
-->

#### Which release branch to backport (provide reason below if selected)

<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [x] 202012
- [x] 202106
- [x] 202111
- [x] 202205
- [x] 202211
- [x] 202305

#### Tested branch (Please provide the tested image version)

<!--
- Please provide tested image version
- e.g.
- [x] 20201231.100
-->

- [ ] <!-- image version 1 -->
- [ ] <!-- image version 2 -->

#### Description for the changelog
<!--
Write a short (one line) summary that describes the changes in this
pull request for inclusion in the changelog:
-->

<!--
 Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.
-->

#### Link to config_db schema for YANG module changes
<!--
Provide a link to config_db schema for the table for which YANG model
is defined
Link should point to correct section on https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md
-->

#### A picture of a cute animal (not mandatory but encouraged)

Co-authored-by: SuvarnaMeenakshi <50386592+SuvarnaMeenakshi@users.noreply.github.com>
2023-08-11 08:40:50 -07:00
jcaiMR
07388fd1fa
advance dhcprelay to 6a6ce24, add default dhcpv6 dualtor source interface (#15881)
sonic-build image side change to fix source interface selection in dual tor scenario.
dhcprelay related PR:
[master]fix dhcpv6 relay dual tor source interface selection issue sonic-dhcp-relay#42

Announce dhcprelay submodule to 6a6ce24 to include PR #42
2023-07-19 14:08:48 -07:00
xumia
d0648f81c7
[Build] Fix the PyYang python package installation issue (#15890) (#15907)
Why I did it
Fix the armhf build failure.
How to reproduce the issue:

docker run -it debain:bullseye bash
apt-get update && apt-get install -y python3-pip
pip3 install PyYAML==5.4.1
Error message:

Collecting PyYAML==5.4.1
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  ERROR: Command errored out with exit status 1:
   command: /usr/bin/python3 /tmp/tmp6xabslgb_in_process.py get_requires_for_build_wheel /tmp/tmp_er01ztl
....
      raise AttributeError(attr)
  AttributeError: cython_sources
  ----------------------------------------
WARNING: Discarding d63f2d7597/PyYAML-5.4.1.tar.gz (sha256)=607774cbba28732bfa802b54baa7484215f530991055bb562efbed5b2f20a45e (from https://pypi.org/simple/pyyaml/) (requires-python:>=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, !=3.5.*). Command errored out with exit status 1: /usr/bin/python3 /tmp/tmp6xabslgb_in_process.py get_requires_for_build_wheel /tmp/tmp_er01ztl Check the logs for full command output.
ERROR: Could not find a version that satisfies the requirement PyYAML==5.4.1
ERROR: No matching distribution found for PyYAML==5.4.1
root@fa2fa92edcfd:/#
But if adding the option --no-build-isolation, then it is good, see fix.

install "PyYAML==5.4.1" --no-build-isolation
The same error can be found in the multiple builds.

Work item tracking
Microsoft ADO (number only): 24567457

How I did it
Add a build option --no-build-isolation.
2023-07-19 08:08:35 -07:00
mssonicbld
7cfb71bc18
[SNMP][IPv6]: Fix SNMP IPv6 reachability issue in certain scenarios (#15487) (#15826)
Modify snmpd.conf to start snmpd to listen on specific management and loopback ips instead of listening on any ip.

#### Why I did it
SNMP over IPv6 is not working for all scenarios for a single asic platforms.
The expectation is that SNMP query over IPv6 should work over Management or Loopback0 addresses.
**Specific scenario where this issue is seen**
In case of Lab T0 device,  when SNMP request is sent from a directly connected T1 neighbor over Loopback IP, SNMP response was not received.
This was because the SRC IP address in SNMP response was not Loopback IP, it was the PortChannel IP connected to the neighboring device.
```
23:18:51.620897  In 22:26:27:e6:e0:07 ethertype IPv6 (0x86dd), length 105: fc00::72.41725 > **fc00:1::32**.161:  C="msft" **GetRequest**(28)  .1.3.6.1.2.1.1.1.0
23:18:51.621441 Out 28:99:3a:a0:97:30 ethertype IPv6 (0x86dd), length 241: **fc00::71**.161 > fc00::72.41725:  C="msft" **GetResponse**(162)  .1.3.6.1.2.1.1.1.0="SONiC Software Version: SONiC.xxx - HwSku: xx - Distribution: Debian 10.13 - Kernel: 4.19.0-12-2-amd64"
```
In case of IPv4, the SRC IP in SNMP response was correctly set to Loopback IP.
```
23:25:32.769712  In 22:26:27:e6:e0:07 ethertype IPv4 (0x0800), length 85: 10.0.0.57.56701 > **10.1.0.32**.161:  C="msft" **GetRequest**(28)  .1.3.6.1.2.1.1.1.0
23:25:32.975967 Out 28:99:3a:a0:97:30 ethertype IPv4 (0x0800), length 221: **10.1.0.32**.161 > 10.0.0.57.56701:  C="msft" **GetResponse**(162)  .1.3.6.1.2.1.1.1.0="SONiC Software Version: SONiC.xxx - HwSku: xx - Distribution: Debian 10.13 - Kernel: 4.19.0-12-2-amd64"
```

**Sequence of SNMP request and response**
1. SNMP request will be sent with SRC IP fc00::72 DST IP fc00:1::32
2. SNMP request is received at SONiC device is sent to snmpd which is listening on port 161 :::161/
3. snmpd process will parse the request create a response and sent to DST IP fc00::72. 
snmpd process does not track the DST IP on which the SNMP request was received, which in this case is Loopback IP.
snmpd process will only keep track what is tht IP to which the response should be sent to.
4. snmpd process will send the response packet.
5. Kernel will do a route look up on destination IP and find the best path.
ip -6 route get fc00::72
fc00::72 from :: dev PortChannel101 proto kernel src fc00::71 metric 256 pref medium
5. Using the "src" ip from about, the response is sent out. This SRC ip is that of the PortChannel and not the device Loopback IP.

The same issue is seen when SNMP query is sent from a remote server over Management IP.
SONiC device eth0 --------- Remote server
SNMP request comes with SRC IP <Remote_server> DST IP <Mgmt IP>
If kernel finds best route to Remote_server_IP is via BGP neighbors, then it will send the response via front-panel interface with SRC IP as Loopback IP instead of Management IP.

Main issue is that in case of IPv6, snmpd ignores the IP address to which SNMP request was sent, in case of IPv6.
In case of IPv4, snmpd keeps track of DST IP of SNMP request, it will keep track if the SNMP request was sent to mgmt IP or Loopback IP.
Later, this IP is used in ipi_spec_dst as SRC IP which helps kernel to find the route based on DST IP using the right SRC IP.
https://github.com/net-snmp/net-snmp/blob/master/snmplib/transports/snmpUDPBaseDomain.c#L300 
ipi.ipi_spec_dst.s_addr = srcip->s_addr
Reference: https://man7.org/linux/man-pages/man7/ip.7.html
```
If IP_PKTINFO is passed to sendmsg(2)
              and ipi_spec_dst is not zero, then it is used as the local
              source address for the routing table lookup and for
              setting up IP source route options.  When ipi_ifindex is
              not zero, the primary local address of the interface
              specified by the index overwrites ipi_spec_dst for the
              routing table lookup.
```

**This issue is not seen on multi-asic platform, why?**
on multi-asic platform, there exists different network namespaces.
SNMP docker with snmpd process runs on host namespace.
Management interface belongs to host namespace.
Loopback0 is configured on asic namespaces.
Additional inforamtion on how the packet coming over Loopback IP reaches snmpd process running on host namespace: https://github.com/sonic-net/sonic-buildimage/pull/5420
Because of this separation of network namespaces, the route lookup of destination IP is confined to routing table of specific namespace where packet is received.
if packet is received over management interface, SNMP response also is sent out of management interface. Same goes with packet received over Loopback Ip.

##### Work item tracking
- Microsoft ADO **17537063**:

#### How I did it
Have snmpd listen on specific Management and Loopback IPs specifically instead of listening on any IP for single-asic platform.

Before Fix
```
admin@xx:~$ sudo netstat -tulnp | grep 161   
udp        0      0 0.0.0.0:161             0.0.0.0:*                           15631/snmpd         
udp6       0      0 :::161                  :::*                                15631/snmpd  
```
After fix
```
admin@device:~$ sudo netstat -tulnp | grep 161
udp        0      0 10.1.0.32:161           0.0.0.0:*                           215899/snmpd        
udp        0      0 10.3.1.1:161             0.0.0.0:*                           215899/snmpd        
udp6       0      0 fc00:1::32:161          :::*                                215899/snmpd        
udp6       0      0 fc00:2::32:161          :::*                                215899/snmpd  
``` 

**How this change helps with the issue?**
To see snmpd trace logs, modify snmpd to start using the below parameters, in supervisord.conf file
```
/usr/sbin/snmpd -f -LS0-7i -Lf /var/log/snmpd.log
```
When snmpd listens on any IP, snmpd binds to IPv4 and IPv6 sockets as below:
```
netsnmp_udpbase: binding socket: 7 to UDP: [0.0.0.0]:0->[0.0.0.0]:161
trace: netsnmp_udp6_transport_bind(): transports/snmpUDPIPv6Domain.c, 303:
netsnmp_udpbase: binding socket: 8 to UDP/IPv6: [::]:161
```

When IPv4 response is sent, it goes out of fd 7 and IPv6 response goes out of fd 8.
When IPv6 response is sent, it does not have the right SRC IP and it can lead to the issue described.

When snmpd listens on specific Loopback/Management IPs, snmpd binds to different sockets:
```
trace: netsnmp_udpipv4base_transport_bind(): transports/snmpUDPIPv4BaseDomain.c, 207:
netsnmp_udpbase: binding socket: 7 to UDP: [0.0.0.0]:0->[10.250.0.101]:161
trace: netsnmp_udpipv4base_transport_bind(): transports/snmpUDPIPv4BaseDomain.c, 207:
netsnmp_udpbase: binding socket: 8 to UDP: [0.0.0.0]:0->[10.1.0.32]:161
trace: netsnmp_register_agent_nsap(): snmp_agent.c, 1261:
netsnmp_register_agent_nsap: fd 8
netsnmp_udpbase: binding socket: 10 to UDP/IPv6: [fc00:1::32]:161
trace: netsnmp_register_agent_nsap(): snmp_agent.c, 1261:
netsnmp_register_agent_nsap: fd 10
netsnmp_ipv6: fmtaddr: t = (nil), data = 0x7fffed4c85d0, len = 28
trace: netsnmp_udp6_transport_bind(): transports/snmpUDPIPv6Domain.c, 303:
netsnmp_udpbase: binding socket: 9 to UDP/IPv6: [fc00:2::32]:161
```
When SNMP request comes in via Loopback IPv4, SNMP response is sent out of fd 8
```
trace: netsnmp_udpbase_send(): transports/snmpUDPBaseDomain.c, 511:
netsnmp_udp: send 170 bytes from 0x5581f2fbe30a to UDP: [10.0.0.33]:46089->[10.1.0.32]:161 on fd 8
```
When SNMP request comes in via Loopback IPv6, SNMP response is sent out of fd 10
```
netsnmp_ipv6: fmtaddr: t = (nil), data = 0x5581f2fc2ff0, len = 28
trace: netsnmp_udp6_send(): transports/snmpUDPIPv6Domain.c, 164:
netsnmp_udp6: send 170 bytes from 0x5581f2fbe30a to UDP/IPv6: [fc00::42]:43750 on fd 10
```

#### How to verify it
Verified on single asic and multi-asic devices.
Single asic SNMP query with Loopback 
```
ARISTA01T1#bash snmpget -v2c -c xxx 10.1.0.32 1.3.6.1.2.1.1.1.0
SNMPv2-MIB::sysDescr.0 = STRING: SONiC Software Version: SONiC.xx - HwSku: Arista-7260xx - Distribution: Debian 10.13 - Kernel: 4.19.0-12-2-amd64
ARISTA01T1#bash snmpget -v2c -c xxx fc00:1::32 1.3.6.1.2.1.1.1.0
SNMPv2-MIB::sysDescr.0 = STRING: SONiC Software Version: SONiC.xx - HwSku: Arista-7260xxx - Distribution: Debian 10.13 - Kernel: 4.19.0-12-2-amd64
```

On multi-asic -- no change.
```
sudo netstat -tulnp | grep 161
udp        0      0 0.0.0.0:161             0.0.0.0:*                           17978/snmpd         
udp6       0      0 :::161                  :::*                                17978/snmpd 
```
Query result using Loopback IP from a directly connected BGP neighbor
```
ARISTA01T2#bash snmpget -v2c -c xxx 10.1.0.32 1.3.6.1.2.1.1.1.0
SNMPv2-MIB::sysDescr.0 = STRING: SONiC Software Version: SONiC.xx - HwSku: xx - Distribution: Debian 9.13 - Kernel: 4.9.0-14-2-amd64
ARISTA01T2#bash snmpget -v2c -c xxx fc00:1::32 1.3.6.1.2.1.1.1.0
SNMPv2-MIB::sysDescr.0 = STRING: SONiC Software Version: SONiC.xx - HwSku: xx - Distribution: Debian 9.13 - Kernel: 4.9.0-14-2-amd64  
```
<!--
If PR needs to be backported, then the PR must be tested against the base branch and the earliest backport release branch and provide tested image version on these two branches. For example, if the PR is requested for master, 202211 and 202012, then the requester needs to provide test results on master and 202012.
-->

Co-authored-by: SuvarnaMeenakshi <50386592+SuvarnaMeenakshi@users.noreply.github.com>
2023-07-14 10:16:50 -07:00
lixiaoyuner
fb14b987b4 Add health check probe for k8s upgrade containers. (#15223)
#### Why I did it
After k8s upgrade a container, k8s can only know the container is running, don't know the service's status inside container. So we need a probe inside container, k8s will call the probe to check whether the container is really ready.
##### Work item tracking
- Microsoft ADO **(number only)**: 22453004
#### How I did it
Add a health check probe inside config engine container, the probe will check whether the start service exit normally or not if the start service exists and call the python script to do container self-related specific checks if the script is there. The python script should be implemented by feature owner if it's needed.

more details: [design doc](https://github.com/sonic-net/SONiC/blob/master/doc/kubernetes/health-check.md)
#### How to verify it
Check path /usr/bin/readiness_probe.sh inside container.

#### Which release branch to backport (provide reason below if selected)

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [x] 202205
- [x] 202211

#### Tested branch (Please provide the tested image version)
- [x] 20220531.28
2023-07-14 04:32:45 +08:00
lixiaoyuner
6922edba80
Move k8s script to docker-config-engine (#14788) (#15740)
Why I did it
To reduce the container's dependency from host system

Work item tracking
Microsoft ADO (number only):
17713469
How I did it
Move the k8s container startup script to config engine container, other than mount it from host.

How to verify it
Check file path(/usr/share/sonic/scripts/container_startup.py) inside config engine container.

Signed-off-by: Yun Li <yunli1@microsoft.com>
Co-authored-by: Qi Luo <qiluo-msft@users.noreply.github.com>
2023-07-07 09:22:58 -07:00
mssonicbld
3f59abd29b
[chassis][lldp] Fix the lldp error log in host instance which doesn't contain front panel ports (#14814) (#15669)
* [chassis][lldp] Fix the lldp error log in host instance which doesn't contain front pannel ports

---------

Signed-off-by: mlok <marty.lok@nokia.com>
Co-authored-by: Marty Y. Lok <76118573+mlok-nokia@users.noreply.github.com>
2023-06-30 13:24:26 -07:00
mssonicbld
d935f430b8
[mux] Integrate linkmgrd with swss logger (#15392) (#15539) 2023-06-20 09:08:51 +08:00
mssonicbld
ffd062afae
updated internal route policy for chassis-packet (#15349) (#15378) 2023-06-08 03:40:23 +08:00
abdosi
4f0f0e0927 [bgp/TSA]: Fixed the internal peer route-map policy (#14804)
What I did:
In FRR command update source <interface-name> is not at address-family level. Because of this
internal peer route-map for ipv6 were getting applied to ipv4 address family. As a result
TSA over iBGP for Ipv6 was not getting applied.

How I verify:

Manual Verification of TSA over both ipv4 and ipv6 after fix works fine.
Updated UT for this.

Added sonic-mgmt test gap: sonic-net/sonic-mgmt#8170

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2023-05-18 12:32:38 +08:00
judyjoseph
4b30d09903 [macsec]: show macsec: add --profile option, include profile name in show command output (#13940)
This PR is to add the following

Add a new options "--profile" to the show macsec command, to show all profiles in device
Update the currentl show macsec command, to show profile in each interface o/p. This will tell which macsec profile the interface is attached to.
2023-05-18 09:46:53 +08:00
mssonicbld
96affcc0df
Add monit_snmp file to monitor memory usage (#14464) (#14730) 2023-04-20 05:21:17 +08:00
Gokulnath-Raja
60f9602095
[sflow] Exception handling for if_nametoindex (#11437) (#14456)
catch system error and log as warning level instead of
     error level in case interface was already deleted

Signed-off-by: Gokulnath-Raja <Gokulnath_R@dell.com>
2023-04-18 11:57:44 -07:00
mssonicbld
6dce9c4ac6
Fix telemetry.sh passing in null as log level value (#14303) (#14410) 2023-03-25 05:24:23 +08:00
Saikrishna Arcot
932d0f5391
[202205] Remove apt package lists and make macro to clean up apt and python cache (#14377)
* Remove apt package lists and make macro to clean up apt and python cache

Remove the apt package lists (`/var/lib/apt/lists`) from the docker
containers. This saves about 100MB.

Also, make a macro to clean up the apt and python cache that can then be
used in all of the containers. This helps make the cleanup be consistent
across all containers.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-03-22 14:51:25 -07:00
mssonicbld
929c1849fa
[lldpmgrd] Don't log error message for outdated event (#14178) (#14274) 2023-03-17 04:52:04 +08:00
xumia
7bba702f1e [Build] Fix the mirror gpg key expired issue (#14206)
Why I did it
[Build] Fix the mirror gpg key expired issue
See vs build: https://dev.azure.com/mssonic/build/_build/results?buildId=231680&view=logs&j=cef3d8a9-152e-5193-620b-567dc18af272&t=cf595088-5c84-5cf1-9d7e-03331f31d795

How I did it
Add the apt option not to check the valid until, the option is set to the SONiC docker base image, docker ptf missing the option.

Acquire::Check-Valid-Until "false";
How to verify it
The build of docker-ptf is succeeded after fixed.

2023-03-11T17:26:35.1801999Z [ building ] [ target/docker-ptf.gz ] 
2023-03-11T17:38:10.1608536Z [ finished ] [ target/docker-ptf.gz ]
2023-03-16 04:32:36 +08:00
Yaqiang Zhu
9d2ec1dc56 [dhcp-relay] Add dhcp_relay show cli (#13614)
Why I did it
Currently the show and clear cli of dhcp_relayis may cause confusion.

How I did it
Add doc for it: [doc] Add docs for dhcp_relay show/clear cli sonic-utilities#2649
Add dhcp_relay config cli and test cases.
show dhcp_relay ipv4 helper
show dhcp_relay ipv6 destination
show dhcp_relay ipv6 counters
sonic-clear dhcp_relay ipv6 counters

How to verify it
Unit test all passed
2023-03-07 10:12:59 +08:00
Zain Budhwani
a1f9ee5684 Remove dialout as critical process (#14006)
#### Why I did it

Remove dialout as critical process as it is no longer used in prod. As part of future work, can remove dialout completely

#### How I did it

Remove from critical process list
2023-03-06 18:37:19 +08:00
mssonicbld
ecb528face
Fix VOQ_CHASSIS_V6_PEER route-map config (#14055) (#14074) 2023-03-04 05:08:02 +08:00
Ashwin Srinivasan
57139110ca Added libpci and pciutils to the pmon docker (#12684)
This enables the pcied daemon to call the corresponding system commands needed for pci transactions
2023-02-22 06:34:40 +08:00
xumia
ccfb13aa2a [Build] Support j2 template for debian sources for docker ptf (#13198)
Change to use the sources.list from the file generated from the j2 template
2023-02-22 04:33:49 +08:00
Yaqiang Zhu
928aad1944 [dhcp_relay] Remove exist check while adding dhcpv6 relay (#13822)
Why I did it
DHCPv6 relay config entry is not useful while del dhcpv6 relay config.

How I did it
Remove dhcpv6_relay entry if it is empty and not check entry exist while adding dhcpv6 relay
2023-02-17 20:53:42 +08:00
mssonicbld
31297dcbdb
[dhcp-relay] Add support for dhcp_relay config cli (#13373) (#13584) 2023-02-02 06:30:53 +08:00
mssonicbld
74a7d2d751
[chassis] Fixed critical process not correct for database-chassis docker (#13445) (#13486) 2023-01-24 04:35:55 +08:00
xumia
8395de69d3
[Build] Support j2 template for debian sources (#12557) (#13185)
Why I did it
Unify the Debian mirror sources
Make easy to upgrade to the next Debian release, not source url code change required. Support to customize the Debian mirror sources during the build
Relative issue: #12523

How I did it
How to verify it
2022-12-30 09:47:33 +08:00
Sudharsan Dhamal Gopalarathnam
c83fa0a7fe
[202205][dhcp_relay]Fix the clear dhcp6relay_counters CLI (#13148) (#13181)
Avoid traceback on sonic-clear command

sonic-clear dhcp6relay_counters 
Traceback (most recent call last):
  File "/usr/local/bin/sonic-clear", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/clear/plugins/dhcp-relay.py", line 19, in dhcp6relay_clear_counters
    counter = DHCPv6_Counter()
NameError: name 'DHCPv6_Counter' is not defined

- How I did it
Corrected the way to import using importlib

- How to verify it
Tested the sonic-clear command and verified no traceback is seen
2022-12-28 14:50:03 +02:00
Longxiang Lyu
ac5cb4acd9
[dualtor] Let T0 delay 10 seconds before sending BGP updates (#13082)
Why I did it
To ensure, that after a BGP startup, dualtor T0 receives BGP updates before sending out BGP updates.
Please refer to sonic-net/SONiC#1161 for more details.

How I did it
add coalesce-time 10000 to the frr bgp startup config.

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
2022-12-16 09:57:04 -08:00
mssonicbld
557283a7f8
Fix dependency of dhcp-mon on VLAN with only v6 (#13006) (#13029) 2022-12-13 08:12:33 +08:00
Arvindsrinivasan Lakshmi Narasimhan
f1b7b68a52
[202205][chassis]fix to use the correct table in chassis_state_db (#12992)
Why I did it
In the PR sonic-net/sonic-platform-daemons#311 the table for updating the fabric asic was changed. This PR is update docker-init.sh to use the correct table to detect the fabric asic.

How I did it
update docker-init.sh

How to verify it
Check on chassis


Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
2022-12-08 11:36:45 -08:00
Arnaud
a51264d24b [docker-fpm-frr]: Add unified-split mode to routing config (#11938)
- Why I did it
The values for config_db "docker_routing_config_mode" are:

separated: FRR config generated from ConfigDB, each FRR daemon has its own config file
unified: FRR config generated from ConfigDB, single FRR config file
split: FRR config not generated from ConfigDB, each FRR daemon has its own config file
This commit adds:
split-unified: FRR config not generated from ConfigDB, single FRR config file

- How I did it
In docker_init.sh, when split-unified is used, the FRR configs are not generated
from ConfigDB. What's more, "service integrated-vtysh-config" is configured in vtysh.conf.

- How to verify it
FRR config not overwritten when FRR container starts.

Signed-off-by: Arnaud le Taillanter <a.letaillanter@criteo.com>
2022-11-28 18:19:53 +00:00
arlakshm
b86b3b0d7d
[202205][chassis] update the asic_status.py to read from CHASSIS_FABRIC_ASIC_INFO_TABLE (#12780)
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
2022-11-26 20:27:30 -08:00
Richard.Yu
20fedefbde
[sai-ptf]Fix sai ptf issue (#12723)
* fix sai-ptf build issue caused by upgraded scapy in base ptf docker

Signed-off-by: richardyu-ms <richard.yu@microsoft.com>

* remove useless change

Signed-off-by: richardyu-ms <richard.yu@microsoft.com>

* reformat code

Signed-off-by: richardyu-ms <richard.yu@microsoft.com>

Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
2022-11-16 20:28:30 -08:00
tjchadaga
a68c6ea454 Allow TSA on ibgp sessions between linecards on packet chassis (#12589) 2022-11-10 18:13:04 +00:00
arlakshm
12785e4721 update notify-keyspace-events in redis.conf (#12540)
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan arlakshm@microsoft.com

Why I did it
closes #12343

Today in SONiC the notify-keyspace-events is from DbInterface class when application try do any configdb set.
In Chassis the chassis_db may not get any configdb set operations, so there is chance this configuration will never be set.
So the chassis_db updates from one line card will not be propogated to other linecards, which are doing a psubscribe to get these event.

How I did it
update the redis.conf to set notify-keyspace-events AKE so that the notify-keyspace-events are set when the redis instance is started

How to verify it
Test on chassis
2022-11-10 18:09:58 +00:00
xumia
247c8dd999 [Build] Add the missing debian source bullseye-updates/buster-updates (#12522)
Why I did it
Add the missing debian source bullseye-updates/buster-updates

The build failure as below, it is caused by the docker image debian:bullseye used the version 2.31-13+deb11u5, but the version only available in bullseye-update.
2022-10-31 15:20:21 +08:00
Adam Yeung
31811db9ae iccpd bullseye migration (#12097) 2022-10-27 22:12:23 +00:00
kellyyeh
c001f2b916 Add dhcp6relay dualtor option (#12459) 2022-10-25 20:47:10 +00:00
Lawrence Lee
d9768be475 [tunnel_pkt_handler]: Skip nonexistent intfs (#12424)
- Skip the interface status check if the interface does not exist. In the future, when the interface is created/comes up this check will be triggered again.

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2022-10-25 20:44:08 +00:00
cytsao1
8930d70972 [pmon] Add smartmontools to pmon docker (#11837)
* Add smartmontools to pmon docker

* Set smartmontools to install version 7.2-1 in pmon to match host; clean up smartmontools build files

* Add comments on smartmontools version for both host and pmon
2022-10-25 20:41:26 +00:00
Vivek
c71c63b420 [DHCP_RELAY] Updated wait_for_intf.sh to wait for ipv6 global and link local addr (#12273)
- Why I did it
Fixes #11431

- How I did it
dhcp6relay binds to ipv6 addresses configured on these vlan interfaces
Thus check if they are ready before launching dhcp6relay

- How to verify it
Unit Tests
Tested on a live device

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
2022-10-25 20:38:08 +00:00
Zhaohui Sun
21b2361027 Enable system-site-packages for ptf docker and install thrift for test_qos_sai (#12094)
Why I did it
test_sai_qos failed because of the following error:

"stderr_lines": [
        "Traceback (most recent call last):", 
        "  File \"/usr/bin/ptf\", line 522, in <module>", 
        "    test_modules = load_test_modules(config)", 
        "  File \"/usr/bin/ptf\", line 413, in load_test_modules", 
        "    mod = imp.load_module(modname, *imp.find_module(modname, [root]))", 
        "  File \"saitests/switch.py\", line 19, in <module>", 
        "    import switch_sai_thrift", 
        "ImportError: No module named switch_sai_thrift"
    ], 

It's because test_sai_qos runs ptf script which imports switch_sai_thrift, switch_sai_thrift is installed from python-saithrift_0.9.4_amd64.deb.
For master image, the deb file is for python3, but ptf only has virtual python3 environment, that's why we add --system-site-packages to allow virtual env to access system site-packeges.

Add thrift package in docker ptf virtual python3 env, because currently env-python3 doesn't have thrift module which is needed in switch_sai_thrift.

How I did it
Enable --system-site-packages for virtual py3 env in ptf docker and install thrift for test_qos_sai

How to verify it
load and login ptf conatiner
dpkg - i python-saithrift_0.9.4_amd64.deb
source /root/env-python3/bin/activate
python
import switch_sai_thrift.switch_sai_rpc
Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>
2022-10-03 19:00:22 +00:00
anamehra
ab0edff4cb Fix radv.conf traceback when VLAN_INTERFACE is not defined (#12034)
*Fix the if block scope to prevent traceback due to undefined vlan_list when  VLAN_INTERFACE is not defined.
2022-09-21 21:18:45 +00:00
Hua Liu
e51368d789
Revert "Fix docker database flush_unused_database failed issue (#11600) (#11677)" (#12084)
This reverts commit 7e4883e71f.
2022-09-15 14:19:56 -07:00