sonic-buildimage

Author	SHA1	Message	Date
ganglv	f2a495f7e5	[202305] Share image for gnmi and telemetry (#17137 ) Why I did it Share docker image to support gnmi container and telemetry container backport #16863 Work item tracking Microsoft ADO 25423918: How I did it Create telemetry image from gnmi docker image. Enable gnmi container and disable telemetry container by default. How to verify it Run end to end test.	2023-11-15 11:28:21 +08:00
jcaiMR	0465d7fdf5	[dhcp-relay]: dhcp/dhcpv6 per interface counter support (#16377 ) Why I did it Support DHCP/DHCPv6 per-interface counter, code change in sonic-build image. Work item tracking Microsoft ADO (17271822): How I did it - Introduce libjsoncpp-dev in dhcpmon and dhcprelay repo - Show CLI changes after counter format change How to verify it - Manually run show command - dhcpmon, dhcprelay integration tests	2023-10-21 14:32:29 +08:00
Bob Chu	7af177b7b3	[Telemetry] enable default service config if no config from DB (#16683 ) #### Why I did it Fix issue #16533 : the telemetry service exits on the master and 202305 branches because there are no telemetry configs in redis DB. #### How I did it Enable the default config if there are no TELEMETRY configs in redis DB. #### How to verify it After the fix, the telemetry service works in the following two scenarios: 1. With TELEMETRY config in redis DB, load service configs from DB. 2. With no TELEMETRY config in redis DB, use default service configs.	2023-10-21 12:32:37 +08:00
Stepan Blyshchak	eb1451301f	[frr] fix default zebra config not inserted into empty zebra.conf (#16747 ) Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2023-10-21 12:32:25 +08:00
Saikrishna Arcot	fb618b6e0b	[202305] Backport PRs to fix build (#16896 , #16859 , #16636 ) (#16934 ) * Remove main deb installation for derived deb build (#16859) * Don't install dependencies of derived debs When "building" a derived deb package, don't install the dependencies of the package into the container. It's not needed at this stage. * Re-add openssh-client and openssh-sftp-server as derived debs Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com> --------- Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com> (cherry picked from commit `9ae77bc2dd`) * Re-add missing dependency for derived debs. (#16896) * Re-add missing dependency for derived debs. My previous change removed the whole dependency on the main deb existing, not just the installation of the main deb. Fix this by re-adding a dependency on the main deb being built/pulled from cache. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com> --------- Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com> (cherry picked from commit `963d40a77b`) * [build] Fix build issue in docker-ptf-sai caused by setuptools_scm new release (#16636) docker-ptf-sai build fails on setuptools_scm's new release on 09/20/2023. Use old version instead. (cherry picked from commit `bfa05c8349`) --------- Co-authored-by: Liu Shilong <shilongliu@microsoft.com>	2023-10-20 15:39:54 +08:00
SuvarnaMeenakshi	2579b9506c	[202305][SNMP][IPv6]: Revert PRs to support SNMP over IPv6 (#16649 ) * Revert "[SNMP][IPv6]: Fix to use link local IPv6 address as snmp agentAddress (#16013)" This reverts commit `ebe8c8c223`. * Revert "[SNMP][IPv6]: Fix SNMP IPv6 reachability issue in certain scenarios (#15487) (#15874)" This reverts commit `83aa8b8180`.	2023-10-09 09:47:44 +08:00
anamehra	3bca122b29	Chassis: fix pmon docker failure when DEVICE_METADATA is not available (#16527 ) Signed-off-by: anamehra anamehra@cisco.com Added a check for DEVICE_METADATA before accessing the data. This prevents the j2 failure when var is not available.	2023-09-21 22:33:07 +08:00
mssonicbld	1726eb3eb7	Update macsec CAK keys in profile for tests to change to type7 encoded format (#16388 ) (#16626 ) * Change the CAK key length check in config plugin, macsec test profile changes * Fix the format in add_profile api The changes needed in various macsec unit tests and config plugin when we move to accept the type 7 encoded key format for macsec. This goes along with PR : sonic-net/sonic-swss#2892 raised earlier. Co-authored-by: judyjoseph <53951155+judyjoseph@users.noreply.github.com>	2023-09-21 20:39:01 +08:00
Zhaohui Sun	2bc65aa7ba	[202305]Change orchagent pop batch size from 8192 to 1024 (#16127 ) ### Why I did it Background running lua script may cause redis-server quite busy if batch size is 8192. If handling time exceeded default 5s, the redis-server will not response to other process and will cause syncd crash. ``` Aug 9 07:46:29.512326 str-s6100-acs-5 INFO database#supervisord: redis 68:M 09 Aug 2023 07:46:29.511 # Lua slow script detected: still in execution after 5186 milliseconds. You can try killing the script using the SCRIPT KILL command. Script SHA1 is: 88270a7c5c90583e56425aca8af8a4b8c39fe757 Aug 9 07:46:29.523716 str-s6100-acs-5 ERR syncd#syncd: :- checkReplyType: Expected to get redis type 5 got type 6, err: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE. Aug 9 07:46:29.524818 str-s6100-acs-5 INFO syncd#supervisord: syncd terminate called after throwing an instance of ' Aug 9 07:46:29.525268 str-s6100-acs-5 ERR pmon#CCmisApi: :- checkReplyType: Expected to get redis type 5 got type 6, err: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE. Aug 9 07:46:29.526148 str-s6100-acs-5 INFO syncd#supervisord: syncd std::system_error' Aug 9 07:46:29.528308 str-s6100-acs-5 ERR pmon#psud[32]: :- checkReplyType: Expected to get redis type 5 got type 6, err: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE. Aug 9 07:46:29.529048 str-s6100-acs-5 ERR lldp#python3: :- guard: RedisReply catches system_error: command: 2#015#012$3#015#012DEL#015#012$27#015#012LLDP_ENTRY_TABLE:Ethernet37#015#012, reason: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.: Input/output error Aug 9 07:46:29.529720 str-s6100-acs-5 ERR snmp#python3: :- guard: RedisReply catches system_error: command: 2#015#012$7#015#012HGETALL#015#012$28#015#012COUNTERS:oid:0x100000000000a#015#012, reason: BUSY Redis is busy running a script. 
You can only call SCRIPT KILL or SHUTDOWN NOSAVE.: Input/output error ``` 88270a7c5c90583e56425aca8af8a4b8c39fe757 is /usr/share/swss/consumer_state_table_pops.lua ##### Work item tracking - Microsoft ADO 24741990: #### How I did it Change batch size from 8192 to 1024. #### How to verify it Run all test cases in sonic-mgmt to verify the system stability. ### Tested branch (Please provide the tested image version) - [x] 20220531.36	2023-08-14 17:53:08 -07:00
abdosi	15a39ac806	Fix the Loopback0 IPv6 address of LC's in chassis not reachable from (#16026 ) What I did: Fix the Loopback0 IPv6 address of LCs in a chassis not being reachable from peer devices. Why I did: For the IPv6 Loopback0 address we only advertise the /64 subnet to the peer devices. However, in a chassis each LC has its own /128 address within that /64 subnet. Since this /128 address is not advertised, peer devices cannot ping/reach the LC's Loopback0. How I fix: Advertise the /128 Loopback0 IPv6 address only between i-BGP peers. This way, even though only the /64 is advertised to e-BGP peer devices, a packet that reaches any of the LCs can be forwarded to the appropriate LC. How I verify: Manual verification. UT added for the same. Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2023-08-15 04:32:34 +08:00
SuvarnaMeenakshi	ebe8c8c223	[SNMP][IPv6]: Fix to use link local IPv6 address as snmp agentAddress (#16013 ) #### Why I did it fixes: https://github.com/sonic-net/sonic-buildimage/issues/16001 Caused by: https://github.com/sonic-net/sonic-buildimage/pull/15487 The above PR introduced change to use Management and Loopback Ipv4 and ipv6 addresses as snmpagent address in snmpd.conf file. With this change, if Link local IP address is configured as management or Loopback IPv6 address, then snmpd tries to open socket on that ipv6 address and fails with the below error: ``` Error opening specified endpoint "udp6:[fe80::5054:ff:fe6f:16f0]:161" Server Exiting with code 1 ``` From RFC4007, if we need to specify non-global ipv6 address without ambiguity, we need to use zone id along with the ipv6 address: <address>%<zone_id> Reference: https://datatracker.ietf.org/doc/html/rfc4007 ##### Work item tracking - Microsoft ADO (number only): #### How I did it Modify snmpd.conf file to use the %zone_id representation for ipv6 address. #### How to verify it In VS testbed, modify config_db to use link local ipv6 address as management address: "MGMT_INTERFACE": { "eth0\|10.250.0.101/24": { "forced_mgmt_routes": [ "172.17.0.1/24" ], "gwaddr": "10.250.0.1" }, "eth0\|fe80::5054:ff:fe6f:16f0/64": { "gwaddr": "fe80::1" } }, Execute config_reload after the above change. 
snmpd comes up and check if snmpd is listening on ipv4 and ipv6 addresses: ``` admin@vlab-01:~$ sudo netstat -tulnp \| grep 161 tcp 0 0 127.0.0.1:3161 0.0.0.0:* LISTEN 274060/snmpd udp 0 0 10.1.0.32:161 0.0.0.0:* 274060/snmpd udp 0 0 10.250.0.101:161 0.0.0.0:* 274060/snmpd udp6 0 0 fc00:1::32:161 :::* 274060/snmpd udp6 0 0 fe80::5054:ff:fe6f::161 :::* 274060/snmpd -- Link local admin@vlab-01:~$ sudo ifconfig eth0 eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 10.250.0.101 netmask 255.255.255.0 broadcast 10.250.0.255 inet6 fe80::5054:ff:fe6f:16f0 prefixlen 64 scopeid 0x20<link> ether 52:54:00:6f:16:f0 txqueuelen 1000 (Ethernet) RX packets 36384 bytes 22878123 (21.8 MiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 261265 bytes 46585948 (44.4 MiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 admin@vlab-01:~$ docker exec -it snmp snmpget -v2c -c public fe80::5054:ff:fe6f:16f0 1.3.6.1.2.1.1.1.0 iso.3.6.1.2.1.1.1.0 = STRING: "SONiC Software Version: SONiC.master.327516-04a6031b2 - HwSku: Force10-S6000 - Distribution: Debian 11.7 - Kernel: 5.10.0-18-2-amd64" ``` Logs from snmpd: ``` Turning on AgentX master support. NET-SNMP version 5.9 Connection from UDP/IPv6: [fe80::5054:ff:fe6f:16f0%eth0]:44308 ``` Ran test_snmp_loopback test to check if loopback ipv4 and ipv6 works: ``` ./run_tests.sh -n vms-kvm-t0 -d vlab-01 -c snmp/test_snmp_loopback.py -f vtestbed.yaml -i ../ansible/veos_vtb -e "--skip_sanity --disable_loganalyzer" -u === Running tests in groups === Running: pytest snmp/test_snmp_loopback.py --inventory ../ansible/veos_vtb --host-pattern vlab-01 --testbed vms-kvm-t0 --testbed_file vtestbed.yaml --log-cli-level warning --log-file-level debug --kube_master unset --showlocals --assert plain --show-capture no -rav --allow_recover --ignore=ptftests --ignore=acstests --ignore=saitests --ignore=scripts --ignore=k8s --ignore=sai_qualify --junit-xml=logs/tr.xml --log-file=logs/test.log --skip_sanity --disable_loganalyzer .. 
snmp/test_snmp_loopback.py::test_snmp_loopback[vlab-01] PASSED ``` #### Which release branch to backport (provide reason below if selected) - [ ] 201811 - [ ] 201911 - [ ] 202006 - [x] 202012 - [x] 202106 - [x] 202111 - [x] 202205 - [x] 202211 - [x] 202305	2023-08-14 18:32:35 +08:00
mssonicbld	7bd67d4f37	Upgrade scapy in the PTF's python3 virtualenv to 2.5.0 (#15573 ) (#15875 )	2023-07-19 20:05:40 +08:00
mssonicbld	83aa8b8180	[SNMP][IPv6]: Fix SNMP IPv6 reachability issue in certain scenarios (#15487 ) (#15874 )	2023-07-19 20:04:57 +08:00
mssonicbld	74598e568a	Add health check probe for k8s upgrade containers. (#15223 ) (#15867 ) #### Why I did it After k8s upgrades a container, k8s only knows that the container is running; it does not know the status of the service inside the container. So we need a probe inside the container that k8s can call to check whether the container is really ready. ##### Work item tracking - Microsoft ADO (number only): 22453004 #### How I did it Add a health check probe inside the config engine container. If the start service exists, the probe checks whether it exited normally; if the python script is there, the probe calls it to do container-specific checks. The python script should be implemented by the feature owner if it is needed. More details: [design doc](https://github.com/sonic-net/SONiC/blob/master/doc/kubernetes/health-check.md) #### How to verify it Check path /usr/bin/readiness_probe.sh inside container. #### Which release branch to backport (provide reason below if selected) - [ ] 201811 - [ ] 201911 - [ ] 202006 - [ ] 202012 - [ ] 202106 - [ ] 202111 - [x] 202205 - [x] 202211 #### Tested branch (Please provide the tested image version) - [x] 20220531.28 Co-authored-by: lixiaoyuner <35456895+lixiaoyuner@users.noreply.github.com>	2023-07-19 16:11:13 +08:00
xumia	de2a650a8e	[Build] Fix the PyYang python package installation issue (#15892 ) Why I did it Fix the armhf build failure. How to reproduce the issue: docker run -it debian:bullseye bash apt-get update && apt-get install -y python3-pip pip3 install PyYAML==5.4.1 Error message: Collecting PyYAML==5.4.1 Downloading PyYAML-5.4.1.tar.gz (175 kB) \|████████████████████████████████\| 175 kB 12.3 MB/s Installing build dependencies ... done Getting requirements to build wheel ... error ERROR: Command errored out with exit status 1: command: /usr/bin/python3 /tmp/tmp6xabslgb_in_process.py get_requires_for_build_wheel /tmp/tmp_er01ztl .... raise AttributeError(attr) AttributeError: cython_sources ---------------------------------------- WARNING: Discarding `d63f2d7597/PyYAML-5.4.1.tar.gz (sha256)`=607774cbba28732bfa802b54baa7484215f530991055bb562efbed5b2f20a45e (from https://pypi.org/simple/pyyaml/) (requires-python:>=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, !=3.5.*). Command errored out with exit status 1: /usr/bin/python3 /tmp/tmp6xabslgb_in_process.py get_requires_for_build_wheel /tmp/tmp_er01ztl Check the logs for full command output. ERROR: Could not find a version that satisfies the requirement PyYAML==5.4.1 ERROR: No matching distribution found for PyYAML==5.4.1 root@fa2fa92edcfd:/# With the option --no-build-isolation added, the install succeeds; see the fix: install "PyYAML==5.4.1" --no-build-isolation The same error can be found in multiple builds. Work item tracking Microsoft ADO (number only): 24567457 How I did it Add a build option --no-build-isolation. Disable isolation when building a modern source distribution. Build dependencies specified by PEP 518 must be already installed if this option is used. How to verify it	2023-07-19 09:26:49 +08:00
lixiaoyuner	c59f55f6a3	Move k8s script to docker-config-engine (#14788 ) (#15768 ) Why I did it To reduce the container's dependency on the host system Work item tracking Microsoft ADO (number only): 17713469 How I did it Move the k8s container startup script into the config engine container, rather than mounting it from the host. How to verify it Check file path (/usr/share/sonic/scripts/container_startup.py) inside config engine container. Signed-off-by: Yun Li <yunli1@microsoft.com> Co-authored-by: Qi Luo <qiluo-msft@users.noreply.github.com>	2023-07-17 23:21:01 +08:00
xumia	826f5a1d45	[Build] Fix the python module importlib.metadata not found issue (#15800 ) Why I did it It is to fix the docker-ptf-sai build failure. https://dev.azure.com/mssonic/build/_build/results?buildId=311315&view=logs&j=cef3d8a9-152e-5193-620b-567dc18af272&t=cf595088-5c84-5cf1-9d7e-03331f31d795 2023-07-09T13:53:19.9025355Z Traceback (most recent call last): 2023-07-09T13:53:19.9025715Z File "/root/ptf/.eggs/setuptools_scm-7.1.0-py3.7.egg/setuptools_scm/_entrypoints.py", line 74, in <module> 2023-07-09T13:53:19.9025933Z from importlib.metadata import entry_points # type: ignore 2023-07-09T13:53:19.9026167Z ModuleNotFoundError: No module named 'importlib.metadata' Work item tracking Microsoft ADO (number only): 24513583 How I did it How to verify it	2023-07-13 20:57:24 +08:00
mssonicbld	2fc98cd8fc	[chassis][lldp] Fix the lldp error log in host instance which doesn't contain front panel ports (#14814 ) (#15603 )	2023-06-29 21:46:32 +08:00
Longxiang Lyu	664675cad5	[mux] Integrate `linkmgrd` with swss logger (#15392 ) Signed-off-by: Longxiang Lyu <lolv@microsoft.com>	2023-06-26 16:40:58 +08:00
Hua Liu	05f1a5a31e	Add watchdog mechanism to swss service and generate alert when swss have issue. (#15429 ) Add watchdog mechanism to swss service and generate alert when swss have issue. Work item tracking Microsoft ADO (number only): 16578912 What I did Add an orchagent watchdog to monitor and alert on orchagent stuck issues. Why I did it Currently the SONiC monit system only monitors whether the orchagent process exists. If the orchagent process is stuck and stops processing, monit cannot detect and report it. How I verified it Pass all UT. Manually verified that process_monitoring/test_critical_process_monitoring.py passes. Add new UT https://github.com/sonic-net/sonic-mgmt/pull/8306 to check that the watchdog works correctly. Manual test: after pausing orchagent with 'kill -STOP <pid>', check that a warning message appears in the log: Apr 28 23:36:41.504923 vlab-01 ERR swss#supervisor-proc-watchdog-listener: Process 'orchagent' is stuck in namespace 'host' (1.0 minutes). Details if related Heartbeat message PR: https://github.com/sonic-net/sonic-swss/pull/2737 UT PR: https://github.com/sonic-net/sonic-mgmt/pull/8306	2023-06-12 17:53:54 -07:00
Ye Jianquan	cec9d7b83a	Revert "Add watchdog mechanism to swss service and generate alert when swss have issue. (#14686 )" (#15390 ) This reverts commit `44427a2f6b`. Docker image not updated during PR validation and caused PR check failures. Force merge this revert. After cache is updated after this PR is merged, issue should be fixed.	2023-06-09 09:10:35 +08:00
abdosi	6139c525d2	updated internal route policy for chassis-packet (#15349 ) What I did: Workaround for the issue seen here : FRRouting/frr#13682 It seems there is timing issue where there are multiple recursive lookup needed to resolve nexthop of the route it's possible that it does not happen correctly causing route to remain in inactive state Issue is seen on chassis-packet as there 2 level of recursive lookup needed for a given e-BGP learnt route - Level1 to resolve e-BGP peer (connected route via bgp ) over Loopback4096 (i-BGP peering) - Level 2 Loopback4096 over backend port-channels next-hops For VOQ chassis there is no e-BGP peer (connected route via bgp ) resolution as route is added as Static route by orchagent over Ethernet-IB. Also as part of this remove route-map policy from instance.conf.j2 as same is define in peer-group.j2. Microsoft ADO: https://msazure.visualstudio.com/One/_workitems/edit/24198507 How I verify: Functional Verification manually Updated UT. We will be adding sanity check in sonic-mgmt to make sure none of route are in inactive state. Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2023-06-07 09:17:44 -07:00
Hua Liu	44427a2f6b	Add watchdog mechanism to swss service and generate alert when swss have issue. (#14686 ) This PR depends on https://github.com/sonic-net/sonic-swss/pull/2737 being merged first. What I did Add an orchagent watchdog to monitor and alert on orchagent stuck issues. Why I did it Currently the SONiC monit system only monitors whether the orchagent process exists. If the orchagent process is stuck and stops processing, monit cannot detect and report it. How I verified it Pass all UT. Add new UT https://github.com/sonic-net/sonic-mgmt/pull/8306 to check that the watchdog works correctly. Manual test: after pausing orchagent with 'kill -STOP <pid>', check that a warning message appears in the log: Apr 28 23:36:41.504923 vlab-01 ERR swss#supervisor-proc-watchdog-listener: Process 'orchagent' is stuck in namespace 'host' (1.0 minutes). Details if related Heartbeat message PR: https://github.com/sonic-net/sonic-swss/pull/2737 UT PR: https://github.com/sonic-net/sonic-mgmt/pull/8306	2023-06-05 22:21:17 -07:00
Mai Bui	1477f779de	modify commands using utilities_common.cli.run_command and advance sonic-utilities submodule on master (#15193 ) Dependency: sonic-net/sonic-utilities#2718 Why I did it This PR sonic-net/sonic-utilities#2718 reduce shell=True usage in utilities_common.cli.run_command() function. Work item tracking Microsoft ADO (number only): 15022050 How I did it Replace strings commands using utilities_common.cli.run_command() function to list of strings due to circular dependency, advance sonic-utilities submodule 72ca4848 (HEAD -> master, upstream/master, upstream/HEAD) Add CLI configuration options for teamd retry count feature (sonic-net/sonic-utilities#2642) 359dfc0c [Clock] Implement clock CLI (sonic-net/sonic-utilities#2793) b316fc27 Add transceiver status CLI to show output from TRANSCEIVER_STATUS table (sonic-net/sonic-utilities#2772) dc59dbd2 Replace pickle by json (sonic-net/sonic-utilities#2849) a66f41c4 [show] replace shell=True, replace xml by lxml, replace exit by sys.exit (sonic-net/sonic-utilities#2666) 57500572 [utilities_common] replace shell=True (sonic-net/sonic-utilities#2718) 6e0ee3e7 [CRM][DASH] Extend CRM utility to support DASH resources. (sonic-net/sonic-utilities#2800) b2c29b0b [config] Generate sysinfo in single asic (sonic-net/sonic-utilities#2856)	2023-06-05 17:08:13 +08:00
Tejaswini Chadaga	8058550c09	[bgp]: Add sudo check for TSA/B/C command execution (#15288 ) TSA/B/C scripts invoke commands that require root permissions. If the user does not have sudo permissions, the scripts today execute until the command and throw a backtrace with error at the specific command. Added a check to ensure the operations check for root permissions upfront.	2023-06-03 23:47:26 +02:00
Baorong Liu	acb423b255	[staticroutebfd]fix an issue on deleting a non-bfd static route (#15269 ) * [static_route][staticroutebfd]fix an issue on deleting a non-bfd static route Fix an issue for deleting a non-bfd static route also remove the staticroutebfd from critical_processes list and make it auto restart in the case of crash.	2023-06-02 11:46:56 -07:00
qiwang4	359b80e012	[master]staticroutebfd process implementation (#13789 ) * [BFD] staticroutebfd implementation * To enable the BFD for static route HLD: sonic-net/SONiC#1216	2023-05-26 16:32:05 -07:00
Sachin Holla	ba6aba2b92	[mgmt-framework] Fix rest-server startup script (#14979 ) This script was using 'null' as default value for all optional fields of REST_SERVER table -- due to incorrect use of 'jq -r' command. Server was not coming up when REST_SERVER entry exists but some fields were not given (which is a valid configuration). Fixed the jq query expression to return empty string for non existing fields. Signed-off-by: Sachin Holla <sachin.holla@broadcom.com>	2023-05-22 17:42:38 -07:00
Tejaswini Chadaga	4e60f0d563	Template change for BGP monitors on T2 (#14844 ) Why I did it To support BGPMon sessions from each T2 linecard ASIC Work item tracking Microsoft ADO (number only): 17873174 How I did it Added change in BGPMon configuration to use Loopback4096 as source interface, since this has a unique IP per ASIC. How to verify it Tested by manually setting up BGPMon session on T2 LC and verified that Loopback4096 could be used as source	2023-05-09 13:40:00 -07:00
abdosi	9b8b4e6e4d	[bgp/TSA]: Fixed the internal peer route-map policy (#14804 ) What I did: In FRR command update source <interface-name> is not at address-family level. Because of this internal peer route-map for ipv6 were getting applied to ipv4 address family. As a result TSA over iBGP for Ipv6 was not getting applied. How I verify: Manual Verification of TSA over both ipv4 and ipv6 after fix works fine. Updated UT for this. Added sonic-mgmt test gap: sonic-net/sonic-mgmt#8170 Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2023-05-05 13:55:05 -07:00
Zain Budhwani	4974b5c49c	Add idle conn duration config to telemetry.sh (#14903 ) Why I did it Supports new field in sonic-net/sonic-gnmi@258b887 Work item tracking Microsoft ADO (number only): 13468195 How I did it Add new field in telemetry.sh How to verify it Pipeline	2023-05-04 16:47:02 -07:00
Tejaswini Chadaga	ca224863cb	Changes to support TSA from supervisor (#14691 ) Why I did it Support for SONIC chassis isolation using TSA and un-isolation using TSB from supervisor module Work item tracking Microsoft ADO (number only): 17826134 How I did it When TSA is run on the supervisor, it triggers TSA on each of the linecards using the secure rexec infrastructure introduced in sonic-net/sonic-utilities#2701. User password is requested to allow secure login to linecards through ssh, before execution of TSA/TSB on the linecards TSA of the chassis withdraws routes from all the external BGP neighbors on each linecard, in order to isolate the entire chassis. No route withdrawal is done from the internal BGP sessions between the linecards to prevent transient drops during internal route deletion. With these changes, complete isolation of a single linecard using TSA will not be possible (a separate CLI/script option will be introduced at a later time to achieve this) Changes also include no-stats option with TSC for quick retrieval of the current system isolation state This PR also reverts changes in #11403 How to verify it These changes have a dependency on sonic-net/sonic-utilities#2701 for testing Run TSA from supervisor module and ensure transition to Maintenance mode on each linecard Verify that all routes are withdrawn from eBGP neighbors on all linecards Run TSB from supervisor module and ensure transition to Normal mode on each linecard Verify that all routes are re-advertised from eBGP neighbors on all linecards Run TSC no-stats from supervisor and verify that just the system maintenance state is returned from all linecards	2023-04-28 16:28:06 +08:00
judyjoseph	6370257fa3	[macsec]: show macsec: add --profile option, include profile name in show command output (#13940 ) This PR adds the following: Add a new option "--profile" to the show macsec command, to show all profiles on the device. Update the current show macsec command to show the profile in each interface's output. This tells which macsec profile the interface is attached to.	2023-04-27 08:51:28 -07:00
Stepan Blyshchak	04099f075d	[BGP] support BGP pending FIB suppression (#12853 ) Signed-off-by: Stepan Blyschak stepanb@nvidia.com DEPENDS: #12852 Why I did it To support BGP pending FIB suppression. How I did it I backported patches from FRR 8.4 feature that allows communicating ASIC route status back to FRR. Also, added a new field in DEVICE_METADATA YANG model table. Added UT for YANG model changes. How to verify it Run on the switch.	2023-04-20 19:56:13 +08:00
Hua Liu	a14cc76879	Install python-redis package to docker containers (#14632 ) Install python-redis package to docker containers #### Why I did it Fix this bug: https://github.com/sonic-net/sonic-buildimage/issues/14531 The 'flush_unused_database' script is part of docker-database, and docker-database does not install the python-redis package by itself; it uses the redis package installed by sonic-py-swsssdk. So after sonic-py-swsssdk was removed from the container, this script broke. To fix this bug and prevent similar bugs from happening again, install python-redis in the docker containers from which sonic-py-swsssdk was removed. #### How I did it Install python-redis to containers. #### How to verify it Pass all UT. Create new UT to cover this scenario: https://github.com/sonic-net/sonic-mgmt/pull/8032 #### Description for the changelog Improve sudo cat command for RO user.	2023-04-19 18:14:48 -07:00
Zain Budhwani	e9a9c9e31f	Update telemetry.sh with threshold config (#14615 ) #### Why I did it Threshold is a new config field passed to telemetry.go as a parameter #### How I did it Add check for threshold #### How to verify it Modify telemetry.sh, systemctl restart telemetry, telemetry process has threshold of 100	2023-04-18 14:29:30 -07:00
Kuanyu Chen	cffd87a627	Add monit_snmp file to monitor memory usage (#14464 ) #### Why I did it When the CPU is busy, sonic_ax_impl may not be fast enough to handle the notification messages sent from REDIS. The messages then keep piling up in the memory space of sonic_ax_impl. If the condition continues, the memory usage keeps increasing. #### How I did it Add a monit file to check whether the SNMP container where sonic_ax_impl resides uses more than 4GB of memory. If yes, restart the sonic_ax_impl process. #### How to verify it Run a lot of this command: `while true; do ret=$(redis-cli -n 0 set LLDP_ENTRY_TABLE:test1 test1); sleep 0.1; done;` And check that the memory used by sonic_ax_impl keeps increasing. After a period, make sure sonic_ax_impl is restarted when the memory usage reaches the 4GB threshold. And verify that the memory usage of sonic_ax_impl drops down from 4GB.	2023-04-06 12:19:11 -07:00
Christian Svensson	bce824723c	[sflow] Switch to bullseye (#14494 ) Change references to use bullseye instead of buster Why I did it Almost all daemons in 202211 and master use bullseye, and sflow was easy to migrate. How I did it Replaced the references, built and tested in 202211. How to verify it Build with the changes, enable sflow: admin@sonic:~$ sudo config sflow collector add test 1.2.3.4 admin@sonic:~$ sudo config sflow collector enable tcpdump on 1.2.3.4 and see that UDP sFlow packets are being sent. Signed-off-by: Christian Svensson <blue@cmd.nu>	2023-04-03 09:49:35 -07:00
Christian Svensson	67abcff944	[nat] Switch to bullseye (#14495 ) Change references to use bullseye instead of buster Why I did it Almost all daemons in 202211 and master uses bullseye, and NAT seems easy to migrate. How I did it Replaced the references, built with 202211 branch. How to verify it Not sure, it builds and tests pass as far as I can tell but I don't use the feature myself. Signed-off-by: Christian Svensson <blue@cmd.nu>	2023-04-02 14:02:33 -07:00
Gokulnath-Raja	cedc4d914f	[sflow] Exception handling for if_nametoindex (#11437 ) (#13567 ) catch system error and log as warning level instead of error level in case interface was already deleted. Why I did it sflow process exited when failed to convert the interface index from interface name How I did it Added exception handling code and logged when OSError exception. How to verify it Recreated the bug scenario #11437 and ensured that sflow process not exited. Description for the changelog catch system error and log as warning level instead of error level in case interface was already deleted. Logs steps : root@sonic:~# sudo config vlan member del 4094 PortChannel0001 root@sonic:~# sudo config vlan member del 4094 Ethernet2 root@sonic:~# sudo config vlan del 4094 root@sonic:~# "WARNING sflow#port_index_mapper: no interface with this name" is seen but no crash is reported syslogs : Jan 23 09:17:24.420448 sonic NOTICE swss#orchagent: :- removeVlanMember: Remove member Ethernet2 from VLAN Vlan4094 lid:ffe vmid:27000000000a53 Jan 23 09:17:24.420710 sonic NOTICE swss#orchagent: :- flushFdbEntries: flush key: SAI_OBJECT_TYPE_FDB_FLUSH:oid:0x21000000000000, fields: 3 Jan 23 09:17:24.420847 sonic NOTICE swss#orchagent: :- recordFlushFdbEntries: flush key: SAI_OBJECT_TYPE_FDB_FLUSH:oid:0x21000000000000, fields: 3 Jan 23 09:17:24.426082 sonic NOTICE syncd#syncd: :- processFdbFlush: fdb flush succeeded, updating redis database Jan 23 09:17:24.426242 sonic NOTICE syncd#syncd: :- processFlushEvent: received a flush port fdb event, portVid = oid:0x3a000000000a52, bvId = oid:0x26000000000a51 Jan 23 09:17:24.426374 sonic NOTICE syncd#syncd: :- processFlushEvent: pattern ASIC_STATE:SAI_OBJECT_TYPE_FDB_ENTRY:oid:0x26000000000a51, portStr oid:0x3a000000000a52 Jan 23 09:17:24.427104 sonic NOTICE bgp#fpmsyncd: :- onRouteMsg: RouteTable del msg for route with only one nh on eth0/docker0: fe80::/64 :: eth0 Jan 23 09:17:24.427182 sonic NOTICE bgp#fpmsyncd: :- onRouteMsg: RouteTable del msg for route with only one nh on eth0/docker0: fd00::/80 :: docker0 Jan 23 09:17:24.428502 sonic NOTICE swss#orchagent: :- meta_sai_on_fdb_flush_event_consolidated: processing consolidated fdb flush event of type: SAI_FDB_ENTRY_TYPE_DYNAMIC Jan 23 09:17:24.429058 sonic NOTICE swss#orchagent: :- meta_sai_on_fdb_flush_event_consolidated: fdb flush took 0.000606 sec Jan 23 09:17:24.431496 sonic NOTICE swss#orchagent: :- setHostIntfsStripTag: Set SAI_HOSTIF_VLAN_TAG_STRIP to host interface: Ethernet2 Jan 23 09:17:24.431675 sonic NOTICE swss#orchagent: :- flushFdbEntries: flush key: SAI_OBJECT_TYPE_FDB_FLUSH:oid:0x21000000000000, fields: 2 Jan 23 09:17:24.431797 sonic NOTICE swss#orchagent: :- recordFlushFdbEntries: flush key: SAI_OBJECT_TYPE_FDB_FLUSH:oid:0x21000000000000, fields: 2 Jan 23 09:17:24.437009 sonic NOTICE swss#orchagent: :- meta_sai_on_fdb_flush_event_consolidated: processing consolidated fdb flush event of type: SAI_FDB_ENTRY_TYPE_DYNAMIC Jan 23 09:17:24.437532 sonic NOTICE swss#orchagent: :- meta_sai_on_fdb_flush_event_consolidated: fdb flush took 0.000514 sec Jan 23 09:17:24.437942 sonic NOTICE syncd#syncd: :- processFdbFlush: fdb flush succeeded, updating redis database Jan 23 09:17:24.438065 sonic NOTICE syncd#syncd: :- processFlushEvent: received a flush port fdb event, portVid = oid:0x3a000000000a52, bvId = oid:0x0 Jan 23 09:17:24.438173 sonic NOTICE syncd#syncd: :- processFlushEvent: pattern ASIC_STATE:SAI_OBJECT_TYPE_FDB_ENTRY:*, portStr oid:0x3a000000000a52 Jan 23 09:17:24.440348 sonic NOTICE swss#orchagent: :- removeBridgePort: Remove bridge port Ethernet2 from default 1Q bridge Jan 23 09:17:29.782554 sonic NOTICE swss#orchagent: :- removeVlan: VLAN Vlan4094 still has 1 FDB entries Jan 23 09:17:29.791373 sonic WARNING sflow#port_index_mapper: no interface with this name Signed-off-by: Gokulnath-Raja <Gokulnath_R@dell.com>	2023-03-27 10:19:05 -07:00
ShiyanWangMS	06795931b7	Add AZP agent necessary packages to sonic-mgmt-docker (#14291 ) Why I did it Add AZP agent necessary packages to sonic-mgmt-docker Remove Python 201811 venv Update some packages in order to meet internal security requirements How I did it Update sonic-mgmt-docker file How to verify it sonic-mgmt-docker can run: bash, apt update, apt install and ping. start.sh is under /azp with exec permission. env-201811 venv is removed. jinja2 is upgraded to 2.10.1	2023-03-21 08:09:44 +08:00
Zain Budhwani	881b925d19	Fix telemetry.sh passing in null as log level value (#14303 ) #### Why I did it Bug in script that was passing in null as log level value if missing from config_db #### How I did it Added more robust conditional statement #### How to verify it 1) Remove log_level from config db 2) config reload -y 3) telemetry should not crash	2023-03-20 16:22:11 -07:00
Vivek	f19c414176	[lldpmgrd] Don't log error message for outdated event (#14178 ) - Why I did it Fixes #14236 When a redis event quickly gets outdated during port breakout, error logs like this are seen Mar 8 01:43:26.011724 r-leopard-56 INFO ConfigMgmt: Write in DB: {'PORT': {'Ethernet64': {'admin_status': 'down'}, 'Ethernet68': {'admin_status': 'down'}}} Mar 8 01:43:26.012565 r-leopard-56 INFO ConfigMgmt: Writing in Config DB Mar 8 01:43:26.013468 r-leopard-56 INFO ConfigMgmt: Write in DB: {'PORT': {'Ethernet64': None, 'Ethernet68': None}, 'INTERFACE': None} Mar 8 01:43:26.018095 r-leopard-56 NOTICE swss#portmgrd: :- doTask: Configure Ethernet64 admin status to down Mar 8 01:43:26.018309 r-leopard-56 NOTICE swss#portmgrd: :- doTask: Delete Port: Ethernet64 Mar 8 01:43:26.018641 r-leopard-56 NOTICE lldp#lldpmgrd[32]: :- pops: Miss table key PORT_TABLE:Ethernet64, possibly outdated Mar 8 01:43:26.018654 r-leopard-56 ERR lldp#lldpmgrd[32]: unknown operation '' - How I did it Only log the error when the op is not empty and not one of ("SET" & "DEL" ) Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>	2023-03-16 18:15:50 +02:00
Ye Jianquan	5e85c01621	Add scandir into sonic-mgmt docker image (#14219 ) Why I did it TestbedV2 requires scandir python package How I did it Install scandir packages	2023-03-14 08:58:11 +08:00
xumia	5f4d063506	[Build] Fix the mirror gpg key expired issue (#14206 ) Why I did it [Build] Fix the mirror gpg key expired issue See vs build: https://dev.azure.com/mssonic/build/_build/results?buildId=231680&view=logs&j=cef3d8a9-152e-5193-620b-567dc18af272&t=cf595088-5c84-5cf1-9d7e-03331f31d795 How I did it Add the apt option that skips the Valid-Until check. The option is set in the SONiC docker base image, but docker ptf was missing it. Acquire::Check-Valid-Until "false"; How to verify it The build of docker-ptf succeeds after the fix. 2023-03-11T17:26:35.1801999Z [ building ] [ target/docker-ptf.gz ] 2023-03-11T17:38:10.1608536Z [ finished ] [ target/docker-ptf.gz ]	2023-03-13 11:13:21 +08:00
Yaqiang Zhu	284ba61a86	[dhcp-relay] Add dhcp_relay show cli (#13614 ) Why I did it Currently the show and clear cli of dhcp_relay may cause confusion. How I did it Add doc for it: [doc] Add docs for dhcp_relay show/clear cli sonic-utilities#2649 Add dhcp_relay config cli and test cases. show dhcp_relay ipv4 helper show dhcp_relay ipv6 destination show dhcp_relay ipv6 counters sonic-clear dhcp_relay ipv6 counters How to verify it Unit test all passed	2023-03-06 10:48:25 -08:00
ppikh	de84eb98c7	[ptf]: Added package "wireshark-common" into PTF docker (#14070 ) It will allow us to have application called "mergecap" - which can merge multiple .pcap files into single .pcapng file and convert it to .pcap file Signed-off-by: Petro Pikh <petrop@nvidia.com>	2023-03-04 17:47:42 -08:00
Vaibhav Hemant Dixit	860bc7492a	Add shellcheck and mock modules for running unit and linter test (#14062 )	2023-03-03 19:24:26 +00:00
Tejaswini Chadaga	f80bf7783d	Fix VOQ_CHASSIS_V6_PEER route-map config (#14055 ) * Fix typo in VOQ_CHASSIS_V6_PEER route-map config * Updated UT files with the changed config	2023-03-03 09:28:57 -08:00
Zain Budhwani	165e33b4e4	Remove dialout as critical process (#14006 ) #### Why I did it Remove dialout as critical process as it is no longer used in prod. As part of future work, can remove dialout completely #### How I did it Remove from critical process list	2023-02-28 15:56:54 -08:00

1078 Commits
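
Note on cedc4d914f ([sflow] Exception handling for if_nametoindex): the commit message describes catching OSError when an interface name can no longer be resolved to an index, and logging a warning instead of letting the process exit. A minimal sketch of that pattern follows. It is not the code from the commit: the helper name safe_if_nametoindex and the logger name are hypothetical, and only socket.if_nametoindex is a real standard-library call.

```python
import logging
import socket
from typing import Optional

# Hypothetical logger name, chosen to echo the syslog tag in the commit message.
logger = logging.getLogger("port_index_mapper")


def safe_if_nametoindex(ifname: str) -> Optional[int]:
    """Return the kernel interface index for ifname, or None if lookup fails.

    Hypothetical helper, not taken from the sflow source.
    """
    try:
        return socket.if_nametoindex(ifname)
    except OSError as exc:
        # The interface may have been deleted between the event and the
        # lookup; log at warning level and carry on instead of exiting.
        logger.warning("%s: %s", ifname, exc)
        return None
```

A caller would skip the index update when the helper returns None, which matches the behaviour reported in the commit's syslog excerpt (a WARNING line and no crash).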