sonic-buildimage

Author	SHA1	Message	Date
abdosi	c6c8c934e1	[buildfix-201911] Fix the snmp docker build error. (#7452 ) Issue is get_pip.py is moved to pip 21.1 (https://github.com/pypa/get-pip/commits/main) which is not compatible with 3.6. Issue of pip itself is fixed as part of 21.1.1 in pip community (pypa/pip#9835). However get-pip.py is still not updated to latest pip. Also get.pip.py does not support python 3.6 version explicitly (pypa/get-pip#88) Step 15/29 : RUN curl https://bootstrap.pypa.io/get-pip.py \| python3.6 ---> Running in bece31f49267 % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 100 1891k 100 1891k 0 0 9564k 0 --:--:-- --:--:-- --:--:-- 9600k Traceback (most recent call last): File "<stdin>", line 24298, in <module> File "<stdin>", line 139, in main File "<stdin>", line 115, in bootstrap File "<stdin>", line 96, in monkeypatch_for_cert File "/tmp/tmp5fnxrz0a/pip.zip/pip/_internal/commands/__init__.py", line 9, in <module> File "/tmp/tmp5fnxrz0a/pip.zip/pip/_internal/cli/base_command.py", line 12, in <module> File "/tmp/tmp5fnxrz0a/pip.zip/pip/_internal/cli/cmdoptions.py", line 30, in <module> File "/tmp/tmp5fnxrz0a/pip.zip/pip/_internal/utils/hashes.py", line 2, in <module> ImportError: cannot import name 'NoReturn' The command '/bin/sh -c curl https://bootstrap.pypa.io/get-pip.py \| python3.6' returned a non-zero code: 1 How I did: Got the file from https://github.com/pypa/get-pip/tree/21.0 and added to the buildimage pin pip to the previous release 21.0.1. (Similar is done in other public repos eg: grpc/grpc-java#8115) Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2021-04-28 01:28:55 -07:00
yozhao101	aeae87d1b5	[201911][Monit] Use VLAN name to differentiate each Monit service of dhcp_relay (#7378 ) #### Why I did it Since there will be multiple `dhcrelay` processes when the table `VLAN_INTERFACE` of `CONFIG_DB` contains different VLANs, we should use a unique service name for each `dhcrelay` process in the Monit configuration file. Otherwise, the Monit service will fail to work. #### How I did it I appended the VLAN name to the end of each service name so that they are unique. Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2021-04-22 18:04:29 -07:00
yozhao101	528543bc6a	[201911][Monit] Monitor critical processes in radv and dhcp_relay containers. (#7340 ) Signed-off-by: Yong Zhao yozhao@microsoft.com Why I did it This PR aims to monitor critical processes in the router advertiser and dhcp_relay containers with Monit. How I did it The router advertiser container only runs on a T0 device, and the T0 device should have at least one VLAN interface configured with an IPv6 address. At the same time, the router advertiser container will not run on devices whose deployment type is 8. As such, I created a service which dynamically generates the Monit configuration file of router advertiser from a template. Similarly, the Monit configuration file of dhcp_relay is also generated from a template, since the number of dhcrelay processes in the dhcp_relay container depends on the number of VLANs. How to verify it I verified this implementation on a DuT.	2021-04-16 08:40:06 -07:00
judyjoseph	b9f8348a5d	Fixes for errors seen in staging devices (#7171 ) With the latest 201911 image, the following error was seen on staging devices with TSB command ( for both single asic, multi asic ). Though this err message doesn't affect the TSB functionality, it is good to fix. admin@STG01-0101-0102-01T1:~$ TSB BGP0 : % Could not find route-map entry TO_TIER0_V4 20 line 1: Failure to communicate[13] to zebra, line: no route-map TO_TIER0_V4 permit 20 % Could not find route-map entry TO_TIER0_V4 30 line 2: Failure to communicate[13] to zebra, line: no route-map TO_TIER0_V4 deny 30 In addition, in this PR I am fixing the message displayed to user when there are no BGP neighbors configured on that BGP instance. In multi-asic device there could be case where there are no BGP neighbors configured on a particular ASIC.	2021-04-08 15:30:46 -07:00
Tamer Ahmed	86ea554d4a	[radv] Fix Script Name Change (#7254 ) PR https://github.com/Azure/sonic-buildimage/pull/4599 changed the startup script name from wait_for_intf.sh.j2 to wait_for_link.sh.j2; however, when PR https://github.com/Azure/sonic-buildimage/pull/5178 was cherry-picked, the script name was not changed to wait_for_link.sh. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2021-04-08 09:56:31 -07:00
Joe LeVeque	72b32a96fc	[201911][dockers][supervisor] Increase event buffer size for process exit listener (#7106 ) Backport of https://github.com/Azure/sonic-buildimage/pull/7083 to the 201911 branch. #### Why I did it To prevent error [messages](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802) like the following from being logged: ``` Mar 17 02:33:48.523153 vlab-01 INFO swss#supervisord 2021-03-17 02:33:48,518 ERRO pool supervisor-proc-exit-listener event buffer overflowed, discarding event 46 ``` This is basically an addendum to https://github.com/Azure/sonic-buildimage/pull/5247, which increased the event buffer size for dependent-startup. While supervisor-proc-exit-listener doesn't subscribe to as many events as dependent-startup, there is still a chance some containers (like swss, as in the example above) have enough processes running to cause an overflow of the default buffer size of 10. This is especially important for preventing erroneous log_analyzer failures in the sonic-mgmt repo regression tests, which have started occasionally causing PR check builds to fail. Example [here](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802). I set all supervisor-proc-exit-listener event buffer sizes to 1024, and also updated all dependent-startup event buffer sizes to 1024, as well, to keep things simple, unified, and allow headroom so that we will not need to adjust these values frequently, if at all.	2021-03-29 10:07:43 -07:00
judyjoseph	c15b5ea339	To decrease the Connect Retry Timer from default value which is 120sec to 10 sec. (#7087 ) Why I did it It was observed that on multi-asic DUT bootup, the BGP internal sessions between ASICs were taking more time to get ESTABLISHED than external BGP sessions. The internal sessions were coming up almost exactly 120 secs later. On a multi-asic platform the bgp dockers (which are per ASIC) are brought up around the same time on switch start, and they try to establish the bgp sessions with neighbors (in peer ASICs) which may not be completely up. This results in a BGP connect failure, and the retry happens after 120 sec, which is the default Connect Retry Timer. How I did it Add the command to set the bgp neighboring session retry timer to 10 sec for internal bgp neighbors.	2021-03-17 23:16:44 -07:00
Tamer Ahmed	7c5f0ff316	Start DHCP Relay When Helpers IPs Are Available (#6961 ) (#7059 ) It is possible to have a DHCP relay configuration with no servers/helpers, which causes the DHCP container to crash. This PR fixes this issue by not starting DHCP relay for vlans with no DHCP helpers. resolves: #6931 closes: #6931 Do not add program group for dhcp relay with no dhcp helpers Unit test	2021-03-15 14:43:50 -07:00
trzhang-msft	a0b824f83e	[docker-dhcp-relay]: add -si support in dhcp docker template (#7054 )	2021-03-15 09:21:32 -07:00
Ze Gan	b73d5a659e	[docker-ptf]: Add teamd dependency to ptf (#6994 ) Signed-off-by: Ze Gan <ganze718@gmail.com>	2021-03-10 10:50:17 -08:00
Qi Luo	b12383013f	[build]: Fix get-pip 2.7 url according to upstream announcement (#6999 ) ref: https://bootstrap.pypa.io/2.7/get-pip.py The URL you are using to fetch this script has changed, and this one will no longer work. Please use get-pip.py from the following URL instead: https://bootstrap.pypa.io/pip/2.7/get-pip.py	2021-03-10 09:51:31 -08:00
abdosi	ab05a2f58a	Add support for BGP Monitors on multi asic SONiC platforms. (#6977 ) This PR is a cherry-pick of master https://github.com/Azure/sonic-buildimage/pull/6920 Why I did it Add support for BGP Monitors on multi asic SONiC platforms. How I did it On multi ASIC SONiC platforms, the BGP monitor session will be established from the Backend ASIC. To achieve this, the following changes are made: Add BGP monitor configuration on the backend ASIC. The BGP monitor configuration is present in the DPG of the device in minigraph.xml of a multi-ASIC device, so this configuration will be added to the config_db of the host when the minigraph is loaded. To add configuration for this in the Backend ASIC, a new class MultiAsicBgpMonCfg is added to the hostcfgd service to update the config_db of the backend ASIC when the BGP_MONITOR table of the host config_db is updated. This way incremental BGP_MONITOR configuration can also be handled. Changes to establish the BGP session with the bgp monitor. Add a route in the host main routing table to go to one of the pre-defined backend ASICs. Add an IP table rule on the front asic to mark BGP packets whose destination is the IPv4 Loopback. Add an IP rule in the front asic namespace to match marked BGP packets and look up the default table. Program the default route in the FrontEnd asic namespace docker default table as part of start.sh of the BGP container. This needs to be done as part of start.sh; otherwise the FRR default route will get overwritten. How to verify it Signed-off-by: Abhishek Dosi <abdosi@microsoft.com> Co-authored-by: Arvind <arlakshm@microsoft.com>	2021-03-06 21:21:52 -08:00
abdosi	9dc285ab05	Changes in FRR templates for multi-asic (#6901 ) 1. Made the command next-hop-self force only applicable on back-end asic bgp. This is done so that BGPL iBGP session running on backend can send e-BGP learned nexthop. Back end asic FRR is able to recursively resolve the eBGP nexthop in its routing table since it knows about all the connected routes advertised from front end asic. 2. Made all front-end asic bgp use global loopback ip (Loopback0) as router id and back end asic bgp use Loopback4096 as router-id and originator id for Route-Reflector. This is done so that routes learnt by external peer do not see Loopback4096 as router id in show ip bgp <route-prefix> output. 3. To handle above change need to pass Loopback4096 from BGP manager for jinja2 template generation. This was missing and this change/fix is needed for this also https://github.com/Azure/sonic-buildimage/blob/master/dockers/docker-fpm-frr/frr/bgpd/templates/dynamic/instance.conf.j2#L27 4. Enhancement to add mult_asic specific bgpd template generation unit test cases.	2021-03-02 14:42:22 -08:00
abdosi	fbc3386825	[multi-asic] BBR support on internal-peers for multi-asic platforms. (#6848 ) Enable BBR config allowas-in 1 for internal peers Why I did: To advertise BBR routes learnt via e-BGP peer in one asic/namespace to another iBGP asic/namespace via Route Reflector.	2021-03-02 13:44:17 -08:00
Qi Luo	c9febff961	[radv] Disable radv for specific deployment_id (#6830 )	2021-02-22 18:52:40 -08:00
judyjoseph	86a13610cb	[docker-fpm-frr]: TSA/B/C changes for multi-asic (#6510 ) - Introduced TS common file in docker as well and moved common functions. - TSA/B/C scripts run only in BGP instances for front end ASICs. In addition skip enforcing it on route maps used between internal BGP sessions. admin@str--acs-1:~$ sudo /usr/bin/TSA System Mode: Normal -> Maintenance and in case of Multi-ASIC admin@str--acs-1:~$ sudo /usr/bin/TSA BGP0 : System Mode: Normal -> Maintenance BGP1 : System Mode: Normal -> Maintenance BGP2 : System Mode: Normal -> Maintenance	2021-02-18 18:04:24 -08:00
Petro Bratash	4031791b4e	[lldp]: Add verification IPv4 address on LLDP conf Jinja2 Template (#5699 ) Fix #5812 LLDP conf Jinja2 Template does not verify IPv4 address and can use IPv6 version. This issue does not affect control LLDP daemon. Issue can be reproduced via `test_snmp_lldp` test. LLDP conf Jinja2 Template selects first item from the list of mgmt interfaces. TESTBED_1 LLDP conf ``` configure ports eth0 lldp portidsubtype local eth0 configure system ip management pattern FC00:3::32 configure system hostname dut-1 ``` TESTBED_2 LLDP conf ``` configure ports eth0 lldp portidsubtype local eth0 configure system ip management pattern 10.22.24.61 configure system hostname dut-2 ``` TESTBED_1 MGMT_INTERFACE ``` $ redis-cli -n 4 keys "" | grep MGMT_INTERFACE MGMT_INTERFACE|eth0|10.22.24.53/23 MGMT_INTERFACE|eth0|FC00:3::32/64 ``` TESTBED_2 MGMT_INTERFACE ``` $ redis-cli -n 4 keys "" | grep MGMT_INTERFACE MGMT_INTERFACE|eth0|FC00:3::32/64 MGMT_INTERFACE|eth0|10.22.24.61/23 ``` Signed-off-by: Petro Bratash <petrox.bratash@intel.com>	2021-02-11 15:34:06 -08:00
abdosi	95bcefa7c9	[201911] Fix PTF Docker Build Error (#6583 ) We are hitting the issue as described pypa/pip#9520. Fix to use get_pip.py from 2.7 repo. Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2021-01-28 02:19:12 -08:00
arlakshm	3cd536bb45	[Multi Asic] support of swss.rec and sairedis.rec for multi asic (#6310 ) Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan arlakshm@microsoft.com - Why I did it This PR has the changes to support having different swss.rec and sairedis.rec for each asic. The logrotate script is updated as well - How I did it Update the orchagent.sh script to use the logfile name options in these PRs(Azure/sonic-swss#1546 and Azure/sonic-sairedis#747) In multi asic platforms the record files will be different for each asic, with the format swss.asic{x}.rec and sairedis.asic{x}.rec Update the logrotate script for multiasic platform .	2021-01-27 17:12:32 -08:00
Tamer Ahmed	c5bd46f857	[dhcp-relay]: Launch DHCP Relay On L3 Vlan (#6527 ) Recent changes introduced the concept of L2 vlans, which do not have DHCP clients behind them, so DHCP relay is not required for them. Also, dhcpmon fails to launch on those vlans as their interfaces lack IP addresses. This PR limits the launch of both DHCP relay and dhcpmon to L3 vlans only. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2021-01-25 12:38:16 -08:00
pavel-shirshov	beaaf3316d	[docker-frr]: Use egrep with regexp to match correct TSA rules (#6403 ) - Why I did it Earlier today we found a bug in the SONiC TSA implementation. TSC shows incorrect output (see below) in case we have a route-map which contains a TSA route-map as a prefix. ``` admin@str-s6100-acs-1:~$ TSC Traffic Shift Check: System Mode: Not consistent ``` The reason is that the TSC implementation uses regexps in the TSA utilities that are too loose and match the wrong route-map entries. For example, the current TSC matches the following ``` route-map TO_BGP_PEER_V4 permit 200 route-map TO_BGP_PEER_V6 permit 200 ``` But it should match only ``` route-map TO_BGP_PEER_V4 permit 20 route-map TO_BGP_PEER_V4 deny 30 route-map TO_BGP_PEER_V6 permit 20 route-map TO_BGP_PEER_V6 deny 30 ``` - How I did it I fixed it by using egrep with the `^` and `$` regexp markers, which match the beginning and end of the line. - How to verify it 1. Add the following entry to the FRR config: ``` str-s6100-acs-1# str-s6100-acs-1# conf t str-s6100-acs-1(config)# route-map TO_BGP_PEER_V4 permit 200 str-s6100-acs-1(config-route-map)# end ``` 2. Use the TSC command and check the output. It should show normal. ``` admin@str-s6100-acs-1:~$ TSC Traffic Shift Check: System Mode: Normal ```	2021-01-20 10:37:10 -08:00
pavel-shirshov	f4245fb18d	[bgpcfgd]: Support default action for "Allow prefix" feature (#6370 ) * Use 20 and 30 route-map entries instead of 2 and 3 for TSA * Added support for dynamic "Allow list" default action. Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>	2021-01-08 15:12:52 -08:00
abdosi	a3d093a82a	Updated imfile configuration for supervisord logs (#6368 ) Updated imfile configuration for supervisord logs for stretch and buster.	2021-01-06 18:48:24 -08:00
abdosi	6e48839cae	Enable the notify mode of rsyslogd imfile module used for supervisord (#6298 ) Enable the notify mode of rsyslogd imfile module used for supervisord logs in docker container	2020-12-31 17:04:00 -08:00
Stepan Blyshchak	d43e8e16a3	[fpm-frr] fix start.sh template paths (#6329 ) There is no /usr/share/sonic/templates/supervisord/ folder and no supervisord.conf.j2 template. Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>	2020-12-31 17:01:24 -08:00
Junchao-Mellanox	547ec0a905	Add a configuration to delay start xcvrd for fast-reboot (#5643 )	2020-12-22 09:51:54 -08:00
Tamer Ahmed	afc952535e	[mgmt-framework] Call sonic-cfggen Once (#4937 ) Optimizing number of calls made to sonic-cfggen during service start up as it adds to total system boot up time. *-Test 1* there is an average saving of 1 to 1.5 sec between old script and new script ``` root@str-s6000-acs-14:/# time /usr/bin/rest-server-old.sh Generating temporary TLS server certificate ... 2020/07/09 19:03:33 wrote cert.pem 2020/07/09 19:03:33 wrote key.pem REST_SERVER_ARGS = -ui /rest_ui -logtostderr -cert /tmp/cert.pem -key /tmp/key.pem /usr/sbin/rest_server -ui /rest_ui -logtostderr -cert /tmp/cert.pem -key /tmp/key.pem real 0m8.790s user 0m7.993s sys 0m0.584s root@str-s6000-acs-14:/# time /usr/bin/rest-server-new.sh Generating temporary TLS server certificate ... 2020/07/09 19:03:45 wrote cert.pem 2020/07/09 19:03:45 wrote key.pem REST_SERVER_ARGS = -ui /rest_ui -logtostderr -cert /tmp/cert.pem -key /tmp/key.pem /usr/sbin/rest_server -ui /rest_ui -logtostderr -cert /tmp/cert.pem -key /tmp/key.pem real 0m6.940s user 0m5.670s sys 0m0.386s ``` *-Test 2* Built an image with this change and rest server is running with params as described in test 1 above ``` admin@str-s6000-acs-14:~$ ps -ef \| grep rest_server root 3301 2866 2 02:09 pts/0 00:00:10 /usr/sbin/rest_server -ui /rest_ui -logtostderr -cert /tmp/cert.pem -key /tmp/key.pem ``` signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-12-22 09:51:54 -08:00
Tamer Ahmed	2b3e18c0cc	[swss] Reduce Calls to SONiC Cfggen (#5177 ) Calls to sonic-cfggen are CPU expensive. This PR reduces calls to sonic-cfggen to one call during startup when starting swss service. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-12-22 09:51:54 -08:00
Tamer Ahmed	fd3e0b4c58	[frr] Reduce Calls to SONiC Cfggen (#5176 ) Calls to sonic-cfggen are CPU expensive. This PR reduces calls to sonic-cfggen to two calls during startup when starting frr service. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-12-22 09:51:54 -08:00
Tamer Ahmed	c5f53f50b2	[radv] Reduce Calls to SONiC Cfggen (#5178 ) Calls to sonic-cfggen are CPU expensive. This PR reduces calls to sonic-cfggen to one call during startup when starting radv service. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-12-22 09:51:54 -08:00
Tamer Ahmed	687c971a52	[dhcp-relay] Reduce Calls to SONiC Cfggen (#5175 ) Calls to sonic-cfggen are CPU expensive. This PR reduces calls to sonic-cfggen to one call during startup when starting dhcp-relay service. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-12-22 09:51:54 -08:00
Tamer Ahmed	066a0b3b2b	[snmp]: Reduce Calls to SONiC Cfggen (#5166 ) Calls to sonic-cfggen are CPU expensive. This PR reduces calls to sonic-cfggen to one call during snmp startup. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-12-22 09:51:54 -08:00
Tamer Ahmed	fae4c4bfcc	[swss] Enhance ARP Update to Call Sonic Cfggen Once (#5398 ) This PR limited the number of calls to sonic-cfggen to one call per iteration instead of current 3 calls per iteration. The PR also installs jq on host for future scripts if needed. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-12-22 09:51:54 -08:00
pavel-shirshov	0931280466	[TSA]: Fix TSC. Avoid 'Not consistent' state (#5968 )	2020-12-10 16:43:37 -08:00
pavel-shirshov	9e0ea83cd9	[bgpcfgd]: Use peer commands for BBR, not peer-group (#6048 ) * templates: Move 'allowas-in' command from peer-group to instance configuration * Use peer itself, don't rely on peer-groups	2020-11-26 09:55:24 -08:00
Lawrence Lee	cb32b362f5	Make backend device checking more robust (#5730 ) Treat devices that are ToRRouters (ToRRouters and BackEndToRRouters) the same when rendering templates Except for BackEndToRRouters belonging to a storage cluster, since these devices have extra sub-interfaces created Treat devices that are LeafRouters (LeafRouters and BackEndLeafRouters) the same when rendering templates Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2020-11-14 08:39:08 -08:00
pavel-shirshov	e9ff96d90e	[bgp]: Update TSA functionality (#5906 ) Fixed TSA bugs: 1. TSA didn't advertise Loopback ipv6 address 2. TSA and TSB changed BGP dynamic and BGP monitors sessions - How to verify it Build an image and run on your DUT. ``` admin@str-s6100-acs-1:~$ TSA System Mode: Normal -> Maintenance admin@str-s6100-acs-1:~$ vtysh -c 'show bgp ipv4 neighbors 10.0.0.1 advertised-routes' BGP table version is 6, local router ID is 10.1.0.32, vrf id 0 Default local pref 100, local AS 64601 Status codes: s suppressed, d damped, h history, * valid, > best, = multipath, i internal, r RIB-failure, S Stale, R Removed Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self Origin codes: i - IGP, e - EGP, ? - incomplete Network Next Hop Metric LocPrf Weight Path *> 10.1.0.32/32 0.0.0.0 0 32768 i Total number of prefixes 1 admin@str-s6100-acs-1:~$ vtysh -c 'show bgp ipv6 neighbors fc00::a advertised-routes' BGP table version is 6, local router ID is 10.1.0.32, vrf id 0 Default local pref 100, local AS 64601 Status codes: s suppressed, d damped, h history, * valid, > best, = multipath, i internal, r RIB-failure, S Stale, R Removed Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self Origin codes: i - IGP, e - EGP, ? - incomplete Network Next Hop Metric LocPrf Weight Path *> fc00:1::/64 :: 0 32768 i Total number of prefixes 1 admin@str-s6100-acs-1:~$ TSB System Mode: Maintenance -> Normal ``` Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>	2020-11-14 08:35:13 -08:00
judyjoseph	005702ba0e	[multi-ASIC] util changes with the BGP_INTERNAL_NEIGHBOR table. (#5760 ) - Why I did it Update the routine is_bgp_session_internal() by checking the BGP_INTERNAL_NEIGHBOR table. Additionally, to address the review comment #5520 (comment), add timer settings as well in the internal session templates and keep them minimal, as these sessions will always be up. Updates to the internal tests data + add all of it to template tests. - How I did it Updated the APIs and the template files. - How to verify it Verified the internal BGP sessions are displayed correctly with show commands using this API is_bgp_session_internal()	2020-11-10 12:53:49 -08:00
judyjoseph	ce86621399	[multi-ASIC] BGP internal neighbor table support (#5520 ) * Initial commit for BGP internal neighbor table support. > Add new template named "internal" for the internal BGP sessions > Add a new table in database "BGP_INTERNAL_NEIGHBOR" > The internal BGP sessions will be stored in this new table "BGP_INTERNAL_NEIGHBOR" * Changes in template generation tests with the introduction of internal neighbor template files.	2020-11-10 12:52:58 -08:00
abdosi	65cc37cadf	[multi-asic] teamdctl support for multi-asic (#5851 ) Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2020-11-09 12:33:41 -08:00
Junchao-Mellanox	1070d024bc	[thermalctld] Enlarge startretries value to avoid thermalctld not able to restart during regression test (#5633 ) Increase startretries value from default of 10 to 50 to prevent supervisor from placing thermalctld in FATAL state during regression testing. Also ensures supervisord tries hard to get thermalctld running in production, as thermalctld is critical to prevent device from overheating.	2020-11-03 08:19:19 -08:00
abdosi	0fad6bdc7f	[monit] Adding patch to enhance syslog error message generation for monit alert action when status is failed. (#5720 ) Why/How I did: Make sure first error syslog is triggered based on FAULT TOLERANCE condition. Added support of repeat clause with alert action. This is used as trigger for generation of periodic syslog error messages if error is persistent Updated the monit conf files with repeat every x cycles for the alert action	2020-11-01 10:27:10 -08:00
shlomibitton	97f2cafe0b	[LLDP] Fix for LLDP advertisements being sent with wrong information. (#5493 ) * Fix for LLDP advertisements being sent with wrong information. Since lldpd is starting before lldpmgr, some advertisement packets might be sent with the default value, the mac address, as Port ID. This fix holds the packets from being sent by the lldpd until all interfaces are well configured by the lldpmgrd. Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com> * Fix comments * Fix unit-test output caused a failure during build * Add 'run_cmd' function and use it * Resume lldpd even if port init timeout reached	2020-10-30 09:06:23 -07:00
pavel-shirshov	2eec3b3254	[bgpcfgd]: Dynamic BBR support (#5626 ) - Why I did it To introduce dynamic support of BBR functionality into bgpcfgd. BBR is adding `neighbor PEER_GROUP allowas-in 1` for all BGP peer-groups which point to T0. Now we can add and remove this configuration based on a CONFIG_DB entry - How I did it I introduced a new CONFIG_DB entry: - table name: "BGP_BBR" - key value: "all". Currently only "all" is supported, which means that all peer-groups which point to T0s will be updated - data value: a dictionary: {"status": "status_value"}, where status_value could be either "enabled" or "disabled" Initially, when bgpcfgd starts, it reads initial BBR status values from the [constants.yml](https://github.com/Azure/sonic-buildimage/pull/5626/files#diff-e6f2fe13a6c276dc2f3b27a5bef79886f9c103194be4fcb28ce57375edf2c23cR34). Then you can control BBR status by changing the "BGP_BBR" table in the CONFIG_DB (see examples below). bgpcfgd knows what peer-groups to change from [constants.yml](https://github.com/Azure/sonic-buildimage/pull/5626/files#diff-e6f2fe13a6c276dc2f3b27a5bef79886f9c103194be4fcb28ce57375edf2c23cR39). The dictionary contains peer-group names as keys, and a list of address-families as values. So when bgpcfgd gets a request to change the BBR state, it changes the state only for peer-groups listed in the constants.yml dictionary (and only for address families from the peer-group value). - How to verify it Initially, when we start SONiC FRR has BBR enabled for PEER_V4 and PEER_V6: ``` admin@str-s6100-acs-1:~$ vtysh -c 'show run' | egrep 'PEER_V.? allowas' neighbor PEER_V4 allowas-in 1 neighbor PEER_V6 allowas-in 1 ``` Then we apply the following configuration to the db: ``` admin@str-s6100-acs-1:~$ cat disable.json { "BGP_BBR": { "all": { "status": "disabled" } } } admin@str-s6100-acs-1:~$ sonic-cfggen -j disable.json -w ``` The log output is: ``` Oct 14 18:40:22.450322 str-s6100-acs-1 DEBUG bgp#bgpcfgd: Received message : '('all', 'SET', (('status', 'disabled'),))' Oct 14 18:40:22.450620 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-f', '/tmp/tmpmWTiuq']'. Oct 14 18:40:22.681084 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V4 soft in']'. Oct 14 18:40:22.904626 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V6 soft in']'. ``` Check the FRR configuration and see that no allowas parameters are there: ``` admin@str-s6100-acs-1:~$ vtysh -c 'show run' | egrep 'PEER_V.? allowas' admin@str-s6100-acs-1:~$ ``` Then we apply the enabling configuration back: ``` admin@str-s6100-acs-1:~$ cat enable.json { "BGP_BBR": { "all": { "status": "enabled" } } } admin@str-s6100-acs-1:~$ sonic-cfggen -j enable.json -w ``` The log output: ``` Oct 14 18:40:41.074720 str-s6100-acs-1 DEBUG bgp#bgpcfgd: Received message : '('all', 'SET', (('status', 'enabled'),))' Oct 14 18:40:41.074720 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-f', '/tmp/tmpDD6SKv']'. Oct 14 18:40:41.587257 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V4 soft in']'. Oct 14 18:40:42.042967 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V6 soft in']'. ``` Check the FRR configuration and see that the BBR configuration is back: ``` admin@str-s6100-acs-1:~$ vtysh -c 'show run' | egrep 'PEER_V.? allowas' neighbor PEER_V4 allowas-in 1 neighbor PEER_V6 allowas-in 1 ``` * The test coverage * Below is the test coverage ``` ---------- coverage: platform linux2, python 2.7.12-final-0 ---------- Name Stmts Miss Cover ---------------------------------------------------- bgpcfgd/__init__.py 0 0 100% bgpcfgd/__main__.py 3 3 0% bgpcfgd/config.py 78 41 47% bgpcfgd/directory.py 63 34 46% bgpcfgd/log.py 15 3 80% bgpcfgd/main.py 51 51 0% bgpcfgd/manager.py 41 23 44% bgpcfgd/managers_allow_list.py 385 21 95% bgpcfgd/managers_bbr.py 76 0 100% bgpcfgd/managers_bgp.py 193 193 0% bgpcfgd/managers_db.py 9 9 0% bgpcfgd/managers_intf.py 33 33 0% bgpcfgd/managers_setsrc.py 45 45 0% bgpcfgd/runner.py 39 39 0% bgpcfgd/template.py 64 11 83% bgpcfgd/utils.py 32 24 25% bgpcfgd/vars.py 1 0 100% ---------------------------------------------------- TOTAL 1128 530 53% ``` - Which release branch to backport (provide reason below if selected) - [ ] 201811 - [x] 201911 - [x] 202006	2020-10-30 08:58:27 -07:00
pavel-shirshov	84405ab953	[bgp]: Enable next-hop-tracking through default (#5600 ) - Why I did it FRR introduced [next hop tracking](http://docs.frrouting.org/projects/dev-guide/en/latest/next-hop-tracking.html) functionality. That functionality requires resolving BGP neighbors before setting up a BGP connection (or an explicit ebgp-multihop command). Sometimes (BGP MONITORS) our neighbors are not directly connected and sessions are IBGP. In this case the current configuration prevents FRR from establishing BGP connections. The reason would be "waiting for NHT". To fix that we need to either add static routes for each not-directly connected ibgp neighbor, or enable the command `ip nht resolve-via-default` - How I did it Put `ip nht resolve-via-default` into the config - How to verify it Build an image. Enable BGP_MONITOR entry and check that entry is Established or Connecting in FRR Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com> Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2020-10-13 22:42:29 -07:00
abdosi	9202b1c7eb	Fix monit complaining of snmp on 201911 branch. (#5612 ) There is difference between master and 201911 how sonic_ax_impl is started. Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2020-10-13 17:17:43 -07:00
Mahesh Maddikayala	f354a20d94	[ECMP][Multi-ASIC] Have different ECMP seed value on each ASIC (#5357 ) * Calculate ECMP hash seed based on ASIC ID on multi ASIC platform. Each ASIC will have a unique ECMP hash seed value. Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2020-10-13 09:48:57 -07:00
pavel-shirshov	437ad95646	[bgp] Add 'allow list' manager feature (#5513 ) implements a new feature: "BGP Allow list." This feature allows us to control which IP prefixes are going to be advertised via ebgp from the routes received from EBGP neighbors.	2020-10-06 11:15:19 -07:00
abdosi	3a29249e04	[Multi-asic] Fixed Default Route to be BGP (#5548 ) Learned and not docker default route for multi-asic platforms. Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2020-10-06 06:04:31 +00:00
Nazarii Hnydyn	f456f1fd03	[monit]: Fix process checker. (#5480 ) Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>	2020-09-30 00:25:37 +00:00
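
The difference between loose and anchored matching described in the TSA regexp fix (#6403) can be illustrated with a minimal Python sketch. This is not the code from that PR, which uses egrep in shell scripts; the helper names `loose_match` and `anchored_match` are hypothetical, and the sample route-map lines are the ones quoted in the commit message.

```python
import re

# Sample FRR route-map lines quoted in the commit message for #6403.
config_lines = [
    "route-map TO_BGP_PEER_V4 permit 20",
    "route-map TO_BGP_PEER_V4 deny 30",
    "route-map TO_BGP_PEER_V4 permit 200",
]


def loose_match(lines, rule):
    """Hypothetical helper: unanchored search, so any line that merely
    contains the rule text is reported."""
    pattern = re.compile(re.escape(rule))
    return [line for line in lines if pattern.search(line)]


def anchored_match(lines, rule):
    """Hypothetical helper: the rule is wrapped in ^ and $, so only a line
    that equals the rule exactly is reported."""
    pattern = re.compile("^" + re.escape(rule) + "$")
    return [line for line in lines if pattern.search(line)]


rule = "route-map TO_BGP_PEER_V4 permit 20"
loose = loose_match(config_lines, rule)        # also picks up "permit 200"
anchored = anchored_match(config_lines, rule)  # only the exact entry
print(loose)
print(anchored)
```

With the unanchored pattern, the unrelated `permit 200` entry is counted as a TSA rule, which is the kind of mismatch that led to the "Not consistent" result; anchoring both ends of the line excludes it.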


669 Commits
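
The `BGP_BBR` entry introduced by the Dynamic BBR support commit (#5626) is a table with a single supported key, "all", whose value is a dictionary with a "status" field. A minimal Python sketch, assuming only that shape, that builds the JSON shown in the commit's `disable.json` and `enable.json` examples; the function name `bbr_config` is hypothetical and is not part of bgpcfgd.

```python
import json

# The commit message states that status may be "enabled" or "disabled".
VALID_STATUSES = ("enabled", "disabled")


def bbr_config(status):
    """Hypothetical helper: build the CONFIG_DB fragment for the BGP_BBR table.

    Only the "all" key is used, since the commit message says it is the
    only key currently supported.
    """
    if status not in VALID_STATUSES:
        raise ValueError("status must be one of %s" % (VALID_STATUSES,))
    return {"BGP_BBR": {"all": {"status": status}}}


disable_json = json.dumps(bbr_config("disabled"))
enable_json = json.dumps(bbr_config("enabled"))
print(disable_json)
print(enable_json)
```

Written to a file, either string can be loaded the way the commit message shows, for example with `sonic-cfggen -j disable.json -w`.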