sonic-buildimage

Author	SHA1	Message	Date
judyjoseph	1ad5dbeab6	Fixes for errors seen in staging devices (#7171 ) With the latest 201911 image, the following error was seen on staging devices with TSB command ( for both single asic, multi asic ). Though this err message doesn't affect the TSB functionality, it is good to fix. admin@STG01-0101-0102-01T1:~$ TSB BGP0 : % Could not find route-map entry TO_TIER0_V4 20 line 1: Failure to communicate[13] to zebra, line: no route-map TO_TIER0_V4 permit 20 % Could not find route-map entry TO_TIER0_V4 30 line 2: Failure to communicate[13] to zebra, line: no route-map TO_TIER0_V4 deny 30 In addition, in this PR I am fixing the message displayed to user when there are no BGP neighbors configured on that BGP instance. In multi-asic device there could be case where there are no BGP neighbors configured on a particular ASIC.	2021-04-08 15:16:43 -07:00
Joe LeVeque	c651a9ade4	[dockers][supervisor] Increase event buffer size for process exit listener; Set all event buffer sizes to 1024 (#7083 ) To prevent error [messages](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802) like the following from being logged: ``` Mar 17 02:33:48.523153 vlab-01 INFO swss#supervisord 2021-03-17 02:33:48,518 ERRO pool supervisor-proc-exit-listener event buffer overflowed, discarding event 46 ``` This is basically an addendum to https://github.com/Azure/sonic-buildimage/pull/5247, which increased the event buffer size for dependent-startup. While supervisor-proc-exit-listener doesn't subscribe to as many events as dependent-startup, there is still a chance some containers (like swss, as in the example above) have enough processes running to cause an overflow of the default buffer size of 10. This is especially important for preventing erroneous log_analyzer failures in the sonic-mgmt repo regression tests, which have started occasionally causing PR check builds to fail. Example [here](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802). I set all supervisor-proc-exit-listener event buffer sizes to 1024, and also updated all dependent-startup event buffer sizes to 1024, as well, to keep things simple, unified, and allow headroom so that we will not need to adjust these values frequently, if at all.	2021-03-27 21:14:24 -07:00
Shi Su	de64c4e34c	[bgp]: Reduce bgp connect retry timer to 10 seconds (#7169 ) The default bgp connect retry timer is 120 seconds. A reconnection will happen after 120 seconds if the initial connection fails. This PR aims to allow a more frequent retry.	2021-03-27 11:36:56 -07:00
judyjoseph	9d9503e1fe	To decrease the Connect Retry Timer from default value which is 120sec to 10 sec. (#7087 ) Why I did it It was observed that on a multi-asic DUT bootup, the internal BGP sessions between ASICs were taking more time to get ESTABLISHED than external BGP sessions. The internal sessions were coming up almost exactly 120 secs later. In multi-asic platform the bgp dockers ( which are per ASIC ) are brought up around the same time on switch start, and they try to establish bgp sessions with neighbors (in peer ASICs) which may not be completely up. This results in a BGP connect failure, and the retry happens after 120sec, which is the default Connect Retry Timer. How I did it Add the command to set the bgp neighboring session retry timer to 10sec for internal bgp neighbors.	2021-03-17 23:14:38 -07:00
abdosi	30b6668b7d	Changes in FRR templates for multi-asic (#6901 ) 1. Made the command next-hop-self force only applicable on back-end asic bgp. This is done so that BGPL iBGP session running on backend can send e-BGP learnt nexthop. Back end asic FRR is able to recursively resolve the eBGP nexthop in its routing table since it knows about all the connected routes advertised from front end asic. 2. Made all front-end asic bgp use global loopback ip (Loopback0) as router id and back end asic bgp use Loopback4096 as router-id and originator id for Route-Reflector. This is done so that routes learnt by external peer do not see Loopback4096 as router id in show ip bgp <route-prefix> output. 3. To handle above change need to pass Loopback4096 from BGP manager for jinja2 template generation. This was missing and this change/fix is needed for this also https://github.com/Azure/sonic-buildimage/blob/master/dockers/docker-fpm-frr/frr/bgpd/templates/dynamic/instance.conf.j2#L27 4. Enhancement to add mult_asic specific bgpd template generation unit test cases.	2021-02-26 17:05:15 -08:00
abdosi	a520cecb44	[multi-asic] BBR support on internal-peers for multi-asic platforms. (#6848 ) Enable BBR config allowas-in 1 for internal peers Why I did: To advertise BBR routes learnt via e-BGP peer in one asic/namespace to another iBGP asic/namespace via Route Reflector.	2021-02-25 23:15:02 -08:00
judyjoseph	ad88700912	[docker-fpm-frr]: TSA/B/C changes for multi-asic (#6510 ) - Introduced TS common file in docker as well and moved common functions. - TSA/B/C scripts run only in BGP instances for front end ASICs. In addition skip enforcing it on route maps used between internal BGP sessions. admin@str--acs-1:~$ sudo /usr/bin/TSA System Mode: Normal -> Maintenance and in case of Multi-ASIC admin@str--acs-1:~$ sudo /usr/bin/TSA BGP0 : System Mode: Normal -> Maintenance BGP1 : System Mode: Normal -> Maintenance BGP2 : System Mode: Normal -> Maintenance	2021-02-12 10:56:44 -08:00
Guohan Lu	f7346cca32	[docker-fpm-frr]: remove blank lines in generated critical_process Signed-off-by: Guohan Lu <lguohan@gmail.com>	2021-01-27 19:41:59 -08:00
Shi Su	aab37b7f42	[FRR] Create a separate script to wait zebra to be ready to receive connections (#6519 ) The requirement for zebra to be ready to accept connections is a generic problem that is not specific to bgpd. This PR moves the wait for the zebra socket into a separate script and lets both bgpd and staticd wait for the zebra socket.	2021-01-27 12:36:02 -08:00
Zhenhong Zhao	a171e6c5e4	[frrcfgd] introduce frrcfgd to manage frr config when frr_mgmt_framework_config is true (#5142 ) - Support for non-template based FRR configurations (BGP, route-map, OSPF, static route..etc) using config DB schema. - Support for save & restore - Jinja template based config-DB data read and apply to FRR during startup - How I did it - add frrcfgd service - when frr_mgmt_framework_config is set, frrcfgd starts in bgp container - when user changed the BGP or other related table entries in config DB, frrcfgd will run corresponding VTYSH commands to program on FRR. - add jinja template to generate FRR config file to be used by FRR daemons while bgp container restarted - How to verify it 1. Add/delete data on config DB and then run VTYSH "show running-config" command to check if FRR configuration changed. 1. Restart bgp container and check if generated FRR config file is correct and run VTYSH "show running-config" command to check if FRR configuration is consistent with attributes in config DB Co-authored-by: Zhenhong Zhao <zhenhong.zhao@dell.com>	2021-01-24 17:57:03 -08:00
yozhao101	be3c036794	[supervisord] Monitoring the critical processes with supervisord. (#6242 ) - Why I did it Initially, we used Monit to monitor critical processes in each container. If one of the critical processes was not running or crashed for some reason, Monit would periodically write an alerting message into syslog. If we add a new process to a container, the corresponding Monit configuration file also needs to be updated, which makes maintenance harder. We now use the event listener of Supervisord to do this monitoring. Since processes in each container are managed by Supervisord, we only need to focus on the monitoring logic. - How I did it We borrowed the event listener of Supervisord to monitor critical processes in containers. The event listener takes the following steps if it is notified that one of the critical processes exited unexpectedly: The event listener first checks whether the auto-restart mechanism is enabled for this container. If the auto-restart mechanism is enabled, the event listener kills the Supervisord process, which should cause the container to exit and subsequently get restarted. If the auto-restart mechanism is not enabled for this container, the event listener enters a loop which first sleeps 1 minute and then checks whether the process is running. If yes, the event listener exits. If no, an alerting message is written into syslog. - How to verify it First, check whether the auto-restart mechanism of a container is enabled by running the command show feature status. If enabled, select one critical process and kill it manually, then check whether the container is restarted. Second, disable the auto-restart mechanism if it was enabled at step 1 by running the command sudo config feature autorestart <container_name> disabled. Then select one critical process and kill it. 
After that, we will see the alerting message which will appear in the syslog every 1 minute. - Which release branch to backport (provide reason below if selected) 201811 201911 [x ] 202006	2021-01-21 12:57:49 -08:00
Shi Su	afee1a851c	[bgpd]: Check zebra is ready to connect when starting bgpd (#6478 ) Fix #5026 There is a race condition between the zebra server starting to accept connections and bgpd trying to connect. Bgpd has a chance to try to connect before zebra is ready. In this scenario, bgpd will try again after 10 seconds and operate as normal within these 10 seconds. As a consequence, whatever bgpd tries to send to zebra during these 10 seconds will be lost. To avoid such a scenario, bgpd should start after zebra is ready to accept connections.	2021-01-19 00:23:36 -08:00
pavel-shirshov	16e54340b7	[docker-frr]: Use egrep with regexp to match correct TSA rules (#6403 ) - Why I did it Earlier today we found a bug in the SONiC TSA implementation. TSC shows incorrect output (see below) in case we have a route-map which contains TSA route-map as a prefix. ``` admin@str-s6100-acs-1:~$ TSC Traffic Shift Check: System Mode: Not consistent ``` The reason is that TSC implementation has too loose regexps in TSA utilities, which match wrong route-map entries: For example, current TSC matches following ``` route-map TO_BGP_PEER_V4 permit 200 route-map TO_BGP_PEER_V6 permit 200 ``` But it should match only ``` route-map TO_BGP_PEER_V4 permit 20 route-map TO_BGP_PEER_V4 deny 30 route-map TO_BGP_PEER_V6 permit 20 route-map TO_BGP_PEER_V6 deny 30 ``` - How I did it I fixed it by using egrep with `^` and `$` regexp markers which match the beginning and end of the line. - How to verify it 1. Add following entry to FRR config: ``` str-s6100-acs-1# str-s6100-acs-1# conf t str-s6100-acs-1(config)# route-map TO_BGP_PEER_V4 permit 200 str-s6100-acs-1(config-route-map)# end ``` 2. Use the TSC command and check output. It should show normal. ``` admin@str-s6100-acs-1:~$ TSC Traffic Shift Check: System Mode: Normal ```	2021-01-14 11:09:16 -08:00
pavel-shirshov	83715cfc49	[bgpcfgd]: Support default action for "Allow prefix" feature (#6370 ) * Use 20 and 30 route-map entries instead of 2 and 3 for TSA * Added support for dynamic "Allow list" default action. Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>	2021-01-08 14:03:26 -08:00
Ubuntu	273846a412	FRR 7.5 Build libyang1 which is required for frr 7.5	2020-12-29 03:44:49 -08:00
Guohan Lu	ed58684e36	[docker-frr]: add static ipv6 loopback route to allow bgp to advertise prefix frr does not advertise a route if the local route is not reachable; as a result, the loopback route /64 is not advertised to the neighbors. Adding a static route allows frr to advertise the route to its peers Signed-off-by: Guohan Lu <lguohan@gmail.com>	2020-12-28 10:34:34 -08:00
pavel-shirshov	fd87ba0aee	[bgpcfgd]: Add on-match next rule for set ipv6 next-hop prefer-global (#6011 ) * Add 'on-match next' after every 'set ipv6 next-hop prefer-global' * Check that 'set ipv6 next-hop prefer-global' rule has 'on-match' next	2020-11-24 08:33:31 -08:00
pavel-shirshov	5df8af5378	[TSA]: Fix TSC. Avoid 'Not consistent' state (#5968 )	2020-11-23 09:30:39 -08:00
lguohan	4d3eb18ca7	[supervisord]: use abspath as supervisord entrypoint (#5995 ) use abspath makes the entrypoint not affected by PATH env. Signed-off-by: Guohan Lu <lguohan@gmail.com>	2020-11-22 21:18:44 -08:00
Joe LeVeque	7bf05f7f4f	[supervisor] Install vanilla package once again, install Python 3 version in Buster container (#5546 ) - Why I did it We were building a custom version of Supervisor because I had added patches to prevent hangs and crashes if the system clock ever rolled backward. Those changes were merged into the upstream Supervisor repo as of version 3.4.0 (http://supervisord.org/changes.html#id9), therefore, we should be able to simply install the vanilla package via pip. This will also allow us to easily move to Python 3, as Python 3 support was added in version 4.0.0. - How I did it - Remove Makefiles and patches for building supervisor package from source - Install Python 3 supervisor package version 4.2.1 in Buster base container - Also install Python 3 version of supervisord-dependent-startup in Buster base container - Debian package installed binary in `/usr/bin/`, but pip package installs in `/usr/local/bin/`, so rather than update all absolute paths, I changed all references to simply call `supervisord` and let the system PATH find the executable to prevent future need for changes just in case we ever need to switch back to build a Debian package, then we won't need to modify these again. - Install Python 2 supervisor package >= 3.4.0 in Stretch and Jessie base containers	2020-11-19 23:41:32 -08:00
pavel-shirshov	af654944bd	[bgp]: Update TSA functionality (#5906 ) Fixed TSA bugs: 1. TSA didn't advertise Loopback ipv6 address 2. TSA and TSB changed BGP dynamic and BGP monitors sessions - How to verify it Build an image and run on your DUT. ``` admin@str-s6100-acs-1:~$ TSA System Mode: Normal -> Maintenance admin@str-s6100-acs-1:~$ vtysh -c 'show bgp ipv4 neighbors 10.0.0.1 advertised-routes' BGP table version is 6, local router ID is 10.1.0.32, vrf id 0 Default local pref 100, local AS 64601 Status codes: s suppressed, d damped, h history, * valid, > best, = multipath, i internal, r RIB-failure, S Stale, R Removed Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self Origin codes: i - IGP, e - EGP, ? - incomplete Network Next Hop Metric LocPrf Weight Path *> 10.1.0.32/32 0.0.0.0 0 32768 i Total number of prefixes 1 admin@str-s6100-acs-1:~$ vtysh -c 'show bgp ipv6 neighbors fc00::a advertised-routes' BGP table version is 6, local router ID is 10.1.0.32, vrf id 0 Default local pref 100, local AS 64601 Status codes: s suppressed, d damped, h history, * valid, > best, = multipath, i internal, r RIB-failure, S Stale, R Removed Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self Origin codes: i - IGP, e - EGP, ? - incomplete Network Next Hop Metric LocPrf Weight Path *> fc00:1::/64 :: 0 32768 i Total number of prefixes 1 admin@str-s6100-acs-1:~$ TSB System Mode: Maintenance -> Normal ``` Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>	2020-11-13 17:54:20 -08:00
judyjoseph	f2b22b5cd1	[multi-ASIC] util changes with the BGP_INTERNAL_NEIGHBOR table (#5874 ) Reintroduce #5760, along with the fix needed in the template file for python3 compatibility.	2020-11-10 09:34:56 -08:00
judyjoseph	b5121dcfd4	Revert "[multi-ASIC] util changes with the BGP_INTERNAL_NEIGHBOR table. (#5760 )" (#5871 ) This reverts commit `c972052594`.	2020-11-09 14:30:13 -08:00
judyjoseph	c972052594	[multi-ASIC] util changes with the BGP_INTERNAL_NEIGHBOR table. (#5760 ) - Why I did it Update the routine is_bgp_session_internal() by checking the BGP_INTERNAL_NEIGHBOR table. Additionally, to address the review comment #5520 (comment), add timer settings as well in the internal session templates and keep them minimal, as these sessions will always be up. Updates to the internal tests data + add all of it to template tests. - How I did it Updated the APIs and the template files. - How to verify it Verified the internal BGP sessions are displayed correctly with show commands with this API is_bgp_session_internal()	2020-11-09 11:10:10 -08:00
Longxiang Lyu	385dfc4921	[monit] Fix status error due to shebang change (#5865 ) lldpmgrd, bgpcfgd, and bgpmon are reported with the error status "not running" due to the recent change of shebang to use `Python3`. Modify the argument of `process_checker` to follow this change. Signed-off-by: Longxiang Lyu <lolv@microsoft.com>	2020-11-09 01:52:22 -08:00
pavel-shirshov	cdcd20a7b5	[BGP]: Convert ip address to network address for the LOCAL_VLAN filter (#5832 ) * [BGP]: Convert ip address to network address for the LOCAL_VLAN prefix filter	2020-11-06 17:47:08 -08:00
pavel-shirshov	13f8e9ce5e	[bgpcfgd]: Convert bgpcfgd and bgpmon to python3 (#5746 ) * Convert bgpcfgd to python3 Convert bgpmon to python3 Fix some issues in bgpmon * Add python3-swsscommon as depends * Install dependencies * reorder deps Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>	2020-11-05 10:01:43 -08:00
abdosi	dddf96933c	[monit] Adding patch to enhance syslog error message generation for monit alert action when status is failed. (#5720 ) Why/How I did: Make sure first error syslog is triggered based on FAULT TOLERANCE condition. Added support of repeat clause with alert action. This is used as trigger for generation of periodic syslog error messages if error is persistent Updated the monit conf files with repeat every x cycles for the alert action	2020-10-31 17:29:49 -07:00
judyjoseph	6088bd59de	[multi-ASIC] BGP internal neighbor table support (#5520 ) * Initial commit for BGP internal neighbor table support. > Add new template named "internal" for the internal BGP sessions > Add a new table in database "BGP_INTERNAL_NEIGHBOR" > The internal BGP sessions will be stored in this new table "BGP_INTERNAL_NEIGHBOR" * Changes in template generation tests with the introduction of internal neighbor template files.	2020-10-28 16:41:27 -07:00
pavel-shirshov	c94f93f046	[bgpcfgd]: Dynamic BBR support (#5626 ) - Why I did it To introduce dynamic support of BBR functionality into bgpcfgd. BBR is adding `neighbor PEER_GROUP allowas-in 1` for all BGP peer-groups which point to T0. Now we can add and remove this configuration based on CONFIG_DB entry - How I did it I introduced a new CONFIG_DB entry: - table name: "BGP_BBR" - key value: "all". Currently only "all" is supported, which means that all peer-groups which point to T0s will be updated - data value: a dictionary: {"status": "status_value"}, where status_value could be either "enabled" or "disabled" Initially, when bgpcfgd starts, it reads initial BBR status values from the [constants.yml](https://github.com/Azure/sonic-buildimage/pull/5626/files#diff-e6f2fe13a6c276dc2f3b27a5bef79886f9c103194be4fcb28ce57375edf2c23cR34). Then you can control BBR status by changing "BGP_BBR" table in the CONFIG_DB (see examples below). bgpcfgd knows what peer-groups to change from [constants.yml](https://github.com/Azure/sonic-buildimage/pull/5626/files#diff-e6f2fe13a6c276dc2f3b27a5bef79886f9c103194be4fcb28ce57375edf2c23cR39). The dictionary contains peer-group names as keys, and a list of address-families as values. So when bgpcfgd gets a request to change the BBR state, it changes the state only for peer-groups listed in the constants.yml dictionary (and only for address families from the peer-group value). - How to verify it Initially, when we start SONiC FRR has BBR enabled for PEER_V4 and PEER_V6: ``` admin@str-s6100-acs-1:~$ vtysh -c 'show run' \| egrep 'PEER_V.? 
allowas' neighbor PEER_V4 allowas-in 1 neighbor PEER_V6 allowas-in 1 ``` Then we apply following configuration to the db: ``` admin@str-s6100-acs-1:~$ cat disable.json { "BGP_BBR": { "all": { "status": "disabled" } } } admin@str-s6100-acs-1:~$ sonic-cfggen -j disable.json -w ``` The log output is: ``` Oct 14 18:40:22.450322 str-s6100-acs-1 DEBUG bgp#bgpcfgd: Received message : '('all', 'SET', (('status', 'disabled'),))' Oct 14 18:40:22.450620 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-f', '/tmp/tmpmWTiuq']'. Oct 14 18:40:22.681084 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V4 soft in']'. Oct 14 18:40:22.904626 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V6 soft in']'. ``` Check FRR configuration and see that no allowas parameters are there: ``` admin@str-s6100-acs-1:~$ vtysh -c 'show run' \| egrep 'PEER_V.? allowas' admin@str-s6100-acs-1:~$ ``` Then we apply enabling configuration back: ``` admin@str-s6100-acs-1:~$ cat enable.json { "BGP_BBR": { "all": { "status": "enabled" } } } admin@str-s6100-acs-1:~$ sonic-cfggen -j enable.json -w ``` The log output: ``` Oct 14 18:40:41.074720 str-s6100-acs-1 DEBUG bgp#bgpcfgd: Received message : '('all', 'SET', (('status', 'enabled'),))' Oct 14 18:40:41.074720 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-f', '/tmp/tmpDD6SKv']'. Oct 14 18:40:41.587257 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V4 soft in']'. Oct 14 18:40:42.042967 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V6 soft in']'. ``` Check FRR configuration and see that the BBR configuration is back: ``` admin@str-s6100-acs-1:~$ vtysh -c 'show run' \| egrep 'PEER_V.? 
allowas' neighbor PEER_V4 allowas-in 1 neighbor PEER_V6 allowas-in 1 ``` * The test coverage * Below is the test coverage ``` ---------- coverage: platform linux2, python 2.7.12-final-0 ---------- Name Stmts Miss Cover ---------------------------------------------------- bgpcfgd/__init__.py 0 0 100% bgpcfgd/__main__.py 3 3 0% bgpcfgd/config.py 78 41 47% bgpcfgd/directory.py 63 34 46% bgpcfgd/log.py 15 3 80% bgpcfgd/main.py 51 51 0% bgpcfgd/manager.py 41 23 44% bgpcfgd/managers_allow_list.py 385 21 95% bgpcfgd/managers_bbr.py 76 0 100% bgpcfgd/managers_bgp.py 193 193 0% bgpcfgd/managers_db.py 9 9 0% bgpcfgd/managers_intf.py 33 33 0% bgpcfgd/managers_setsrc.py 45 45 0% bgpcfgd/runner.py 39 39 0% bgpcfgd/template.py 64 11 83% bgpcfgd/utils.py 32 24 25% bgpcfgd/vars.py 1 0 100% ---------------------------------------------------- TOTAL 1128 530 53% ``` - Which release branch to backport (provide reason below if selected) - [ ] 201811 - [x] 201911 - [x] 202006	2020-10-22 11:04:21 -07:00
pavel-shirshov	812e1a3489	[bgp]: Enable next-hop-tracking through default (#5600 ) - Why I did it FRR introduced [next hop tracking](http://docs.frrouting.org/projects/dev-guide/en/latest/next-hop-tracking.html) functionality. That functionality requires resolving BGP neighbors before setting BGP connection (or explicit ebgp-multihop command). Sometimes (BGP MONITORS) our neighbors are not directly connected and sessions are IBGP. In this case the current configuration prevents FRR from establishing BGP connections. The reason would be "waiting for NHT". To fix that we need to either add static routes for each not-directly connected ibgp neighbor, or enable command `ip nht resolve-via-default` - How I did it Put `ip nht resolve-via-default` into the config - How to verify it Build an image. Enable BGP_MONITOR entry and check that entry is Established or Connecting in FRR Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>	2020-10-13 22:21:28 -07:00
abdosi	70528f7460	[Multi-asic] Fixed Default Route to be BGP (#5548 ) Learned and not docker default route for multi-asic platforms. Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2020-10-05 22:54:47 -07:00
pavel-shirshov	ffae82f8be	[bgp] Add 'allow list' manager feature (#5513 ) implements a new feature: "BGP Allow list." This feature allows us to control which IP prefixes are going to be advertised via ebgp from the routes received from EBGP neighbors.	2020-10-02 10:06:04 -07:00
Tamer Ahmed	6754635010	[cfggen] Make Jinja2 Template Python 3 Compatible Jinja2 templates rendered using the Python 3 interpreter are required to conform with Python 3 new semantics. Signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-09-30 07:07:43 -07:00
Nazarii Hnydyn	79bda7d0d6	[monit]: Fix process checker. (#5480 ) Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>	2020-09-29 17:23:09 -07:00
arlakshm	e3a0feaa47	Vtysh support for multi asic (#5479 ) Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>	2020-09-29 12:39:53 -07:00
Guohan Lu	e412338743	Revert "[bgp] Add 'allow list' manager feature (#5309 )" This reverts commit `6eed0820c8`.	2020-09-28 22:00:29 -07:00
pavel-shirshov	6eed0820c8	[bgp] Add 'allow list' manager feature (#5309 ) implements a new feature: "BGP Allow list." This feature allows us to control which IP prefixes are going to be advertised via ebgp from the routes received from EBGP neighbors.	2020-09-27 10:47:43 -07:00
gechiang	43a8368874	make bgpmon autorestart enabled by supervisord (#5460 )	2020-09-25 10:25:11 -07:00
yozhao101	13cec4c486	[Monit] Unmonitor the processes in containers which are disabled. (#5153 ) We want to let Monit unmonitor the processes in containers which are disabled in `FEATURE` table such that Monit will not generate false alerting messages into the syslog. Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2020-09-25 00:28:28 -07:00
Joe LeVeque	3987cbd80a	[sonic-utilities] Build and install as a Python wheel package (#5409 ) We are moving toward building all Python packages for SONiC as wheel packages rather than Debian packages. This will also allow us to more easily transition to Python 3. Python files are now packaged in "sonic-utilities" Python wheel. Data files are now packaged in "sonic-utilities-data" Debian package. - How I did it - Build and install sonic-utilities as a Python package - Remove explicit installation of wheel dependencies, as these will now get installed implicitly by pip when installing sonic-utilities as a wheel - Build and install new sonic-utilities-data package to install data files required by sonic-utilities applications - Update all references to sonic-utilities scripts/entrypoints to either reference the new /usr/local/bin/ location or remove absolute path entirely where applicable Submodule updates: * src/sonic-utilities aa27dd9...2244d7b (5): > Support building sonic-utilities as a Python wheel package instead of a Debian package (#1122) > [consutil] Display remote device name in show command (#1120) > [vrf] fix check state_db error when vrf moving (#1119) > [consutil] Fix issue where the ConfigDBConnector's reference is missing (#1117) > Update to make config load/reload backward compatible. (#1115) * src/sonic-ztp dd025bc...911d622 (1): > Update paths to reflect new sonic-utilities install location, /usr/local/bin/ (#19)	2020-09-20 20:16:42 -07:00
gechiang	128def6969	Add bgpmon to be started as a new daemon under BGP docker (#5329 ) * Add bgpmon under sonic-bgpcfgd to be started as a new daemon under BGP docker * Added bgpmon to be monitored by Monit so that if it crashed, it gets alerted * use console_scripts entry point to package bgpmon	2020-09-20 14:32:09 -07:00
Joe LeVeque	5b3b4804ad	[dockers][supervisor] Increase event buffer size for dependent-startup (#5247 ) When stopping the swss, pmon or bgp containers, log messages like the following can be seen: ``` Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,061 ERRO pool dependent-startup event buffer overflowed, discarding event 34 Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,063 ERRO pool dependent-startup event buffer overflowed, discarding event 35 Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,064 ERRO pool dependent-startup event buffer overflowed, discarding event 36 Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,066 ERRO pool dependent-startup event buffer overflowed, discarding event 37 ``` This is due to the number of programs in the container managed by supervisor, all generating events at the same time. The default event queue buffer size in supervisor is 10. This patch increases that value in all containers in order to eliminate these errors. As more programs are added to the containers, we may need to further adjust these values. I increased all buffer sizes to 25 except for containers with more programs or templated supervisor.conf files which allow for a variable number of programs. In these cases I increased the buffer size to 50. One final exception is the swss container, where the buffer fills up to ~50, so I increased this buffer to 100. Resolves https://github.com/Azure/sonic-buildimage/issues/5241	2020-09-08 23:36:38 -07:00
Prince Sunny	4338d8293f	Skip vnet-vxlan interfaces from generating networks (#5251 ) * Skip Vnet interface from generating networks	2020-08-27 14:14:04 -07:00
Tamer Ahmed	a10c5bfd02	[frr] Reduce Calls to SONiC Cfggen (#5176 ) Calls to sonic-cfggen are CPU expensive. This PR reduces calls to sonic-cfggen to two calls during startup when starting frr service. Signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-08-17 15:47:42 -07:00
joyas-joseph	f0dfe36953	[docker-fpm-frr]: Upgrade docker-fpm-frr to buster (#4920 ) Verify that /etc/apt/sources.list points to buster using docker exec bgp cat /etc/apt/sources.list BGP neighborship is established. root@sonic:~# show ip bgp summary IPv4 Unicast Summary: BGP router identifier 10.1.0.1, local AS number 65100 vrf-id 0 BGP table version 1 RIB entries 1, using 184 bytes of memory Peers 1, using 20 KiB of memory Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd 6.1.1.1 4 100 96 96 0 0 0 01:32:04 0 Total number of neighbors 1 root@sonic:~# Signed-off-by: Joyas Joseph <joyas_joseph@dell.com>	2020-07-29 14:19:03 -07:00
pavel-shirshov	89184038fd	[docker-fpm-frr]: Start bgpd after zebra was started (#5038 ) fixes https://github.com/Azure/sonic-buildimage/issues/5026 Explanation: In the log from the issue I found: ``` I see following in the log Jul 22 21:13:06.574831 vlab-01 WARNING bgp#bgpd[49]: [EC 33554499] sendmsg_nexthop: zclient_send_message() failed ``` Analyzing source code I found that the error message could be issued only when `zclient_send_rnh()` returns less than 0. ``` ret = zclient_send_rnh(zclient, command, p, exact_match, bnc->bgp->vrf_id); /* TBD: handle the failure */ if (ret < 0) flog_warn(EC_BGP_ZEBRA_SEND, "sendmsg_nexthop: zclient_send_message() failed"); ``` I checked [zclient_send_rnh()](`88351c8f6d/lib/zclient.c (L654)`) and found that this function will return the exit code which the function gets from [zclient_send_message()](`88351c8f6d/lib/zclient.c (L266)`) But the latter function could return non-zero in two cases: 1. bgpd didn’t connect to the zclient socket yet [code](`88351c8f6d/lib/zclient.c (L269)`) 2. The socket was closed. But in this case we would receive the error message in the log. (And I can find the message in the log when we reboot sonic) [code](`88351c8f6d/lib/zclient.c (L277)`) Also I see from the logs that the client connection was established later than the issue occurred in bgpd. Bgpd.log ``` Jul 22 21:13:06.574831 vlab-01 WARNING bgp#bgpd[49]: [EC 33554499] sendmsg_nexthop: zclient_send_message() failed ``` Vs Zebra.log ``` Jul 22 21:13:12.713249 vlab-01 NOTICE bgp#zebra[48]: client 25 says hello and bids fair to announce only static routes vrf=0 Jul 22 21:13:12.820352 vlab-01 NOTICE bgp#zebra[48]: client 30 says hello and bids fair to announce only bgp routes vrf=0 Jul 22 21:13:12.820352 vlab-01 NOTICE bgp#zebra[48]: client 33 says hello and bids fair to announce only vnc routes vrf=0 ``` So in our case we should start zebra first. Wait until it is started and then start bgpd and other daemons. 
- How I did it I changed the graph to start daemons in the following order: 1. First start zebra 2. Then start staticd and bgpd 3. Then start vtysh -b and bgpeoi after bgpd is started.	2020-07-25 03:48:47 -07:00
anish-n	da017f4ec9	[bgpcfgd]: Add Vlan prefix list to the FRR templates (#5005 ) add the Vlan prefix list to the FRR templates	2020-07-21 19:26:19 -07:00
arlakshm	97fa2c087b	[config]: Multi ASIC loopback changes (#4895 ) Resubmitting the changes for (#4825) with fixes for sonic-bgpcfgd test failures Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>	2020-07-12 18:08:51 +00:00
Guohan Lu	f8da3e4c69	Revert "[config]: Loopback Interface changes for multi ASIC devices (#4825 )" This reverts commit `cae65a451c`.	2020-07-12 18:08:51 +00:00
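
Note on commit 16e54340b7 (anchored route-map matching): the difference between the loose match and the `^`/`$` anchored match described in that commit message can be sketched in Python as below. This is only an illustration under stated assumptions: the sample config lines are hypothetical, and both patterns are written for this example; they are not the expressions used in the TSA/TSB/TSC scripts.

```python
import re

# Hypothetical sample of route-map lines (illustrative only; not taken
# from a real device).
CONFIG_LINES = [
    "route-map TO_BGP_PEER_V4 permit 20",
    "route-map TO_BGP_PEER_V4 deny 30",
    "route-map TO_BGP_PEER_V4 permit 200",
    "route-map TO_BGP_PEER_V6 permit 20",
    "route-map TO_BGP_PEER_V6 deny 30",
    "route-map TO_BGP_PEER_V6 permit 200",
]

# Assumed loose pattern: no end-of-line anchor, so "permit 200" is
# matched as well because it contains "permit 20".
LOOSE = re.compile(r"route-map TO_BGP_PEER_V[46] permit 20")

# Assumed anchored pattern: ^ and $ force the whole line to match.
ANCHORED = re.compile(r"^route-map TO_BGP_PEER_V[46] (permit 20|deny 30)$")


def matching_lines(pattern, lines):
    """Return the lines in which the pattern is found."""
    return [line for line in lines if pattern.search(line)]


loose_hits = matching_lines(LOOSE, CONFIG_LINES)
anchored_hits = matching_lines(ANCHORED, CONFIG_LINES)

for line in anchored_hits:
    print(line)
```

With the anchored pattern, the unrelated `permit 200` entries are no longer counted, which is the behaviour the commit message asks for.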


128 Commits