sonic-buildimage

Author	SHA1	Message	Date
vganesan-nokia	78de10713c	[voq-chassis][bgpcfg] VOQ_BGP_CHASSIS_NEIGHBORS timers default (#8455 ) The BGP_VOQ_CHASSIS_NEIGHBOR keepalive and holdtime timers are configured similar to general neighbors. Changes are done to configure BGP_VOQ_CHASSIS_NEIGHBOR timers similar to BGP_INTENAL_NEIGBOR since voq chassis bgp neighbors are similar to bgp internal neighbors in multi-asic. As it is done for bgp internal neighbors, the keepalive and holdtime timers are set to 3 and 10 seconds respectively. Also similar to bgp internal neighbors, connection retry timer is also configured for voq chassis bgp neighbors. Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>	2021-11-30 12:10:27 -08:00
arlakshm	5830852832	remove staticd.conf.j2 (#9182 ) Why I did it resolves #8979 and #9055 How I did it Remove the file static.conf.j2,which adds the default route on eth0 from bgp docker Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>	2021-11-24 15:32:16 -08:00
abdosi	919b3e5cdf	[chassis-packet] Fixed BGP Internal Peer template (#9106 ) What I did: Fix the typo in Internal Peer Group template for Packet-based Chassis. Address Review comments of PR: [chassis-packet] minigraph parsing and BGP template changes #8966 - Static Route Parsing for Host - Formatting of chassis port_config.ini	2021-10-29 11:02:38 -07:00
abdosi	3bb248bd67	[chassis-packet] minigraph parsing and BGP template changes (#8966 ) 1. Changes for Generation LC-Graph for packet-based chassis. 2. Added Support Ipv6 Peering on Loopback4096 for voq also 3. Updated asic topology yml files to be offset of slot 4. Made slot_num to take string slot<number> instead of number 5. Consolidated template_dpg_voq_asic.j2 into dpg_asic.j2 6. Remove Loopback4096 from asic topology and parse as dut invertory for multi-asic 7. Updated topo_facts parsing for asic topology_ 8. Internal BGP Session rename from <VoqChassisInternal> to <ChassisInternal> and take switch_type as value. Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2021-10-18 18:44:24 -07:00
vganesan-nokia	f9231723f9	[multiasic][voq][bgpconf] Fix for the issue of same BGP router id in all asics (#8049 ) For multiasic, the back end asics use ip addresss of Loopback4096 for BGP router id. In VOQ multi-asic chassis there are no back end asics. All the asics are front end and the iBGP connections are established via Ethernet-IB of asics. Since these asics are not designated as BackEnd, the ip address of interface Loopback0 is used as BGP router id. Since the ip address of Loopback0 is same for all the asics in the line card, same router id is used for voq iBGP configurations and hence the iBGP connections are not established. Changes are done to fix this	2021-07-26 12:54:52 -07:00
Shi Su	8a48be9b74	Reduce route selection deferral timer for bgp graceful restart (#7533 ) Why I did it There are scenarios that End-of-RIB comes from a part of the peers arrives after reconciliation. In such scenarios, if the route selection deferral timer has the default value of 360 seconds, FRR would not set up routes and all routes would be removed after reconciliation. This PR reduces the route selection deferral timer so that at least routes to parts of the peers get restored at the point of reconciliation. Fix #7488 How I did it Reduce route selection deferral timer for bgp graceful restart to 15 seconds.	2021-07-26 10:16:19 -07:00
arlakshm	ef67ba5f6e	[multi-asic] fix network command for internal loopback (#7878 ) Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com> In the multi asic platforms all the ASIC are advertising the same IPv6 /64 network from Loopback4096. Therefore, the IPv6 loopback address of backend asic is not learnt on the frontend asic. Change the bgpd.conf.main.conf.j2 template file to advertise the Loopback4096 ipv6 address as /128	2021-06-24 12:02:01 -07:00
abdosi	f27aa33e69	[muti-asic] Updated BGP community for Internal routes (#7617 ) Following changes are done: Internal routes are tagged with no-export instead of local-AS Option to add User Define BGP community on top of no-export	2021-05-16 19:44:06 -07:00
jmmikkel	43342b33b8	[chassis] Add templates and code to support VoQ chassis iBGP peers (#5622 ) This commit has following changes: * Add templates and code to support VoQ chassis iBGP peers * Add support to convert a new VoQChassisInternal element in the BGPSession element of the minigraph to a new BGP_VOQ_CHASSIS_NEIGHBOR table in CONFIG_DB. * Add a new set of "voq_chassis" templates to docker-fpm-frr * Add a new BGP peer manager to bgpcfgd to add neighbors from the BGP_VOQ_CHASSIS_NEIGHBOR table using the voq_chassis templates. * Add a test case for minigraph.py, making sure the VoQChassisInternal element creates a BGP_VOQ_CHASSIS_NEIGHBOR entry, but not if its value is "false". * Add a set of test cases for the new voq_chassis templates in sonic-bgpcfgd tests. Note that the templates expect the new "bgp bestpath peer-type multipath-relax" bgpd configuration to be available. Signed-off-by: Joanne Mikkelson <jmmikkel@arista.com>	2021-04-16 11:11:32 -07:00
Joe LeVeque	c651a9ade4	[dockers][supervisor] Increase event buffer size for process exit listener; Set all event buffer sizes to 1024 (#7083 ) To prevent error [messages](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802) like the following from being logged: ``` Mar 17 02:33:48.523153 vlab-01 INFO swss#supervisord 2021-03-17 02:33:48,518 ERRO pool supervisor-proc-exit-listener event buffer overflowed, discarding event 46 ``` This is basically an addendum to https://github.com/Azure/sonic-buildimage/pull/5247, which increased the event buffer size for dependent-startup. While supervisor-proc-exit-listener doesn't subscribe to as many events as dependent-startup, there is still a chance some containers (like swss, as in the example above) have enough processes running to cause an overflow of the default buffer size of 10. This is especially important for preventing erroneous log_analyzer failures in the sonic-mgmt repo regression tests, which have started occasionally causing PR check builds to fail. Example [here](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802). I set all supervisor-proc-exit-listener event buffer sizes to 1024, and also updated all dependent-startup event buffer sizes to 1024, as well, to keep things simple, unified, and allow headroom so that we will not need to adjust these values frequently, if at all.	2021-03-27 21:14:24 -07:00
Shi Su	de64c4e34c	[bgp]: Reduce bgp connect retry timer to 10 seconds (#7169 ) The default bgp connect retry timer is 120 seconds. A reconnection will happen 120 seconds if the initial connection fails. This PR aims to allow a more frequent retry.	2021-03-27 11:36:56 -07:00
judyjoseph	9d9503e1fe	To decrease the Connect Retry Timer from default value which is 120sec to 10 sec. (#7087 ) Why I did it It was observed that on a multi-asic DUT bootup, the BGP internal sessions between ASIC's was taking more time to get ESTABLISHED than external BGP sessions. The internal sessions was coming up almost exactly 120 secs later. In multi-asic platform the bgp dockers ( which is per ASIC ) on switch start are bring brought up around the same time and they try to make the bgp sessions with neighbors (in peer ASIC's) which may be not be completely up. This results in BGP connect fail and the retry happens after 120sec which is the default Connect Retry Timer How I did it Add the command to set the bgp neighboring session retry timer to 10sec for internal bgp neighbors.	2021-03-17 23:14:38 -07:00
abdosi	30b6668b7d	Changes in FRR temapltes for multi-asic (#6901 ) 1. Made the command next-hop-self force only applicable on back-end asic bgp. This is done so that BGPL iBGP session running on backend can send e-BGP learn nexthop. Back end asic FRR is able to recursively resolve the eBGP nexthop in its routing table since it knows about all the connected routes advertise from front end asic. 2. Made all front-end asic bgp use global loopback ip (Loopback0) as router id and back end asic bgp use Loopbacl4096 as ruter-id and originator id for Route-Reflector. This is done so that routes learnt by external peer do not see Loopback4096 as router id in show ip bgp <route-prerfix> output. 3. To handle above change need to pass Loopback4096 from BGP manager for jinja2 template generation. This was missing and this change/fix is needed for this also https://github.com/Azure/sonic-buildimage/blob/master/dockers/docker-fpm-frr/frr/bgpd/templates/dynamic/instance.conf.j2#L27 4. Enhancement to add mult_asic specific bgpd template generation unit test cases.	2021-02-26 17:05:15 -08:00
abdosi	a520cecb44	[multi-asic] BBR support on internal-peers for multi-asic platfroms. (#6848 ) Enable BBR config allowas-in 1 for internal peers Why I did: To advertise BBR routes learnt via e-BGP peer in one asic/namespace to another iBGP asic/namespace via Route Reflector.	2021-02-25 23:15:02 -08:00
Guohan Lu	f7346cca32	[docker-fmp-frr]: remove blank lines in generated critical_process Signed-off-by: Guohan Lu <lguohan@gmail.com>	2021-01-27 19:41:59 -08:00
Shi Su	aab37b7f42	[FRR] Create a separate script to wait zebra to be ready to receive connections (#6519 ) The requirement for zebra to be ready to accept connections is a generic problem that is not specific to bgpd. Making the script to wait for zebra socket a separate script and let bgpd and staticd to wait for zebra socket.	2021-01-27 12:36:02 -08:00
Zhenhong Zhao	a171e6c5e4	[frrcfgd] introduce frrcfgd to manage frr config when frr_mgmt_framework_config is true (#5142 ) - Support for non-template based FRR configurations (BGP, route-map, OSPF, static route..etc) using config DB schema. - Support for save & restore - Jinja template based config-DB data read and apply to FRR during startup - How I did it - add frrcfgd service - when frr_mgmg_framework_config is set, frrcfgd starts in bgp container - when user changed the BGP or other related table entries in config DB, frrcfgd will run corresponding VTYSH commands to program on FRR. - add jinja template to generate FRR config file to be used by FRR daemons while bgp container restarted - How to verify it 1. Add/delete data on config DB and then run VTYSH "show running-config" command to check if FRR configuration changed. 1. Restart bgp container and check if generated FRR config file is correct and run VTYSH "show running-config" command to check if FRR configuration is consistent with attributes in config DB Co-authored-by: Zhenhong Zhao <zhenhong.zhao@dell.com>	2021-01-24 17:57:03 -08:00
yozhao101	be3c036794	[supervisord] Monitoring the critical processes with supervisord. (#6242 ) - Why I did it Initially, we used Monit to monitor critical processes in each container. If one of critical processes was not running or crashed due to some reasons, then Monit will write an alerting message into syslog periodically. If we add a new process in a container, the corresponding Monti configuration file will also need to update. It is a little hard for maintenance. Currently we employed event listener of Supervisod to do this monitoring. Since processes in each container are managed by Supervisord, we can only focus on the logic of monitoring. - How I did it We borrowed the event listener of Supervisord to monitor critical processes in containers. The event listener will take following steps if it was notified one of critical processes exited unexpectedly: The event listener will first check whether the auto-restart mechanism was enabled for this container or not. If auto-restart mechanism was enabled, event listener will kill the Supervisord process, which should cause the container to exit and subsequently get restarted. If auto-restart mechanism was not enabled for this contianer, the event listener will enter a loop which will first sleep 1 minute and then check whether the process is running. If yes, the event listener exits. If no, an alerting message will be written into syslog. - How to verify it First, we need checked whether the auto-restart mechanism of a container was enabled or not by running the command show feature status. If enabled, one critical process should be selected and killed manually, then we need check whether the container will be restarted or not. Second, we can disable the auto-restart mechanism if it was enabled at step 1 by running the commnad sudo config feature autorestart <container_name> disabled. Then one critical process should be selected and killed. After that, we will see the alerting message which will appear in the syslog every 1 minute. - Which release branch to backport (provide reason below if selected) 201811 201911 [x ] 202006	2021-01-21 12:57:49 -08:00
Shi Su	afee1a851c	[bgpd]: Check zebra is ready to connect when starting bgpd (#6478 ) Fix #5026 There is a race condition between zebra server accepts connections and bgpd tries to connect. Bgpd has a chance to try to connect before zebra is ready. In this scenario, bgpd will try again after 10 seconds and operate as normal within these 10 seconds. As a consequence, whatever bgpd tries to sent to zebra will be missing in the 10 seconds. To avoid such a scenario, bgpd should start after zebra is ready to accept connections.	2021-01-19 00:23:36 -08:00
pavel-shirshov	83715cfc49	[bgpcfgd]: Support default action for "Allow prefix" feature (#6370 ) * Use 20 and 30 route-map entries instead of 2 and 3 for TSA * Added support for dynamic "Allow list" default action. Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>	2021-01-08 14:03:26 -08:00
Ubuntu	273846a412	FRR 7.5 Build libyang1 which is required for frr 7.5	2020-12-29 03:44:49 -08:00
Guohan Lu	ed58684e36	[docker-frr]: add static ipv6 loopback route to allow bgp to advertise prefix frr does not advertise route if local route is not reachable, as a result loopback route /64 is not advertised to the neighbors. Add static route allows frr to advertise the route to its peers Signed-off-by: Guohan Lu <lguohan@gmail.com>	2020-12-28 10:34:34 -08:00
pavel-shirshov	fd87ba0aee	[bgpcfgd]: Add on-match next rule for set ipv6 next-hop prefer-global (#6011 ) * Add 'on-match next' after every 'set ipv6 next-hop prefer-global' * Check that 'set ipv6 next-hop prefer-global' rule has 'on-match' next	2020-11-24 08:33:31 -08:00
Joe LeVeque	7bf05f7f4f	[supervisor] Install vanilla package once again, install Python 3 version in Buster container (#5546 ) - Why I did it We were building a custom version of Supervisor because I had added patches to prevent hangs and crashes if the system clock ever rolled backward. Those changes were merged into the upstream Supervisor repo as of version 3.4.0 (http://supervisord.org/changes.html#id9), therefore, we should be able to simply install the vanilla package via pip. This will also allow us to easily move to Python 3, as Python 3 support was added in version 4.0.0. - How I did it - Remove Makefiles and patches for building supervisor package from source - Install Python 3 supervisor package version 4.2.1 in Buster base container - Also install Python 3 version of supervisord-dependent-startup in Buster base container - Debian package installed binary in `/usr/bin/`, but pip package installs in `/usr/local/bin/`, so rather than update all absolute paths, I changed all references to simply call `supervisord` and let the system PATH find the executable to prevent future need for changes just in case we ever need to switch back to build a Debian package, then we won't need to modify these again. - Install Python 2 supervisor package >= 3.4.0 in Stretch and Jessie base containers	2020-11-19 23:41:32 -08:00
pavel-shirshov	af654944bd	[bgp]: Update TSA functionality (#5906 ) Fixed TSA bugs: 1. TSA didn't advertise Loopback ipv6 address 2. TSA and TSB changed BGP dynamic and BGP monitors sessions - How to verify it Build an image and run on your DUT. ``` admin@str-s6100-acs-1:~$ TSA System Mode: Normal -> Maintenance admin@str-s6100-acs-1:~$ vtysh -c 'show bgp ipv4 neighbors 10.0.0.1 advertised-routes' BGP table version is 6, local router ID is 10.1.0.32, vrf id 0 Default local pref 100, local AS 64601 Status codes: s suppressed, d damped, h history, * valid, > best, = multipath, i internal, r RIB-failure, S Stale, R Removed Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self Origin codes: i - IGP, e - EGP, ? - incomplete Network Next Hop Metric LocPrf Weight Path > 10.1.0.32/32 0.0.0.0 0 32768 i Total number of prefixes 1 admin@str-s6100-acs-1:~$ vtysh -c 'show bgp ipv6 neighbors fc00::a advertised-routes' BGP table version is 6, local router ID is 10.1.0.32, vrf id 0 Default local pref 100, local AS 64601 Status codes: s suppressed, d damped, h history, valid, > best, = multipath, i internal, r RIB-failure, S Stale, R Removed Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self Origin codes: i - IGP, e - EGP, ? - incomplete Network Next Hop Metric LocPrf Weight Path *> fc00:1::/64 :: 0 32768 i Total number of prefixes 1 admin@str-s6100-acs-1:~$ TSB System Mode: Maintenance -> Normal ``` Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>	2020-11-13 17:54:20 -08:00
judyjoseph	f2b22b5cd1	[multi-ASIC] util changes with the BGP_INTERNAL_NEIGHBOR table (#5874 ) Reintroduce #5760, along with the fix needed in the template file for python3 compatibility.	2020-11-10 09:34:56 -08:00
judyjoseph	b5121dcfd4	Revert "[multi-ASIC] util changes with the BGP_INTERNAL_NEIGHBOR table. (#5760 )" (#5871 ) This reverts commit `c972052594`.	2020-11-09 14:30:13 -08:00
judyjoseph	c972052594	[multi-ASIC] util changes with the BGP_INTERNAL_NEIGHBOR table. (#5760 ) - Why I did it Update the routine is_bgp_session_internal() by checking the BGP_INTERNAL_NEIGHBOR table. Additionally to address the review comment #5520 (comment) Add timer settings as will in the internal session templates and keep it minimal as these sessions which will always be up. Updates to the internal tests data + add all of it to template tests. - How I did it Updated the APIs and the template files. - How to verify it Verified the internal BGP sessions are displayed correctly with show commands with this API is_bgp_session_internal()	2020-11-09 11:10:10 -08:00
pavel-shirshov	cdcd20a7b5	[BGP]: Convert ip address to network address for the LOCAL_VLAN filter (#5832 ) * [BGP]: Convert ip address to network address for the LOCAL_VLAN prefix filter	2020-11-06 17:47:08 -08:00
pavel-shirshov	13f8e9ce5e	[bgpcfgd]: Convert bgpcfgd and bgpmon to python3 (#5746 ) * Convert bgpcfgd to python3 Convert bgpmon to python3 Fix some issues in bgpmon * Add python3-swsscommon as depends * Install dependencies * reorder deps Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>	2020-11-05 10:01:43 -08:00
judyjoseph	6088bd59de	[multi-ASIC] BGP internal neighbor table support (#5520 ) * Initial commit for BGP internal neighbor table support. > Add new template named "internal" for the internal BGP sessions > Add a new table in database "BGP_INTERNAL_NEIGHBOR" > The internal BGP sessions will be stored in this new table "BGP_INTERNAL_NEIGHBOR" * Changes in template generation tests with the introduction of internal neighbor template files.	2020-10-28 16:41:27 -07:00
pavel-shirshov	c94f93f046	[bgpcfgd]: Dynamic BBR support (#5626 ) - Why I did it To introduce dynamic support of BBR functionality into bgpcfgd. BBR is adding `neighbor PEER_GROUP allowas-in 1' for all BGP peer-groups which points to T0 Now we can add and remove this configuration based on CONFIG_DB entry - How I did it I introduced a new CONFIG_DB entry: - table name: "BGP_BBR" - key value: "all". Currently only "all" is supported, which means that all peer-groups which points to T0s will be updated - data value: a dictionary: {"status": "status_value"}, where status_value could be either "enabled" or "disabled" Initially, when bgpcfgd starts, it reads initial BBR status values from the [constants.yml](https://github.com/Azure/sonic-buildimage/pull/5626/files#diff-e6f2fe13a6c276dc2f3b27a5bef79886f9c103194be4fcb28ce57375edf2c23cR34). Then you can control BBR status by changing "BGP_BBR" table in the CONFIG_DB (see examples below). bgpcfgd knows what peer-groups to change fron [constants.yml](https://github.com/Azure/sonic-buildimage/pull/5626/files#diff-e6f2fe13a6c276dc2f3b27a5bef79886f9c103194be4fcb28ce57375edf2c23cR39). The dictionary contains peer-group names as keys, and a list of address-families as values. So when bgpcfgd got a request to change the BBR state, it changes the state only for peer-groups listed in the constants.yml dictionary (and only for address families from the peer-group value). - How to verify it Initially, when we start SONiC FRR has BBR enabled for PEER_V4 and PEER_V6: ``` admin@str-s6100-acs-1:~$ vtysh -c 'show run' \| egrep 'PEER_V.? allowas' neighbor PEER_V4 allowas-in 1 neighbor PEER_V6 allowas-in 1 ``` Then we apply following configuration to the db: ``` admin@str-s6100-acs-1:~$ cat disable.json { "BGP_BBR": { "all": { "status": "disabled" } } } admin@str-s6100-acs-1:~$ sonic-cfggen -j disable.json -w ``` The log output are: ``` Oct 14 18:40:22.450322 str-s6100-acs-1 DEBUG bgp#bgpcfgd: Received message : '('all', 'SET', (('status', 'disabled'),))' Oct 14 18:40:22.450620 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-f', '/tmp/tmpmWTiuq']'. Oct 14 18:40:22.681084 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V4 soft in']'. Oct 14 18:40:22.904626 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V6 soft in']'. ``` Check FRR configuraiton and see that no allowas parameters are there: ``` admin@str-s6100-acs-1:~$ vtysh -c 'show run' \| egrep 'PEER_V.? allowas' admin@str-s6100-acs-1:~$ ``` Then we apply enabling configuration back: ``` admin@str-s6100-acs-1:~$ cat enable.json { "BGP_BBR": { "all": { "status": "enabled" } } } admin@str-s6100-acs-1:~$ sonic-cfggen -j enable.json -w ``` The log output: ``` Oct 14 18:40:41.074720 str-s6100-acs-1 DEBUG bgp#bgpcfgd: Received message : '('all', 'SET', (('status', 'enabled'),))' Oct 14 18:40:41.074720 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-f', '/tmp/tmpDD6SKv']'. Oct 14 18:40:41.587257 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V4 soft in']'. Oct 14 18:40:42.042967 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V6 soft in']'. ``` Check FRR configuraiton and see that the BBR configuration is back: ``` admin@str-s6100-acs-1:~$ vtysh -c 'show run' \| egrep 'PEER_V.? allowas' neighbor PEER_V4 allowas-in 1 neighbor PEER_V6 allowas-in 1 ``` * The test coverage * Below is the test coverage ``` ---------- coverage: platform linux2, python 2.7.12-final-0 ---------- Name Stmts Miss Cover ---------------------------------------------------- bgpcfgd/__init__.py 0 0 100% bgpcfgd/__main__.py 3 3 0% bgpcfgd/config.py 78 41 47% bgpcfgd/directory.py 63 34 46% bgpcfgd/log.py 15 3 80% bgpcfgd/main.py 51 51 0% bgpcfgd/manager.py 41 23 44% bgpcfgd/managers_allow_list.py 385 21 95% bgpcfgd/managers_bbr.py 76 0 100% bgpcfgd/managers_bgp.py 193 193 0% bgpcfgd/managers_db.py 9 9 0% bgpcfgd/managers_intf.py 33 33 0% bgpcfgd/managers_setsrc.py 45 45 0% bgpcfgd/runner.py 39 39 0% bgpcfgd/template.py 64 11 83% bgpcfgd/utils.py 32 24 25% bgpcfgd/vars.py 1 0 100% ---------------------------------------------------- TOTAL 1128 530 53% ``` - Which release branch to backport (provide reason below if selected) - [ ] 201811 - [x] 201911 - [x] 202006	2020-10-22 11:04:21 -07:00
pavel-shirshov	812e1a3489	[bgp]: Enable next-hop-tracking through default (#5600 ) - Why I did it FRR introduced [next hop tracking](http://docs.frrouting.org/projects/dev-guide/en/latest/next-hop-tracking.html) functionality. That functionality requires resolving BGP neighbors before setting BGP connection (or explicit ebgp-multihop command). Sometimes (BGP MONITORS) our neighbors are not directly connected and sessions are IBGP. In this case current configuration prevents FRR to establish BGP connections. Reason would be "waiting for NHT". To fix that we need either add static routes for each not-directly connected ibgp neighbor, or enable command `ip nht resolve-via-default` - How I did it Put `ip nht resolve-via-default` into the config - How to verify it Build an image. Enable BGP_MONITOR entry and check that entry is Established or Connecting in FRR Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>	2020-10-13 22:21:28 -07:00
pavel-shirshov	ffae82f8be	[bgp] Add 'allow list' manager feature (#5513 ) implements a new feature: "BGP Allow list." This feature allows us to control which IP prefixes are going to be advertised via ebgp from the routes received from EBGP neighbors.	2020-10-02 10:06:04 -07:00
Tamer Ahmed	6754635010	[cfggen] Make Jinja2 Template Python 3 Compatible Jinja2 templates rendered using Python 3 interpreter, are required to conform with Python 3 new semantics. singed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-09-30 07:07:43 -07:00
Guohan Lu	e412338743	Revert "[bgp] Add 'allow list' manager feature (#5309 )" This reverts commit `6eed0820c8`.	2020-09-28 22:00:29 -07:00
pavel-shirshov	6eed0820c8	[bgp] Add 'allow list' manager feature (#5309 ) implements a new feature: "BGP Allow list." This feature allows us to control which IP prefixes are going to be advertised via ebgp from the routes received from EBGP neighbors.	2020-09-27 10:47:43 -07:00
gechiang	43a8368874	make bgpmon autorestart enabled by supervisord (#5460 )	2020-09-25 10:25:11 -07:00
gechiang	128def6969	Add bgpmon to be started as a new daemon under BGP docker (#5329 ) * Add bgpmon under sonic-bgpcfgd to be started as a new daemon under BGP docker * Added bgpmon to be monitored by Monit so that if it crashed, it gets alerted * use console_scripts entry point to package bgpmon	2020-09-20 14:32:09 -07:00
Joe LeVeque	5b3b4804ad	[dockers][supervisor] Increase event buffer size for dependent-startup (#5247 ) When stopping the swss, pmon or bgp containers, log messages like the following can be seen: ``` Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,061 ERRO pool dependent-startup event buffer overflowed, discarding event 34 Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,063 ERRO pool dependent-startup event buffer overflowed, discarding event 35 Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,064 ERRO pool dependent-startup event buffer overflowed, discarding event 36 Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,066 ERRO pool dependent-startup event buffer overflowed, discarding event 37 ``` This is due to the number of programs in the container managed by supervisor, all generating events at the same time. The default event queue buffer size in supervisor is 10. This patch increases that value in all containers in order to eliminate these errors. As more programs are added to the containers, we may need to further adjust these values. I increased all buffer sizes to 25 except for containers with more programs or templated supervisor.conf files which allow for a variable number of programs. In these cases I increased the buffer size to 50. One final exception is the swss container, where the buffer fills up to ~50, so I increased this buffer to 100. Resolves https://github.com/Azure/sonic-buildimage/issues/5241	2020-09-08 23:36:38 -07:00
Prince Sunny	4338d8293f	Skip vnet-vxlan interfaces from generating networks (#5251 ) * Skip Vnet interface from generating networks	2020-08-27 14:14:04 -07:00
Tamer Ahmed	a10c5bfd02	[frr] Reduce Calls to SONiC Cfggen (#5176 ) Calls to sonic-cfggen is CPU expensive. This PR reduces calls to sonic-cfggen to two calls during startup when starting frr service. singed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-08-17 15:47:42 -07:00
pavel-shirshov	89184038fd	[docker-fpm-frr]: Start bgpd after zebra was started (#5038 ) fixes https://github.com/Azure/sonic-buildimage/issues/5026 Explanation: In the log from the issue I found: ``` I see following in the log Jul 22 21:13:06.574831 vlab-01 WARNING bgp#bgpd[49]: [EC 33554499] sendmsg_nexthop: zclient_send_message() failed ``` Analyzing source code I found that the error message could be issues only when `zclient_send_rnh()` return less than 0. ``` ret = zclient_send_rnh(zclient, command, p, exact_match, bnc->bgp->vrf_id); /* TBD: handle the failure / if (ret < 0) flog_warn(EC_BGP_ZEBRA_SEND, "sendmsg_nexthop: zclient_send_message() failed"); ``` I checked [zclient_send_rnh()](`88351c8f6d/lib/zclient.c (L654)`) and found that this function will return the exit code which the function gets from [zclient_send_message()](`88351c8f6d/lib/zclient.c (L266)`) But the latter function could return not 0 in two cases: 1. bgpd didn’t connect to the zclient socket yet [code](`88351c8f6d/lib/zclient.c (L269)`) 2. The socket was closed. But in this case we would receive the error message in the log. (And I can find the message in the log when we reboot sonic) [code](`88351c8f6d/lib/zclient.c (L277)`) Also I see from the logs that client connection was set later we had the issue in bgpd. Bgpd.log ``` Jul 22 21:13:06.574831 vlab-01 WARNING bgp#bgpd[49]: [EC 33554499] sendmsg_nexthop: zclient_send_message() failed ``` Vs Zebra.log ``` Jul 22 21:13:12.713249 vlab-01 NOTICE bgp#zebra[48]: client 25 says hello and bids fair to announce only static routes vrf=0 Jul 22 21:13:12.820352 vlab-01 NOTICE bgp#zebra[48]: client 30 says hello and bids fair to announce only bgp routes vrf=0 Jul 22 21:13:12.820352 vlab-01 NOTICE bgp#zebra[48]: client 33 says hello and bids fair to announce only vnc routes vrf=0 ``` So in our case we should start zebra first. Wait until it is started and then start bgpd and other daemons. - How I did it* I changed a graph to start daemons in the following order: 1. First start zebra 2. Then starts staticd and bgpd 3. Then starts vtysh -b and bgpeoi after bgpd is started.	2020-07-25 03:48:47 -07:00
anish-n	da017f4ec9	[bgpcfgd]: Add Vlan prefix list to the FRR templates (#5005 ) add the Vlan prefix list to the FRR templates	2020-07-21 19:26:19 -07:00
arlakshm	97fa2c087b	"[config]: Multi ASIC loopback changes (#4895 ) Resubmitting the changes for (#4825) with fixes for sonic-bgpcdgd test failures Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>	2020-07-12 18:08:51 +00:00
Guohan Lu	f8da3e4c69	Revert "[config]: Loopback Interface changes for multi ASIC devices (#4825 )" This reverts commit `cae65a451c`.	2020-07-12 18:08:51 +00:00
arlakshm	002335a3d5	[config]: Loopback Interface changes for multi ASIC devices (#4825 ) * Loopback IP changes for multi ASIC devices multi ASIC will have 2 Loopback Interfaces - Loopback0 has globally unique IP address, which is advertised by the multi ASIC device to its peers. This way all the external devices will see this device as a single device. - Loopback4096 is assigned an IP address which has a scope is within the device. Each ASIC has a different ip address for Loopback4096. This ip address will be used as Router-Id by the bgp instance on multi ASIC devices. This PR implements this change for multi ASIC devices Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>	2020-07-12 18:08:51 +00:00
pavel-shirshov	2b137fb540	Tests of FRR templates which rendered by sonic-cfggen (#4875 ) * Tests of FRR templates which rendered by sonic-cfggen	2020-07-12 18:08:51 +00:00
pavel-shirshov	d592e9b0f8	Tests for bgpcfgd templates (#4841 ) * Tests for bgpcfgd templates	2020-06-25 14:54:02 -07:00
pavel-shirshov	0d863c39ac	[bgpcfgd]: make a package for bgpcfgd (#4813 )	2020-06-20 21:01:24 -07:00

1 2

56 Commits