Commit Graph

176 Commits

Author SHA1 Message Date
abdosi
c6d1dae741
Fix the Loopback0 IPv6 address of LC's in chassis not reachable from (#16026)
What I did:
Fix the Loopback0 IPv6 address of LC's in chassis not reachable from peer devices.

Why I did:
For Ipv6 Loopback0 address we only advertise /64 subnet to the peer devices. However, in case of chassis each LC will have it own /128 address of that /64 subnet . Since this /128 address does not get advertised peer devices can-not ping/reach the LC's loopback0.

How I fix:
Advertise /128 Loopback0 Ipv6 address only between i-BGP peers. This way even though /64 is advertised to e-BGP peer devices when packet reaches any of LC's it can reach the appropriate LC's.

How I verify:
Manual verification
UT added for same.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2023-08-06 22:36:33 -07:00
guangyao6
9567c06570
Add BGP configuration for BGPSentinel peer (#15714)
Why I did it
For route registry service, in order to block hijacked routes, IBGP session needs to be set up from BGP sentinel service to SONiC, and BGP sentinel service advertise the same route with higher local-preference and no export community. So that SONiC takes the route from BGP sentinel as the best path and does not advertise the route to EBGP peers.
In order to do that, new route-maps are needed. So this change adds a new set of templates, keeping BGPSentinel peers out of the other templates.

Work item tracking
Microsoft ADO (number only): 24451346
How I did it
Add sentinel_community in constants.yml, route from BGPSentinel do not match this community will be denied.
Add support to convert BGPSentinel related configuration in the BGPPeerPassive element of the minigraph to a new BGP_SENTINELS table in CONFIG_DB
Add a new set of "sentinels" templates to docker-fpm-frr
Add a new BGP peer manager to bgpcfgd, to add neighbors from the BGP_SENTINELS table using the "sentinels" templates
Add a test case for minigraph.py, making sure the BGPSentinel and BGPSentinelV6 elements create BGP_SENTINELS DB entry.
Add a set of test cases for the new sentinels templates in sonic-bgpcfgd tests.
Add sonic-bgp-sentinel.yang and a set of testcases for the yang file.

How to verify it
Testcases and UT newly added would pass.
Setup IPv4 and IPv6 BGPSentinel services in minigraph, and load minigraph, show CONFIG_DB and "show runningconfig bgp", configuration would be loaded successfully.
Using t1-lag topo and setup IBGP session from BGPSentinel to SONiC loopback address, IBGP session would up.
Advertise route from BGPSentinel to T1 with sentinel_community, higher local-preference and no-export communiyt. In T1, show bgp route, the result is "Not advertise to any EBGP peer".
Withdraw the route in BGPSentinel, in T1, route would advertise to EBGP peers.
Advertise route from T1 that does not match sentinel_community, in T1, would not see the route in show bgp route.
2023-07-21 09:32:29 +08:00
nmoray
f978b2bb53
Timezone sync issue between the host and containers (#14000)
#### Why I did it
To fix the timezone sync issue between the containers and the host. If a certain timezone has been configured on the host (SONIC) then the expectation is to reflect the same across all the containers.

This will fix [Issue:13046](https://github.com/sonic-net/sonic-buildimage/issues/13046).

For instance, a PST timezone has been set on the host and if the user checks the link flap logs (inside the FRR), it shows the UTC timestamp. Ideally, it should be PST.
2023-06-25 16:36:09 -07:00
abdosi
6139c525d2
updated internal route policy for chassis-packet (#15349)
What I did:

Workaround for the issue seen here : FRRouting/frr#13682
It seems there is timing issue where there are multiple recursive lookup needed to resolve nexthop of the route it's possible that it does not happen correctly causing route to remain in inactive state

Issue is seen on chassis-packet as there 2 level of recursive lookup needed for a given e-BGP learnt route
- Level1 to resolve e-BGP peer (connected route via bgp ) over Loopback4096 (i-BGP peering)
- Level 2 Loopback4096 over backend port-channels next-hops

For VOQ chassis there is no e-BGP peer (connected route via bgp )  resolution as route is added as Static route by orchagent over Ethernet-IB.

Also as part of this remove route-map policy from instance.conf.j2 as same is define in peer-group.j2.

Microsoft ADO: https://msazure.visualstudio.com/One/_workitems/edit/24198507

How I verify:
Functional Verification manually
Updated UT.
We will be adding sanity check in sonic-mgmt to make sure none of route are in inactive state.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2023-06-07 09:17:44 -07:00
Tejaswini Chadaga
8058550c09
[bgp]: Add sudo check for TSA/B/C command execution (#15288)
TSA/B/C scripts invoke commands that require root permissions. If the user does not have sudo permissions, the scripts today execute until the command and throw a backtrace with error at the specific command. Added a check to ensure the operations check for root permissions upfront.
2023-06-03 23:47:26 +02:00
Baorong Liu
acb423b255
[staticroutebfd]fix an issue on deleting a non-bfd static route (#15269)
* [static_route][staticroutebfd]fix an issue on deleting a non-bfd static route

Fix an issue for deleting a non-bfd static route also remove the staticroutebfd from critical_processes list and make it auto restart in the case of crash.
2023-06-02 11:46:56 -07:00
qiwang4
359b80e012
[master]staticroutebfd process implementation (#13789)
* [BFD] staticroutebfd implementation
* To enable the BFD for static route

HLD: sonic-net/SONiC#1216
2023-05-26 16:32:05 -07:00
Tejaswini Chadaga
4e60f0d563
Template change for BGP monitors on T2 (#14844)
Why I did it
To support BGPMon sessions from each T2 linecard ASIC

Work item tracking
Microsoft ADO (number only): 17873174
How I did it
Added change in BGPMon configuration to use Loopback4096 as source interface, since this has a unique IP per ASIC.

How to verify it
Tested by manually setting up BGPMon session on T2 LC and verified that Loopback4096 could be used as source
2023-05-09 13:40:00 -07:00
abdosi
9b8b4e6e4d
[bgp/TSA]: Fixed the internal peer route-map policy (#14804)
What I did:
In FRR command update source <interface-name> is not at address-family level. Because of this
internal peer route-map for ipv6 were getting applied to ipv4 address family. As a result
TSA over iBGP for Ipv6 was not getting applied.

How I verify:

Manual Verification of TSA over both ipv4 and ipv6 after fix works fine.
Updated UT for this.

Added sonic-mgmt test gap: sonic-net/sonic-mgmt#8170

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2023-05-05 13:55:05 -07:00
Tejaswini Chadaga
ca224863cb
Changes to support TSA from supervisor (#14691)
Why I did it
Support for SONIC chassis isolation using TSA and un-isolation using TSB from supervisor module

Work item tracking
Microsoft ADO (number only): 17826134
How I did it
When TSA is run on the supervisor, it triggers TSA on each of the linecards using the secure rexec infrastructure introduced in sonic-net/sonic-utilities#2701. User password is requested to allow secure login to linecards through ssh, before execution of TSA/TSB on the linecards

TSA of the chassis withdraws routes from all the external BGP neighbors on each linecard, in order to isolate the entire chassis. No route withdrawal is done from the internal BGP sessions between the linecards to prevent transient drops during internal route deletion. With these changes, complete isolation of a single linecard using TSA will not be possible (a separate CLI/script option will be introduced at a later time to achieve this)

Changes also include no-stats option with TSC for quick retrieval of the current system isolation state

This PR also reverts changes in #11403

How to verify it
These changes have a dependency on sonic-net/sonic-utilities#2701 for testing

Run TSA from supervisor module and ensure transition to Maintenance mode on each linecard
Verify that all routes are withdrawn from eBGP neighbors on all linecards
Run TSB from supervisor module and ensure transition to Normal mode on each linecard
Verify that all routes are re-advertised from eBGP neighbors on all linecards
Run TSC no-stats from supervisor and verify that just the system maintenance state is returned from all linecards
2023-04-28 16:28:06 +08:00
Stepan Blyshchak
04099f075d
[BGP] support BGP pending FIB suppression (#12853)
Signed-off-by: Stepan Blyschak stepanb@nvidia.com

DEPENDS: #12852

Why I did it
To support BGP pending FIB suppression.

How I did it
I backported patches from FRR 8.4 feature that allows communicating ASIC route status back to FRR.
Also, added a new field in DEVICE_METADATA YANG model table. Added UT for YANG model changes.

How to verify it
Run on the switch.
2023-04-20 19:56:13 +08:00
Tejaswini Chadaga
f80bf7783d
Fix VOQ_CHASSIS_V6_PEER route-map config (#14055)
* Fix typo in VOQ_CHASSIS_V6_PEER route-map config

* Updated UT files with the changed config
2023-03-03 09:28:57 -08:00
Stepan Blyshchak
68e1079202
[FRR] Switch to dplane_fpm_nl plugin instead of fpm (#12852)
Why I did it
dplane_fpm_nl is a new FPM implementation in FRR. The old plugin fpm will not have any new features implemented. Usage of the new plugin gives us ability to use BGP suppression feature and next hop groups in the future.

How I did it
Switch to dplane_fpm_nl zebra plugin from old fpm plugin which is not supported anymore
Remove stale patches for old fpm plugin and add similar patches for dplane_fpm_nl

How to verify it
Build and run on the switch.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2023-02-06 09:38:39 -08:00
Zain Budhwani
2068a2697a
Change bgp notification leaf name and mem_usage leaf type (#13012)
#### Why I did it

Improve naming convention for bgp notification events and change type of leaf for sonic-events-host mem usage from uint64 to decimal64

#### How I did it

Replace "-" with "_"

Replace uint64 with decimal64

#### How to verify it

Run yang model unit tests

#### Description for the changelog

Change YANG model leaf naming convention for bgp notification
2023-01-24 15:47:32 -08:00
Junchao-Mellanox
2126def04e
[infra] Support syslog rate limit configuration (#12490)
- Why I did it
Support syslog rate limit configuration feature

- How I did it
Remove unused rsyslog.conf from containers
Modify docker startup script to generate rsyslog.conf from template files
Add metadata/init data for syslog rate limit configuration

- How to verify it
Manual test
New sonic-mgmt regression cases
2022-12-20 10:53:58 +02:00
Longxiang Lyu
d2ab55cc15
[dualtor] Let T0 delay 10 seconds before sending BGP updates (#12996)
Why I did it
To ensure, that after a BGP startup, dualtor T0 receives BGP updates before sending out BGP updates.
Please refer to sonic-net/SONiC#1161 for more details.

How I did it
add coalesce-time 10000 to the frr bgp startup config.

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
2022-12-15 22:14:46 +00:00
Arnaud
9d3814045b
[docker-fpm-frr]: Add unified-split mode to routing config (#11938)
- Why I did it
The values for config_db "docker_routing_config_mode" are:

separated: FRR config generated from ConfigDB, each FRR daemon has its own config file
unified: FRR config generated from ConfigDB, single FRR config file
split: FRR config not generated from ConfigDB, each FRR daemon has its own config file
This commit adds:
split-unified: FRR config not generated from ConfigDB, single FRR config file

- How I did it
In docker_init.sh, when split-unified is used, the FRR configs are not generated
from ConfigDB. What's more, "service integrated-vtysh-config" is configured in vtysh.conf.

- How to verify it
FRR config not overwritten when FRR container starts.

Signed-off-by: Arnaud le Taillanter <a.letaillanter@criteo.com>
2022-11-14 10:37:48 -08:00
Caitlin Choate
66f1cc458d
Bugfix #9739: Support when 'bgp_asn' is set to 'None', 'Null', or missing. (#12588)
bgpd.main.conf.j2: bugfix-9739
* Update bgpd.main.conf.j2 to gracefully handle the bgp configuration cases for when 'bgp_asn' is set to 'None', 'Null', or missing.

How I did it
Include a conditional statement to avoid configuring bgp in FRR when 'bgp_asn' is missing or set to 'None' or 'Null'

How to verify it
Configure 'bgp_asn' as 'None', 'Null' or have it missing from configurations and verify that /etc/frr/bgpd.conf does not have invalid bgp configurations like 'router bgp None'

Description for the changelog
Update bgpd.main.conf.j2 to gracefully handle the bgp configuration cases for when 'bgp_asn' is set to 'None', 'Null', or missing for bugfix 9739.

Signed-off-by: cchoate54@gmail.com
2022-11-08 16:53:14 -08:00
tjchadaga
763d3dc29d
Allow TSA on ibgp sessions between linecards on packet chassis (#12589) 2022-11-03 08:54:33 -07:00
Zain Budhwani
09fe3f467f
Add Structured Events w/ YANG Models (#12270)
Add events for dhcp-relay, bgp, syncd, & kernel.
2022-10-09 20:23:31 -07:00
Zain Budhwani
fd6a1b0ce2
Add events to host and create rsyslog_plugin deb pkg (#12059)
Why I did it

Create rsyslog plugin deb for other containers/host to install
Add events for bgp and host events
2022-09-21 09:20:53 -07:00
Renuka Manavalan
31e750ee0b
Fix PR build failure (#11973)
Some PR builds fails to find this file. Remove it temporarily until we root cause it
2022-09-06 15:13:05 -07:00
Zain Budhwani
6a54bc439a
Streaming structured events implementation (#11848)
With this PR in, you flap BGP and use events_tool to see the published events.
With telemetry PR #111 in and corresponding submodule update done in buildimage, one could run gnmi_cli to capture BGP flap events.
2022-09-03 07:33:25 -07:00
Hasan Naqvi
2d4ab9e979
Bullseye frr (#11777)
Why I did it
Migrate FRR to bullseye

How I did it
Makefile and docker config changes to refer to bullseye instead of buster.

How to verify it
Build bullseye frr docker.

Co-authored-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>
2022-08-21 17:04:47 -07:00
tjchadaga
cdd2786117
Fix for TSA error logging on multi-asic (#11519) 2022-07-30 22:16:58 -07:00
tjchadaga
077a537b14
Log message fix for TSB (#11441) 2022-07-14 12:26:58 -07:00
tjchadaga
849eb4bf32
Changes to persist TSA/B state across reloads (#11257) 2022-07-12 00:22:48 -07:00
Sudharsan Dhamal Gopalarathnam
14f6f70ca3
[BGP]Adding configuration knob to allow advertise Loopback ipv6 /128 prefix (#10958)
* [BGP]Adding configuration knob to allow advertise Loopback ipv6 /128 prefix
By default when IPv6 address is configured with /128 as subnet mask in Loopback0 interface, it will be advertised as prefix with /64 subnet.
To control this behavior a new field 'bgp_adv_lo_prefix_as_128' is introduced in DEVICE_METADATA table which when set to true will advertise prefix with /128 subnet as it is.
2022-06-06 08:51:04 -07:00
Kalimuthu-Velappan
bc30528341
Parallel building of sonic dockers using native dockerd(dood). (#10352)
Currently, the build dockers are created as a user dockers(docker-base-stretch-<user>, etc) that are
specific to each user. But the sonic dockers (docker-database, docker-swss, etc) are
created with a fixed docker name and common to all the users.

    docker-database:latest
    docker-swss:latest

When multiple builds are triggered on the same build server that creates parallel building issue because
all the build jobs are trying to create the same docker with latest tag.
This happens only when sonic dockers are built using native host dockerd for sonic docker image creation.

This patch creates all sonic dockers as user sonic dockers and then, while
saving and loading the user sonic dockers, it rename the user sonic
dockers into correct sonic dockers with tag as latest.

	docker-database:latest <== SAVE/LOAD ==> docker-database-<user>:tag

The user sonic docker names are derived from 'DOCKER_USERNAME and DOCKER_USERTAG' make env
variable and using Jinja template, it replaces the FROM docker name with correct user sonic docker name for
loading and saving the docker image.
2022-04-28 08:39:37 +08:00
arlakshm
fd22635de0
[chassis][bgp] create v4 and v6 peer group for VoQ internal neighbors (#9693)
Why I did it
In the recent minigraph changes we add separate BGP session configuration for V4 and V6 internal VoQ neighbors.
This PR is adding different Peer groups for V4 and V6 neighbors

How I did it
Add VOQ_CHASSIS_V4_PEER and VOQ_CHASSIS_V6_PEER groups
Add extra Unit tests

How to verify it

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
2022-02-24 11:21:26 -08:00
abdosi
e44a40cc3b
Updated Internal BGP Templates for chassis packet (#9674)
Fixes: https://github.com/Azure/sonic-buildimage/issues/9610
2022-02-08 09:36:32 -08:00
Longxiang Lyu
49a036e90c
Add dualtor TSA/B/C support (#9726)
Why I did it
Add TSA/B/C dualtor support

Signed-off-by: Longxiang Lyu lolv@microsoft.com

How I did it
For TSA, toggle all the mux to standby if the device type is dualtor and there are active mux ports.
For TSC, add mux status output.

How to verify it
Run TSA/B/C on a dualtor setup
2022-01-25 10:50:29 +08:00
Saikrishna Arcot
bd479cad29 Create a docker-swss-layer that holds the swss package.
This is to save about 50MB of disk space, since 6 containers
individually install this package.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2022-01-06 09:26:55 -08:00
abdosi
6c0da4bcf0
[bgp] Enable BGP Graceful Restart based on device role (#9486)
What I did:
Updated Jinja Template to enable BGP Graceful Restart based on device role. By default it will be enable only if the device role type is TorRouter.

Why I did:-
By default FRR is configured in Graceful Helper mode. Graceful Restart is needed on T0/TorRouter only since the device can go for warm-reboot. For T1/LeafRouter it need to be in Helper mode only
2021-12-13 10:14:50 -08:00
abdosi
f501311f11
Updated BGP Template for Chassis/Multi-asic (#9291)
Updated BGP Template for the case:
    
   1. For Packet Chassis do not advertise Loopback4096 address into BGP as there is Static Route for same. 
       Having this route in BGP causes two level of recursion in Zebra and cause assert in Zebra 
       when there are many nexthop involved
 
   2. Advertise only P2P Connected IP's into BGP (External Peers). For Packet chassis we have backend IP Interface subnet and if 
        they get advertised into BGP then it also causes recursion
2021-12-06 09:36:24 -08:00
vganesan-nokia
78de10713c
[voq-chassis][bgpcfg] VOQ_BGP_CHASSIS_NEIGHBORS timers default (#8455)
The BGP_VOQ_CHASSIS_NEIGHBOR keepalive and holdtime timers are
configured similar to general neighbors. Changes are done to configure
BGP_VOQ_CHASSIS_NEIGHBOR timers similar to BGP_INTENAL_NEIGBOR since voq
chassis bgp neighbors are similar to bgp internal neighbors in
multi-asic. As it is done for bgp internal neighbors, the keepalive and
holdtime timers are set to 3 and 10 seconds respectively. Also similar
to bgp internal neighbors, connection retry timer is also configured for
voq chassis bgp neighbors.

Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>
2021-11-30 12:10:27 -08:00
arlakshm
5830852832
remove staticd.conf.j2 (#9182)
Why I did it
resolves #8979 and #9055

How I did it
Remove the file static.conf.j2,which adds the default route on eth0 from bgp docker

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
2021-11-24 15:32:16 -08:00
abdosi
919b3e5cdf
[chassis-packet] Fixed BGP Internal Peer template (#9106)
What I did:

Fix the typo in Internal Peer Group template for Packet-based Chassis.
Address Review comments of PR: [chassis-packet] minigraph parsing and BGP template changes #8966
- Static Route Parsing for Host
- Formatting of chassis port_config.ini
2021-10-29 11:02:38 -07:00
abdosi
3bb248bd67
[chassis-packet] minigraph parsing and BGP template changes (#8966)
1. Changes for Generation LC-Graph for packet-based chassis.
2. Added Support Ipv6 Peering on Loopback4096 for voq also
3. Updated asic topology yml files to be offset of slot
4. Made slot_num to take string slot<number> instead of number
5. Consolidated template_dpg_voq_asic.j2 into dpg_asic.j2
6. Remove Loopback4096 from asic topology and parse as dut invertory for
   multi-asic
7. Updated topo_facts parsing for asic topology_
8. Internal BGP Session rename from <VoqChassisInternal> to <ChassisInternal> and take switch_type as value.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2021-10-18 18:44:24 -07:00
vganesan-nokia
f9231723f9
[multiasic][voq][bgpconf] Fix for the issue of same BGP router id in all asics (#8049)
For multiasic, the back end asics use ip addresss of Loopback4096 for BGP router id. In VOQ multi-asic chassis there are no back end asics. All the asics are front end and the iBGP connections are established via Ethernet-IB of asics. Since these asics are not designated as BackEnd, the ip address of interface Loopback0 is used as BGP router id. Since the ip address of Loopback0 is same for all the asics in the line card, same router id is used for voq iBGP configurations and hence the iBGP connections are not established. Changes are done to fix this
2021-07-26 12:54:52 -07:00
Shi Su
8a48be9b74
Reduce route selection deferral timer for bgp graceful restart (#7533)
Why I did it
There are scenarios that End-of-RIB comes from a part of the peers arrives after reconciliation. In such scenarios, if the route selection deferral timer has the default value of 360 seconds, FRR would not set up routes and all routes would be removed after reconciliation. This PR reduces the route selection deferral timer so that at least routes to parts of the peers get restored at the point of reconciliation.

Fix #7488

How I did it
Reduce route selection deferral timer for bgp graceful restart to 15 seconds.
2021-07-26 10:16:19 -07:00
arlakshm
ef67ba5f6e
[multi-asic] fix network command for internal loopback (#7878)
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
In the multi asic platforms all the ASIC are advertising the same IPv6 /64 network from Loopback4096.
Therefore, the IPv6 loopback address of backend asic is not learnt on the frontend asic.
Change the bgpd.conf.main.conf.j2 template file to advertise the Loopback4096 ipv6 address as /128
2021-06-24 12:02:01 -07:00
yozhao101
1a3cab43ac
[Monit] Deprecate the feature of monitoring the critical processes by Monit (#7676)
Signed-off-by: Yong Zhao yozhao@microsoft.com

Why I did it
Currently we leveraged the Supervisor to monitor the running status of critical processes in each container and it is more reliable and flexible than doing the monitoring by Monit. So we removed the functionality of monitoring the critical processes by Monit.

How I did it
I removed the script process_checker and corresponding Monit configuration entries of critical processes.

How to verify it
I verified this on the device str-7260cx3-acs-1.
2021-06-04 10:16:53 -07:00
xumia
9387350e19
Fix the type issue in rvtysh (#7648)
Why I did it
Change the type issue in the command rvtysh
change PARA/para to PARAM/param
2021-05-20 21:35:23 +08:00
abdosi
f27aa33e69
[muti-asic] Updated BGP community for Internal routes (#7617)
Following changes are done:

Internal routes are tagged with no-export instead of local-AS
Option to add User Define BGP community on top of no-export
2021-05-16 19:44:06 -07:00
xumia
56bdd750ab
Support readonly vtysh for sudoers (#7383)
Why I did it
Support readonly version of the command vtysh

How I did it
Check if the command starting with "show", and verify only contains single command in script.
2021-04-25 16:32:02 +08:00
Ze Gan
f77d719f7c
[docker-fpm-frr]: Add split mode to routing config (#7307)
For the split mode, the config files, like bgpd.conf, zebra.conf and so on, were provided by outside. But the docker_init.sh will overwrite the outside config files if restart bgp service.

How I did it
Add a split mode checking in docker_init.sh, if docker_routing_config_mode is split, don't overwrite the existing routing config files.

How to verify it
Set split mode in config db
{
    "DEVICE_METADATA": {
        "localhost": {
            "hwsku": "Force10-S6000",
            "platform": "x86_64-kvm_x86_64-r0",
            "docker_routing_config_mode": "split"
            ...
        }
    }
}
Replace your bgpd.conf to /etc/sonic/frr/bgpd.conf
Restart bgp service by sudo service bgp restart
The /etc/sonic/frr/bgpd.conf your provided shouldn't be overwritten

Signed-off-by: Ze Gan <ganze718@gmail.com>
2021-04-23 10:16:20 -07:00
jmmikkel
43342b33b8
[chassis] Add templates and code to support VoQ chassis iBGP peers (#5622)
This commit has following changes:

* Add templates and code to support VoQ chassis iBGP peers

* Add support to convert a new VoQChassisInternal element in the
   BGPSession element of the minigraph to a new BGP_VOQ_CHASSIS_NEIGHBOR 
   table in CONFIG_DB.
* Add a new set of "voq_chassis" templates to docker-fpm-frr
* Add a new BGP peer manager to bgpcfgd to add neighbors from the
  BGP_VOQ_CHASSIS_NEIGHBOR table using the voq_chassis templates.
* Add a test case for minigraph.py, making sure the VoQChassisInternal
  element creates a BGP_VOQ_CHASSIS_NEIGHBOR entry, but not if its
  value is "false".
* Add a set of test cases for the new voq_chassis templates in
  sonic-bgpcfgd tests.

Note that the templates expect the new
"bgp bestpath peer-type multipath-relax" bgpd configuration to be
available.

Signed-off-by: Joanne Mikkelson <jmmikkel@arista.com>
2021-04-16 11:11:32 -07:00
judyjoseph
1ad5dbeab6
Fixes for errors seen in staging devices (#7171)
With the latest 201911 image, the following error was seen on staging devices with TSB command ( for both single asic, multi asic ). Though this err message doesn't affect the TSB functionality, it is good to fix.

admin@STG01-0101-0102-01T1:~$ TSB
BGP0 : % Could not find route-map entry TO_TIER0_V4 20
line 1: Failure to communicate[13] to zebra, line: no route-map TO_TIER0_V4 permit 20
% Could not find route-map entry TO_TIER0_V4 30
line 2: Failure to communicate[13] to zebra, line: no route-map TO_TIER0_V4 deny 30

In addition, in this PR I am fixing the message displayed to user when there are no BGP neighbors configured on that BGP instance. In multi-asic device there could be case where there are no BGP neighbors configured on a particular ASIC.
2021-04-08 15:16:43 -07:00
Joe LeVeque
c651a9ade4
[dockers][supervisor] Increase event buffer size for process exit listener; Set all event buffer sizes to 1024 (#7083)
To prevent error [messages](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802) like the following from being logged:

```
Mar 17 02:33:48.523153 vlab-01 INFO swss#supervisord 2021-03-17 02:33:48,518 ERRO pool supervisor-proc-exit-listener event buffer overflowed, discarding event 46
```

This is basically an addendum to https://github.com/Azure/sonic-buildimage/pull/5247, which increased the event buffer size for dependent-startup. While supervisor-proc-exit-listener doesn't subscribe to as many events as dependent-startup, there is still a chance some containers (like swss, as in the example above) have enough processes running to cause an overflow of the default buffer size of 10.

This is especially important for preventing erroneous log_analyzer failures in the sonic-mgmt repo regression tests, which have started occasionally causing PR check builds to fail. Example [here](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802).

I set all supervisor-proc-exit-listener event buffer sizes to 1024, and also updated all dependent-startup event buffer sizes to 1024, as well, to keep things simple, unified, and allow headroom so that we will not need to adjust these values frequently, if at all.
2021-03-27 21:14:24 -07:00