Commit Graph

669 Commits

Author SHA1 Message Date
abdosi
c6c8c934e1
[buildfix-201911] Fix the snmp docker build error. (#7452)
Issue is get_pip.py is moved to pip 21.1 (https://github.com/pypa/get-pip/commits/main) which is not compatible with 3.6.
Issue of pip itself is fixed as part of 21.1.1 in pip community (pypa/pip#9835).
However get-pip.py is still not updated to latest pip. Also get.pip.py does not support python 3.6 version explicitly (pypa/get-pip#88)

Step 15/29 : RUN curl https://bootstrap.pypa.io/get-pip.py | python3.6
 ---> Running in bece31f49267
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 1891k  100 1891k    0     0  9564k      0 --:--:-- --:--:-- --:--:-- 9600k
Traceback (most recent call last):
  File "<stdin>", line 24298, in <module>
  File "<stdin>", line 139, in main
  File "<stdin>", line 115, in bootstrap
  File "<stdin>", line 96, in monkeypatch_for_cert
  File "/tmp/tmp5fnxrz0a/pip.zip/pip/_internal/commands/__init__.py", line 9, in <module>
  File "/tmp/tmp5fnxrz0a/pip.zip/pip/_internal/cli/base_command.py", line 12, in <module>
  File "/tmp/tmp5fnxrz0a/pip.zip/pip/_internal/cli/cmdoptions.py", line 30, in <module>
  File "/tmp/tmp5fnxrz0a/pip.zip/pip/_internal/utils/hashes.py", line 2, in <module>
ImportError: cannot import name 'NoReturn'
The command '/bin/sh -c curl https://bootstrap.pypa.io/get-pip.py | python3.6' returned a non-zero code: 1
How I did:

Got the file from https://github.com/pypa/get-pip/tree/21.0 and added to the buildimage
pin pip to the previous release 21.0.1. (Similar is done in other public repos eg: grpc/grpc-java#8115)

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2021-04-28 01:28:55 -07:00
yozhao101
aeae87d1b5
[201911][Monit] Use VLAN name to differentiate each Monit service of dhcp_relay (#7378)
#### Why I did it
Since we will have multiple `dhcrelay` processes if there exists different VLANs in the table `VLAN_INTERFACE` of `CONIFG_DB`, 
we should use unique service name for each `dhcrelay` process in Monit configuration file. Otherwise, Monit service will fail to work.

#### How I did it
I append the VLAN name to the end of each service name such that they are unique.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2021-04-22 18:04:29 -07:00
yozhao101
528543bc6a
[201911][Monit] Monitor critical processes in radv and dhcp_relay containers. (#7340)
Signed-off-by: Yong Zhao yozhao@microsoft.com

Why I did it
This PR aims to monitor critical processes in router advertiser and dhcp_relay containers by Monit.

How I did it
Router advertiser container only ran on T0 device and the T0 device should have at least one VLAN interface
which was configured an IPv6 address. At the same time, router advertiser container will not run on devices of which
the deployment type is 8.

As such, I created a service which will dynamically generate Monit configuration file of router advertiser from a
template.

Similarly Monit configuration file of dhcp_relay was also generated from a template since the number of dhcrelay process in dhcp_relay container is depended on number of VLANs.

How to verify it
I verified this implementation on a DuT.
2021-04-16 08:40:06 -07:00
judyjoseph
b9f8348a5d Fixes for errors seen in staging devices (#7171)
With the latest 201911 image, the following error was seen on staging devices with TSB command ( for both single asic, multi asic ). Though this err message doesn't affect the TSB functionality, it is good to fix.

admin@STG01-0101-0102-01T1:~$ TSB
BGP0 : % Could not find route-map entry TO_TIER0_V4 20
line 1: Failure to communicate[13] to zebra, line: no route-map TO_TIER0_V4 permit 20
% Could not find route-map entry TO_TIER0_V4 30
line 2: Failure to communicate[13] to zebra, line: no route-map TO_TIER0_V4 deny 30

In addition, in this PR I am fixing the message displayed to user when there are no BGP neighbors configured on that BGP instance. In multi-asic device there could be case where there are no BGP neighbors configured on a particular ASIC.
2021-04-08 15:30:46 -07:00
Tamer Ahmed
86ea554d4a [radv] Fix Script Name Change (#7254)
PR https://github.com/Azure/sonic-buildimage/pull/4599 changed startup
script name from wait_for_intf.sh.j2 to wait_for_link.sh.j2, however
when PR https://github.com/Azure/sonic-buildimage/pull/5178 was cherry-
picked, the script name was not changed to wait_for_link.sh.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2021-04-08 09:56:31 -07:00
Joe LeVeque
72b32a96fc
[201911][dockers][supervisor] Increase event buffer size for process exit listener (#7106)
Backport of https://github.com/Azure/sonic-buildimage/pull/7083 to the 201911 branch.

#### Why I did it

To prevent error [messages](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802) like the following from being logged:

```
Mar 17 02:33:48.523153 vlab-01 INFO swss#supervisord 2021-03-17 02:33:48,518 ERRO pool supervisor-proc-exit-listener event buffer overflowed, discarding event 46
```

This is basically an addendum to https://github.com/Azure/sonic-buildimage/pull/5247, which increased the event buffer size for dependent-startup. While supervisor-proc-exit-listener doesn't subscribe to as many events as dependent-startup, there is still a chance some containers (like swss, as in the example above) have enough processes running to cause an overflow of the default buffer size of 10.

This is especially important for preventing erroneous log_analyzer failures in the sonic-mgmt repo regression tests, which have started occasionally causing PR check builds to fail. Example [here](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802).

I set all supervisor-proc-exit-listener event buffer sizes to 1024, and also updated all dependent-startup event buffer sizes to 1024, as well, to keep things simple, unified, and allow headroom so that we will not need to adjust these values frequently, if at all.
2021-03-29 10:07:43 -07:00
judyjoseph
c15b5ea339 To decrease the Connect Retry Timer from default value which is 120sec to 10 sec. (#7087)
Why I did it
It was observed that on a multi-asic DUT bootup, the BGP internal sessions between ASIC's was taking more time to get ESTABLISHED than external BGP sessions. The internal sessions was coming up almost exactly 120 secs later.

In multi-asic platform the bgp dockers ( which is per ASIC ) on switch start are bring brought up around the same time and they try to make the bgp sessions with neighbors (in peer ASIC's) which may be not be completely up. This results in BGP connect fail and the retry happens after 120sec which is the default Connect Retry Timer

How I did it
Add the command to set the bgp neighboring session retry timer to 10sec for internal bgp neighbors.
2021-03-17 23:16:44 -07:00
Tamer Ahmed
7c5f0ff316
Start DHCP Relay When Helpers IPs Are Available (#6961) (#7059)
It is possible to have DHCP relay configuration with no servers/
helpers which result in DHCP container to crash. This PR fixes this
issue by not starting DHCP relay for vlans with no DHCP helpers.

resolves: #6931
closes: #6931
Do not add program group for dhcp relay with not dhcp helpers

Unit test
2021-03-15 14:43:50 -07:00
trzhang-msft
a0b824f83e
[docker-dhcp-relay]: add -si support in dhcp docker template (#7054) 2021-03-15 09:21:32 -07:00
Ze Gan
b73d5a659e [docker-ptf]: Add teamd dependency to ptf (#6994)
Signed-off-by: Ze Gan <ganze718@gmail.com>
2021-03-10 10:50:17 -08:00
Qi Luo
b12383013f [build]: Fix get-pip 2.7 url according to upstream announcement (#6999)
ref: https://bootstrap.pypa.io/2.7/get-pip.py

The URL you are using to fetch this script has changed, and this one will no
longer work. Please use get-pip.py from the following URL instead:

    https://bootstrap.pypa.io/pip/2.7/get-pip.py
2021-03-10 09:51:31 -08:00
abdosi
ab05a2f58a
Add support for BGP Monitors on multi asic SONiC platforms. (#6977)
This PR is cherry-pick of master
https://github.com/Azure/sonic-buildimage/pull/6920

Why I did it
Add support for BGP Monitors on multi asic SONiC platforms.

How I did it
On multi ASIC SONiC platforms, BGP monitor session will be established from Backend ASIC.
To achieve this following changes are done

Add BGP monitor configuration on the backend ASIC.
The BGP monitor configuration is present in the DPG of the device in minigraph.xml of multi-ASIC device, so this configuration will be added to the config_db of the host, when the minigraph is loaded.
To add configuration for this in the Backend ASIC, a new class MultiAsicBgpMonCfg is added to the hostcfgd service to update the config_db of the backend ASIC when the BGP_MONITOR table of the host config_db is updated.
This way incremental BGP_MONITOR configuration can also be handled.

Changes to establish BGP session with bgp monitor.

Add route in host main routing table to go to one of pre-define backend asic
Add IP table rule on front asic to mark the BGP packets with destination as IPv4 Loopback.
Add IP rule in front asic namespace to match mark BGP packet and lookup default table
Program the default route in FrontEnd asic name space docker default table as part of start.sh of the BGP container.
It need to be done as part of start.sh otherwise FRR default route will get over-written.
How to verify it

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Co-authored-by: Arvind <arlakshm@microsoft.com>
2021-03-06 21:21:52 -08:00
abdosi
9dc285ab05 Changes in FRR temapltes for multi-asic (#6901)
1. Made the command next-hop-self force only applicable on back-end asic bgp. This is done so that BGPL iBGP session running on backend can send e-BGP learn nexthop. Back end asic FRR is able to recursively resolve the eBGP nexthop in its routing table since it knows about all the connected routes advertise from front end asic.

2. Made all front-end asic bgp use global loopback ip (Loopback0) as router id and back end asic bgp use Loopbacl4096 as ruter-id and originator id for Route-Reflector. This is done so that routes learnt by external peer do not see Loopback4096 as router id in show ip bgp <route-prerfix> output.

3. To handle above change need to pass Loopback4096 from BGP manager for jinja2 template generation. This was missing and this change/fix is needed for this also https://github.com/Azure/sonic-buildimage/blob/master/dockers/docker-fpm-frr/frr/bgpd/templates/dynamic/instance.conf.j2#L27

4. Enhancement to add mult_asic specific bgpd template generation unit test cases.
2021-03-02 14:42:22 -08:00
abdosi
fbc3386825 [multi-asic] BBR support on internal-peers for multi-asic platfroms. (#6848)
Enable BBR config allowas-in 1 for internal peers

Why I did:
To advertise BBR routes learnt via e-BGP peer in one asic/namespace to another iBGP asic/namespace via Route Reflector.
2021-03-02 13:44:17 -08:00
Qi Luo
c9febff961 [radv] Disable radv for specific deployment_id (#6830) 2021-02-22 18:52:40 -08:00
judyjoseph
86a13610cb [docker-fpm-frr]: TSA/B/C changes for multi-asic (#6510)
- Introduced TS common file in docker as well and moved common functions.
- TSA/B/C scripts run only in BGP instances for front end ASICs.
       In addition skip enforcing it on route maps used between internal BGP sessions.

admin@str--acs-1:~$ sudo /usr/bin/TSA
System Mode: Normal -> Maintenance

and in case of Multi-ASIC
admin@str--acs-1:~$ sudo /usr/bin/TSA
BGP0 : System Mode: Normal -> Maintenance
BGP1 : System Mode: Normal -> Maintenance
BGP2 : System Mode: Normal -> Maintenance
2021-02-18 18:04:24 -08:00
Petro Bratash
4031791b4e [lldp]: Add verification IPv4 address on LLDP conf Jinja2 Template (#5699)
Fix #5812

LLDP conf Jinja2 Template does not verify IPv4 address and can use IPv6 version. This issue does not effect control LLDP daemon. Issue can be reproduced via `test_snmp_lldp` test. LLDP conf Jinja2 Template selects first item from the list of mgmt interfaces.

TESTBED_1 LLDP conf

```
configure ports eth0 lldp portidsubtype local eth0
configure system ip management pattern FC00:3::32
configure system hostname dut-1
```
TESTBED_2  LLDP conf

```
configure ports eth0 lldp portidsubtype local eth0
configure system ip management pattern 10.22.24.61
configure system hostname dut-2
```
TESTBED_1  MGMT_INTERFACE

```
$ redis-cli -n 4 keys "*" | grep MGMT_INTERFACE
MGMT_INTERFACE|eth0|10.22.24.53/23
MGMT_INTERFACE|eth0|FC00:3::32/64
```
TESTBED_2  MGMT_INTERFACE

```
$ redis-cli -n 4 keys "*" | grep MGMT_INTERFACE
MGMT_INTERFACE|eth0|FC00:3::32/64
MGMT_INTERFACE|eth0|10.22.24.61/23

```

Signed-off-by: Petro Bratash <petrox.bratash@intel.com>
2021-02-11 15:34:06 -08:00
abdosi
95bcefa7c9
[201911] Fix PTF Docker Build Error (#6583)
We are hitting the issue as described pypa/pip#9520.
Fix to use get_pip.py from 2.7 repo.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2021-01-28 02:19:12 -08:00
arlakshm
3cd536bb45 [Multi Asic] support of swss.rec and sairedis.rec for multi asic (#6310)
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan arlakshm@microsoft.com

- Why I did it
This PR has the changes to support having different swss.rec and sairedis.rec for each asic.
The logrotate script is updated as well

- How I did it

Update the orchagent.sh script to use the logfile name options in these PRs(Azure/sonic-swss#1546 and Azure/sonic-sairedis#747)
In multi asic platforms the record files will be different for each asic, with the format swss.asic{x}.rec and sairedis.asic{x}.rec

Update the logrotate script for multiasic platform .
2021-01-27 17:12:32 -08:00
Tamer Ahmed
c5bd46f857 [dhcp-relay]: Launch DHCP Relay On L3 Vlan (#6527)
Recent changes brought l2 vlan concept which do not have DHCP
clients behind them and so DHCP relay is not required. Also,
dhcpmon fails to launch on those vlans as their interfaces
lack IP addresses. This PR limit launch of both DHCP relay
and dhcpmon to L3 vlans only.

singed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2021-01-25 12:38:16 -08:00
pavel-shirshov
beaaf3316d [docker-frr]: Use egrep with regexp to match correct TSA rules (#6403)
**- Why I did it**
Earlier today we found a bug in the SONiC TSA implementation.
TSC shows incorrect output (see below) in case we have a route-map which contains TSA route-map as a prefix.
```
admin@str-s6100-acs-1:~$ TSC
Traffic Shift Check:
System Mode: Not consistent
```
The reason is that TSC implementation has too loose regexps in TSA utilities, which match wrong route-map entries:
For example, current TSC matches following
```
route-map TO_BGP_PEER_V4 permit 200
route-map TO_BGP_PEER_V6 permit 200
```
But it should match only
```
route-map TO_BGP_PEER_V4 permit 20
route-map TO_BGP_PEER_V4 deny 30
route-map TO_BGP_PEER_V6 permit 20
route-map TO_BGP_PEER_V6 deny 30
```

**- How I did it**
I fixed it by using egrep with `^` and `$` regexp markers which match begin and end of the line.

**- How to verify it**
1. Add follwing entry to FRR config:
```
str-s6100-acs-1# 
str-s6100-acs-1# conf t
str-s6100-acs-1(config)# route-map TO_BGP_PEER_V4 permit 200 
str-s6100-acs-1(config-route-map)# end
```
2. Use the TSC command and check output. It should show normal.
```
admin@str-s6100-acs-1:~$ TSC
Traffic Shift Check:
System Mode: Normal```
2021-01-20 10:37:10 -08:00
pavel-shirshov
f4245fb18d [bgpcfgd]: Support default action for "Allow prefix" feature (#6370)
* Use 20 and 30 route-map entries instead of 2 and 3 for TSA

* Added support for dynamic "Allow list" default action.

Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>
2021-01-08 15:12:52 -08:00
abdosi
a3d093a82a Updated imfile configuration for supervisord logs (#6368)
Updated imfile configuration for supervisord logs for stretch and buster.
2021-01-06 18:48:24 -08:00
abdosi
6e48839cae Enable the notify mode of rsyslogd imfile module used for supervisord (#6298)
Enable the notify mode of rsyslogd imfile module used for supervisord logs in docker container
2020-12-31 17:04:00 -08:00
Stepan Blyshchak
d43e8e16a3
[fpm-frr] fix start.sh template paths (#6329)
There is no /usr/share/sonic/templates/supervisord/ folder
and no supervisord.conf.j2 template.

Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>
2020-12-31 17:01:24 -08:00
Junchao-Mellanox
547ec0a905 Add a configuration to delay start xcvrd for fast-reboot (#5643) 2020-12-22 09:51:54 -08:00
Tamer Ahmed
afc952535e [mgmt-framework] Call sonic-cfggen Once (#4937)
Optimizing number of calls made to sonic-cfggen during service
start up as it adds to total system boot up time.

***-Test 1***
there is an average saving of 1 to 1.5 sec between old script and new script
```
root@str-s6000-acs-14:/# time /usr/bin/rest-server-old.sh
Generating temporary TLS server certificate ...
2020/07/09 19:03:33 wrote cert.pem
2020/07/09 19:03:33 wrote key.pem
REST_SERVER_ARGS = -ui /rest_ui -logtostderr -cert /tmp/cert.pem -key /tmp/key.pem
/usr/sbin/rest_server -ui /rest_ui -logtostderr -cert /tmp/cert.pem -key /tmp/key.pem

real	0m8.790s
user	0m7.993s
sys	0m0.584s
root@str-s6000-acs-14:/# time /usr/bin/rest-server-new.sh
Generating temporary TLS server certificate ...
2020/07/09 19:03:45 wrote cert.pem
2020/07/09 19:03:45 wrote key.pem
REST_SERVER_ARGS = -ui /rest_ui -logtostderr -cert /tmp/cert.pem -key /tmp/key.pem
/usr/sbin/rest_server -ui /rest_ui -logtostderr -cert /tmp/cert.pem -key /tmp/key.pem

real	0m6.940s
user	0m5.670s
sys	0m0.386s
```
***-Test 2***
Built an image with this change and rest server is running with params as described in test 1 above
```
admin@str-s6000-acs-14:~$ ps -ef | grep rest_server
root      3301  2866  2 02:09 pts/0    00:00:10 /usr/sbin/rest_server -ui /rest_ui -logtostderr -cert /tmp/cert.pem -key /tmp/key.pem

```

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2020-12-22 09:51:54 -08:00
Tamer Ahmed
2b3e18c0cc [swss] Reduce Calls to SONiC Cfggen (#5177)
Calls to sonic-cfggen is CPU expensive. This PR reduces calls to
sonic-cfggen to one call during startup when starting swss service.

singed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2020-12-22 09:51:54 -08:00
Tamer Ahmed
fd3e0b4c58 [frr] Reduce Calls to SONiC Cfggen (#5176)
Calls to sonic-cfggen is CPU expensive. This PR reduces calls to
sonic-cfggen to two calls during startup when starting frr service.

singed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2020-12-22 09:51:54 -08:00
Tamer Ahmed
c5f53f50b2 [radv] Reduce Calls to SONiC Cfggen (#5178)
Calls to sonic-cfggen is CPU expensive. This PR reduces calls to
sonic-cfggen to one call during startup when starting radv service.

singed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2020-12-22 09:51:54 -08:00
Tamer Ahmed
687c971a52 [dhcp-relay] Reduce Calls to SONiC Cfggen (#5175)
Calls to sonic-cfggen is CPU expensive. This PR reduces calls to
sonic-cfggen to one call during startup when starting dhcp-relay
service.

singed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2020-12-22 09:51:54 -08:00
Tamer Ahmed
066a0b3b2b [snmp]: Reduce Calls to SONiC Cfggen (#5166)
Calls to sonic-cfggen is CPU expensive. This PR reduces calls to
sonic-cfggen to once calle during snmp startup

singed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2020-12-22 09:51:54 -08:00
Tamer Ahmed
fae4c4bfcc [swss] Enhance ARP Update to Call Sonic Cfggen Once (#5398)
This PR limited the number of calls to sonic-cfggen to one call
per iteration instead of current 3 calls per iteration.

The PR also installs jq on host for future scripts if needed.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2020-12-22 09:51:54 -08:00
pavel-shirshov
0931280466 [TSA]: Fix TSC. Avoid 'Not consistent' state (#5968) 2020-12-10 16:43:37 -08:00
pavel-shirshov
9e0ea83cd9
[bgpcfgd]: Use peer commands for BBR, not peer-group (#6048)
* templates: Move 'allowas-in' command from peer-group to instance configuration

* Use peer itself, don't rely on peer-groups
2020-11-26 09:55:24 -08:00
Lawrence Lee
cb32b362f5 Make backend device checking more robust (#5730)
Treat devices that are ToRRouters (ToRRouters and BackEndToRRouters) the same when rendering templates
 Except for BackEndToRRouters belonging to a storage cluster, since these devices have extra sub-interfaces created
Treat devices that are LeafRouters (LeafRouters and BackEndLeafRouters) the same when rendering templates

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2020-11-14 08:39:08 -08:00
pavel-shirshov
e9ff96d90e [bgp]: Update TSA functionality (#5906)
Fixed TSA bugs:
1. TSA didn't advertise Loopback ipv6 address
2. TSA and TSB changed BGP dynamic and BGP monitors sessions

**- How to verify it**
Build an image and run on your DUT.
```
admin@str-s6100-acs-1:~$ TSA
System Mode: Normal -> Maintenance
admin@str-s6100-acs-1:~$ vtysh -c 'show bgp ipv4 neighbors 10.0.0.1 advertised-routes'
BGP table version is 6, local router ID is 10.1.0.32, vrf id 0
Default local pref 100, local AS 64601
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 10.1.0.32/32     0.0.0.0                  0         32768 i

Total number of prefixes 1
admin@str-s6100-acs-1:~$ vtysh -c 'show bgp ipv6 neighbors fc00::a advertised-routes'
BGP table version is 6, local router ID is 10.1.0.32, vrf id 0
Default local pref 100, local AS 64601
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> fc00:1::/64      ::                       0         32768 i

Total number of prefixes 1
admin@str-s6100-acs-1:~$ TSB
System Mode: Maintenance -> Normal
```

Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>
2020-11-14 08:35:13 -08:00
judyjoseph
005702ba0e [multi-ASIC] util changes with the BGP_INTERNAL_NEIGHBOR table. (#5760)
- Why I did it
Update the routine is_bgp_session_internal() by checking the BGP_INTERNAL_NEIGHBOR table.
Additionally to address the review comment #5520 (comment)
Add timer settings as will in the internal session templates and keep it minimal as these sessions which will always be up.
Updates to the internal tests data + add all of it to template tests.

- How I did it
Updated the APIs and the template files.

- How to verify it
Verified the internal BGP sessions are displayed correctly with show commands with this API is_bgp_session_internal()
2020-11-10 12:53:49 -08:00
judyjoseph
ce86621399 [multi-ASIC] BGP internal neighbor table support (#5520)
* Initial commit for BGP internal neighbor table support.
  > Add new template named "internal" for the internal BGP sessions
  > Add a new table in database "BGP_INTERNAL_NEIGHBOR"
  > The internal BGP sessions will be stored in this new table "BGP_INTERNAL_NEIGHBOR"

* Changes in template generation tests with the introduction of internal neighbor template files.
2020-11-10 12:52:58 -08:00
abdosi
65cc37cadf [multi-asic] teamdctl support for multi-asic (#5851)
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2020-11-09 12:33:41 -08:00
Junchao-Mellanox
1070d024bc [thermalctld] Enlarge startretries value to avoid thermalctld not able to restart during regression test (#5633)
Increase startretires value from default of 10 to 50 to prevent supervisor from placing thermalctld in FATAL state during regression testing. Also ensures supervisord tries hard to get thermalctld running in production, as thermalctld is critical to prevent device from overheating.
2020-11-03 08:19:19 -08:00
abdosi
0fad6bdc7f [monit] Adding patch to enhance syslog error message generation for monit alert action when status is failed. (#5720)
Why/How I did:

Make sure first error syslog is triggered based on FAULT TOLERANCE condition.

Added support of repeat clause with alert action. This is used as trigger
for generation of periodic syslog error messages if error is persistent

Updated the monit conf files with repeat every x cycles for the alert action
2020-11-01 10:27:10 -08:00
shlomibitton
97f2cafe0b [LLDP] Fix for LLDP advertisements being sent with wrong information. (#5493)
* Fix for LLDP advertisments being sent with wrong information.
Since lldpd is starting before lldpmgr, some advertisment packets might sent with default value, mac address as Port ID.
This fix hold the packets from being sent by the lldpd until all interfaces are well configured by the lldpmgrd.

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>

* Fix comments

* Fix unit-test output caused a failure during build

* Add 'run_cmd' function and use it

* Resume lldpd even if port init timeout reached
2020-10-30 09:06:23 -07:00
pavel-shirshov
2eec3b3254 [bgpcfgd]: Dynamic BBR support (#5626)
**- Why I did it**
To introduce dynamic support of BBR functionality into bgpcfgd.
BBR is adding  `neighbor PEER_GROUP allowas-in 1' for all BGP peer-groups which points to T0
Now we can add and remove this configuration based on CONFIG_DB entry 

**- How I did it**
I introduced a new CONFIG_DB entry:
 - table name: "BGP_BBR"
 - key value: "all". Currently only "all" is supported, which means that all peer-groups which points to T0s will be updated
 - data value: a dictionary: {"status": "status_value"}, where status_value could be either "enabled" or "disabled"

Initially, when bgpcfgd starts, it reads initial BBR status values from the [constants.yml](https://github.com/Azure/sonic-buildimage/pull/5626/files#diff-e6f2fe13a6c276dc2f3b27a5bef79886f9c103194be4fcb28ce57375edf2c23cR34). Then you can control BBR status by changing "BGP_BBR" table in the CONFIG_DB (see examples below).

bgpcfgd knows what peer-groups to change fron [constants.yml](https://github.com/Azure/sonic-buildimage/pull/5626/files#diff-e6f2fe13a6c276dc2f3b27a5bef79886f9c103194be4fcb28ce57375edf2c23cR39). The dictionary contains peer-group names as keys, and a list of address-families as values. So when bgpcfgd got a request to change the BBR state, it changes the state only for peer-groups listed in the constants.yml dictionary (and only for address families from the peer-group value).

**- How to verify it**
Initially, when we start SONiC FRR has BBR enabled for PEER_V4 and PEER_V6:
```
admin@str-s6100-acs-1:~$ vtysh -c 'show run' | egrep 'PEER_V.? allowas'
  neighbor PEER_V4 allowas-in 1
  neighbor PEER_V6 allowas-in 1
```

Then we apply following configuration to the db:
```
admin@str-s6100-acs-1:~$ cat disable.json                
{
        "BGP_BBR": {
            "all": {
                "status": "disabled"
            }
        }
}


admin@str-s6100-acs-1:~$ sonic-cfggen -j disable.json -w 
```
The log output are:
```
Oct 14 18:40:22.450322 str-s6100-acs-1 DEBUG bgp#bgpcfgd: Received message : '('all', 'SET', (('status', 'disabled'),))'
Oct 14 18:40:22.450620 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-f', '/tmp/tmpmWTiuq']'.
Oct 14 18:40:22.681084 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V4 soft in']'.
Oct 14 18:40:22.904626 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V6 soft in']'.
```

Check FRR configuraiton and see that no allowas parameters are there:
```
admin@str-s6100-acs-1:~$ vtysh -c 'show run' | egrep 'PEER_V.? allowas' 
admin@str-s6100-acs-1:~$
```

Then we apply enabling configuration back:
```
admin@str-s6100-acs-1:~$ cat enable.json 
{
        "BGP_BBR": {
            "all": {
                "status": "enabled"
            }
        }
}

admin@str-s6100-acs-1:~$ sonic-cfggen -j enable.json -w 
```
The log output:
```
Oct 14 18:40:41.074720 str-s6100-acs-1 DEBUG bgp#bgpcfgd: Received message : '('all', 'SET', (('status', 'enabled'),))'
Oct 14 18:40:41.074720 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-f', '/tmp/tmpDD6SKv']'.
Oct 14 18:40:41.587257 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V4 soft in']'.
Oct 14 18:40:42.042967 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V6 soft in']'.
```


Check FRR configuraiton and see that the BBR configuration is back:
```
admin@str-s6100-acs-1:~$ vtysh -c 'show run' | egrep 'PEER_V.? allowas'
  neighbor PEER_V4 allowas-in 1
  neighbor PEER_V6 allowas-in 1
```

*** The test coverage ***
Below is the test coverage
```
---------- coverage: platform linux2, python 2.7.12-final-0 ----------
Name                             Stmts   Miss  Cover
----------------------------------------------------
bgpcfgd/__init__.py                  0      0   100%
bgpcfgd/__main__.py                  3      3     0%
bgpcfgd/config.py                   78     41    47%
bgpcfgd/directory.py                63     34    46%
bgpcfgd/log.py                      15      3    80%
bgpcfgd/main.py                     51     51     0%
bgpcfgd/manager.py                  41     23    44%
bgpcfgd/managers_allow_list.py     385     21    95%
bgpcfgd/managers_bbr.py             76      0   100%
bgpcfgd/managers_bgp.py            193    193     0%
bgpcfgd/managers_db.py               9      9     0%
bgpcfgd/managers_intf.py            33     33     0%
bgpcfgd/managers_setsrc.py          45     45     0%
bgpcfgd/runner.py                   39     39     0%
bgpcfgd/template.py                 64     11    83%
bgpcfgd/utils.py                    32     24    25%
bgpcfgd/vars.py                      1      0   100%
----------------------------------------------------
TOTAL                             1128    530    53%
```

**- Which release branch to backport (provide reason below if selected)**

- [ ] 201811
- [x] 201911
- [x] 202006
2020-10-30 08:58:27 -07:00
pavel-shirshov
84405ab953 [bgp]: Enable next-hop-tracking through default (#5600)
**- Why I did it**
FRR introduced [next hop tracking](http://docs.frrouting.org/projects/dev-guide/en/latest/next-hop-tracking.html) functionality.
That functionality requires resolving BGP neighbors before setting BGP connection (or explicit ebgp-multihop command). Sometimes (BGP MONITORS) our neighbors are not directly connected and sessions are IBGP. In this case current configuration prevents FRR to establish BGP connections.  Reason would be "waiting for NHT". To fix that we need either add static routes for each not-directly connected ibgp neighbor, or enable command `ip nht resolve-via-default`

**- How I did it**
Put `ip nht resolve-via-default` into the config

**- How to verify it**
Build an image. Enable BGP_MONITOR entry and check that entry is Established or Connecting in FRR

Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2020-10-13 22:42:29 -07:00
abdosi
9202b1c7eb
Fix monit complaining of snmp on 201911 branch. (#5612)
There is difference between master and 201911
how sonic_ax_impl is started.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2020-10-13 17:17:43 -07:00
Mahesh Maddikayala
f354a20d94 [ECMP][Multi-ASIC] Have different ECMP seed value on each ASIC (#5357)
* Calculate ECMP hash seed based on ASIC ID on multi ASIC platform. Each ASIC will have a unique ECMP hash seed value.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2020-10-13 09:48:57 -07:00
pavel-shirshov
437ad95646 [bgp] Add 'allow list' manager feature (#5513)
implements a new feature: "BGP Allow list."

This feature allows us to control which IP prefixes are going to be advertised via ebgp from the routes received from EBGP neighbors.
2020-10-06 11:15:19 -07:00
abdosi
3a29249e04 [Multi-asic] Fixed Default Route to be BGP (#5548)
Learned and not docker default route for multi-asic platforms.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2020-10-06 06:04:31 +00:00
Nazarii Hnydyn
f456f1fd03 [monit]: Fix process checker. (#5480)
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2020-09-30 00:25:37 +00:00