7f50b9815e14d90c02d9dce63fd08d90e25cee3f (HEAD -> 201911, origin/201911) handled update() function of fdb orchagent for FDB FLUSH event (#1534)
17adc13b6ca21846fe27c94d6a16f9909c712d77 Add a check for warm-restart, and do a clear only when warm-restart is enable. (#1498)
d097260a5aa7bd611babd5062e220056374e23d8 Fixed compilation failure with debug option (#1518)
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
The issue was a typo introduced in #6006. In that change, the BGP allow list
configuration manager was updated to use a method of the common ConfigMgr
for restarting peer groups. However, the method name 'restart_peers' was
used instead of the correct 'restart_peer_groups'.
This change updates managers_allow_list.py to use the correct method
'restart_peer_groups' for restarting peer groups.
Signed-off-by: Xin Wang <xiwang5@microsoft.com>
- Why I did it
Fix issue #6043
- How I did it
We are disabling the in-container FRR log. The log entries are sent to the base image and are logged in /var/log/quagga/bgpd.log.
However, we need to remove the whole outchannel config block to avoid an error message raised by rsyslogd.
- How to verify it
Without the change, the bgp container case of test_autorestart fails on loganalyzer errors. With the change, restarting the bgp container no longer generates the error messages and the test passes.
The log generated by FRR continues to appear in /var/log/quagga/bgpd.log.
* [bgpcfgd]: Batch bgp updates.
The vtysh -f command is slow; it sometimes takes about 3 seconds.
When we need to run many vtysh -f commands, that slows down the system.
Batch the vtysh -f updates instead, as sketched below.
* Use correct file to import run_command
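For illustration, a minimal sketch of the batching idea; the class and helper names are hypothetical, not the actual bgpcfgd code:
```python
import subprocess
import tempfile

class FRRConfigBatch(object):
    """Collect FRR config snippets and push them with a single 'vtysh -f' call."""

    def __init__(self):
        self.snippets = []

    def add(self, snippet):
        # Queue a configuration fragment instead of spawning vtysh right away.
        self.snippets.append(snippet)

    def flush(self):
        # Write all queued fragments to one temporary file and apply them at once,
        # paying the ~3 second vtysh -f cost a single time per batch.
        if not self.snippets:
            return
        with tempfile.NamedTemporaryFile(mode="w", suffix=".frr.conf", delete=False) as cfg:
            cfg.write("\n".join(self.snippets) + "\n")
            path = cfg.name
        subprocess.check_call(["vtysh", "-f", path])
        self.snippets = []
```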
- Why I did it
frr is creating /var/log/frr/frr.log inside the frr docker and letting it grow. It will eventually exhaust hard drive space.
To fix issue #5965
- How I did it
Remove rsyslog file outchannel so that frr won't generate /var/log/frr/frr.log inside the docker.
- How to verify it
Manually removed the outchannel and restarted the BGP docker, making sure that /var/log/frr/frr.log is no longer created inside the docker.
While restarting bgp docker, observed that base image /var/log/quagga/bgpd.log continued to grow and captured all FRR logs.
When forced mgmt routes are present, the fix made as part of #5754 is not complete.
Added a preference (priority) field to the forced mgmt route ip rules.
756dd9c8123cd06dc581d9b2eb236334deee1850 (HEAD -> 201911, origin/201911)
[201911 sonic-swss] Flushing FDB entries before removing BridgePort (#1516)
e3f22ea6685104a819440ecc0efe89c4bd3a0003 [201911/portsorch] Add
correct stat list for port buffer drop counters (#1509)
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* Add explicit default state into the constants.yml
* Enable/disable only peer-groups, available in the config
* Retrieve updates from frr before using configuration
Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>
Treat devices that are ToRRouters (ToRRouters and BackEndToRRouters) the same when rendering templates
Except for BackEndToRRouters belonging to a storage cluster, since these devices have extra sub-interfaces created
Treat devices that are LeafRouters (LeafRouters and BackEndLeafRouters) the same when rendering templates
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
Fixed TSA bugs:
1. TSA didn't advertise the Loopback IPv6 address
2. TSA and TSB changed BGP dynamic and BGP monitor sessions
**- How to verify it**
Build an image and run on your DUT.
```
admin@str-s6100-acs-1:~$ TSA
System Mode: Normal -> Maintenance
admin@str-s6100-acs-1:~$ vtysh -c 'show bgp ipv4 neighbors 10.0.0.1 advertised-routes'
BGP table version is 6, local router ID is 10.1.0.32, vrf id 0
Default local pref 100, local AS 64601
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 10.1.0.32/32 0.0.0.0 0 32768 i
Total number of prefixes 1
admin@str-s6100-acs-1:~$ vtysh -c 'show bgp ipv6 neighbors fc00::a advertised-routes'
BGP table version is 6, local router ID is 10.1.0.32, vrf id 0
Default local pref 100, local AS 64601
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> fc00:1::/64 :: 0 32768 i
Total number of prefixes 1
admin@str-s6100-acs-1:~$ TSB
System Mode: Maintenance -> Normal
```
Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>
- Why I did it
Update the routine is_bgp_session_internal() by checking the BGP_INTERNAL_NEIGHBOR table.
Additionally, to address the review comment #5520 (comment):
Add timer settings as well in the internal session templates and keep them minimal, as these sessions will always be up.
Update the internal test data and add all of it to the template tests.
- How I did it
Updated the APIs and the template files.
- How to verify it
Verified that the internal BGP sessions are displayed correctly by the show commands that use this API, is_bgp_session_internal().
* Initial commit for BGP internal neighbor table support.
> Add new template named "internal" for the internal BGP sessions
> Add a new table in database "BGP_INTERNAL_NEIGHBOR"
> The internal BGP sessions will be stored in this new table "BGP_INTERNAL_NEIGHBOR"
* Changes in template generation tests with the introduction of internal neighbor template files.
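For illustration only, a minimal sketch of how a routine such as is_bgp_session_internal() could consult the new BGP_INTERNAL_NEIGHBOR table (assuming the swsssdk ConfigDBConnector API; the real implementation may differ):
```python
from swsssdk import ConfigDBConnector  # assumed to be available in the SONiC image

def is_bgp_session_internal(bgp_neigh_ip):
    """Return True if the neighbor IP is present in the BGP_INTERNAL_NEIGHBOR table."""
    config_db = ConfigDBConnector()
    config_db.connect()
    # Keys of BGP_INTERNAL_NEIGHBOR are the internal neighbor IP addresses.
    internal_neighbors = config_db.get_table("BGP_INTERNAL_NEIGHBOR") or {}
    return bgp_neigh_ip in internal_neighbors
```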
Fix Azure/SONiC#551
When the eth0 IP address is configured, an ip rule is added for the eth0 IP address through the interfaces.j2 template.
This eth0 ip rule creates an issue when a VRF (data VRF or management VRF) is also created in the system.
When any VRF (data VRF or management VRF) is created, the kernel automatically adds a new rule: "1000: from all lookup [l3mdev-table]".
This l3mdev IP rule is never deleted, even if the VRF is deleted.
Once this l3mdev IP rule is added, if the user configures an IP address for the eth0 interface, interfaces.j2 adds an eth0 IP rule such as "1000: from 100.104.47.74 lookup default". Priority 1000 is automatically chosen by the kernel, so this rule gets higher priority than the already existing rule "1001: from all lookup local".
This results in the issue "ping from console to eth0 IP does not work once a VRF is created", as explained in issue 551.
More details and possible solutions are explained in the comments on issue 551.
This PR resolves the issue by always using the low priority 32765 for the IP rule that is created for the eth0 IP address.
Tested with various combinations of VRF creation, deletion and IP address configuration, along with ping from the console to the eth0 IP address.
Co-authored-by: Kannan KVS <kannan_kvs@dell.com>
Added a new MultiASIC util method, "get_back_end_interface_set()", to speed up back-end interface checks by allowing the caller to cache the back-end interfaces in a set. The caller can then use this set for all subsequent back-end interface checks instead of reading from the redis DB each time, which becomes a scaling issue for cases such as checking thousands of nexthop routes for filtering purposes.
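A hedged sketch of the intended usage pattern; the module path and exact signature are assumed from the description above:
```python
from sonic_py_common import multi_asic  # assumed location of the multi-ASIC helpers

# Read the back-end interfaces from the redis DB once and cache them in a set...
backend_intf_set = multi_asic.get_back_end_interface_set()

def is_backend_interface(intf_name):
    # ...then answer every subsequent check from the cached set instead of
    # hitting redis again, which scales to thousands of nexthop lookups.
    return intf_name in backend_intf_set
```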
Why/How I did it:
Make sure the first error syslog is triggered based on the FAULT TOLERANCE condition.
Added support for the repeat clause with the alert action. This is used as a trigger
for generating periodic syslog error messages if the error is persistent.
Updated the monit conf files with "repeat every x cycles" for the alert action.
* Fix for LLDP advertisements being sent with wrong information.
Since lldpd starts before lldpmgrd, some advertisement packets might be sent with default values, e.g. the MAC address as Port ID.
This fix holds the packets from being sent by lldpd until all interfaces are configured by lldpmgrd.
Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
* Fix comments
* Fix unit-test output that caused a failure during build
* Add 'run_cmd' function and use it
* Resume lldpd even if port init timeout reached
**- Why I did it**
To introduce dynamic support of BBR functionality into bgpcfgd.
BBR adds `neighbor PEER_GROUP allowas-in 1` for all BGP peer-groups that point to T0s.
Now we can add and remove this configuration based on a CONFIG_DB entry.
**- How I did it**
I introduced a new CONFIG_DB entry:
- table name: "BGP_BBR"
- key value: "all". Currently only "all" is supported, which means that all peer-groups which points to T0s will be updated
- data value: a dictionary: {"status": "status_value"}, where status_value could be either "enabled" or "disabled"
Initially, when bgpcfgd starts, it reads the initial BBR status values from [constants.yml](https://github.com/Azure/sonic-buildimage/pull/5626/files#diff-e6f2fe13a6c276dc2f3b27a5bef79886f9c103194be4fcb28ce57375edf2c23cR34). Then you can control the BBR status by changing the "BGP_BBR" table in the CONFIG_DB (see examples below).
bgpcfgd knows which peer-groups to change from [constants.yml](https://github.com/Azure/sonic-buildimage/pull/5626/files#diff-e6f2fe13a6c276dc2f3b27a5bef79886f9c103194be4fcb28ce57375edf2c23cR39). The dictionary contains peer-group names as keys and a list of address-families as values. So when bgpcfgd gets a request to change the BBR state, it changes the state only for the peer-groups listed in the constants.yml dictionary (and only for the address families from the peer-group value).
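For illustration, a minimal sketch of how the BBR state change could be turned into FRR commands for the configured peer-groups; the peer-group dictionary shape and the FRR command layout here are assumptions, not the exact bgpcfgd code:
```python
def bbr_commands(status, bbr_peer_groups, bgp_asn):
    """Build FRR config lines toggling 'allowas-in 1' for the listed peer-groups.

    bbr_peer_groups mimics the constants.yml dictionary described above,
    e.g. {"PEER_V4": ["ipv4"], "PEER_V6": ["ipv6"]} (assumed shape).
    """
    prefix = "" if status == "enabled" else "no "
    lines = ["router bgp %s" % bgp_asn]
    for peer_group, address_families in sorted(bbr_peer_groups.items()):
        for af in address_families:
            lines.append(" address-family %s" % af)
            lines.append("  %sneighbor %s allowas-in 1" % (prefix, peer_group))
            lines.append(" exit-address-family")
    return lines

# Example: bbr_commands("disabled", {"PEER_V4": ["ipv4"], "PEER_V6": ["ipv6"]}, 64601)
# produces the "no neighbor ... allowas-in 1" lines that would be fed to 'vtysh -f'.
```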
**- How to verify it**
Initially, when we start SONiC, FRR has BBR enabled for PEER_V4 and PEER_V6:
```
admin@str-s6100-acs-1:~$ vtysh -c 'show run' | egrep 'PEER_V.? allowas'
neighbor PEER_V4 allowas-in 1
neighbor PEER_V6 allowas-in 1
```
Then we apply the following configuration to the DB:
```
admin@str-s6100-acs-1:~$ cat disable.json
{
    "BGP_BBR": {
        "all": {
            "status": "disabled"
        }
    }
}
admin@str-s6100-acs-1:~$ sonic-cfggen -j disable.json -w
```
The log output is:
```
Oct 14 18:40:22.450322 str-s6100-acs-1 DEBUG bgp#bgpcfgd: Received message : '('all', 'SET', (('status', 'disabled'),))'
Oct 14 18:40:22.450620 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-f', '/tmp/tmpmWTiuq']'.
Oct 14 18:40:22.681084 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V4 soft in']'.
Oct 14 18:40:22.904626 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V6 soft in']'.
```
Check the FRR configuration and see that no allowas parameters are there:
```
admin@str-s6100-acs-1:~$ vtysh -c 'show run' | egrep 'PEER_V.? allowas'
admin@str-s6100-acs-1:~$
```
Then we apply the enabling configuration back:
```
admin@str-s6100-acs-1:~$ cat enable.json
{
    "BGP_BBR": {
        "all": {
            "status": "enabled"
        }
    }
}
admin@str-s6100-acs-1:~$ sonic-cfggen -j enable.json -w
```
The log output:
```
Oct 14 18:40:41.074720 str-s6100-acs-1 DEBUG bgp#bgpcfgd: Received message : '('all', 'SET', (('status', 'enabled'),))'
Oct 14 18:40:41.074720 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-f', '/tmp/tmpDD6SKv']'.
Oct 14 18:40:41.587257 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V4 soft in']'.
Oct 14 18:40:42.042967 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V6 soft in']'.
```
Check the FRR configuration and see that the BBR configuration is back:
```
admin@str-s6100-acs-1:~$ vtysh -c 'show run' | egrep 'PEER_V.? allowas'
neighbor PEER_V4 allowas-in 1
neighbor PEER_V6 allowas-in 1
```
*** The test coverage ***
Below is the test coverage
```
---------- coverage: platform linux2, python 2.7.12-final-0 ----------
Name Stmts Miss Cover
----------------------------------------------------
bgpcfgd/__init__.py 0 0 100%
bgpcfgd/__main__.py 3 3 0%
bgpcfgd/config.py 78 41 47%
bgpcfgd/directory.py 63 34 46%
bgpcfgd/log.py 15 3 80%
bgpcfgd/main.py 51 51 0%
bgpcfgd/manager.py 41 23 44%
bgpcfgd/managers_allow_list.py 385 21 95%
bgpcfgd/managers_bbr.py 76 0 100%
bgpcfgd/managers_bgp.py 193 193 0%
bgpcfgd/managers_db.py 9 9 0%
bgpcfgd/managers_intf.py 33 33 0%
bgpcfgd/managers_setsrc.py 45 45 0%
bgpcfgd/runner.py 39 39 0%
bgpcfgd/template.py 64 11 83%
bgpcfgd/utils.py 32 24 25%
bgpcfgd/vars.py 1 0 100%
----------------------------------------------------
TOTAL 1128 530 53%
```
**- Which release branch to backport (provide reason below if selected)**
- [ ] 201811
- [x] 201911
- [x] 202006
**- Why I did it**
I was asked to change the "Allow list" prefix-list generation rule.
Previously we generated the rules using the following method:
```
For each {prefix}/{masklen} we would generate the prefix-rule
permit {prefix}/{masklen} ge {masklen}+1
Example:
Prefix 1.2.3.4/24 would have following prefix-list entry generated
permit 1.2.3.4/24 ge 23
```
But we discovered the old rule doesn't work for all cases we have.
So we introduced the new rule:
```
For an ipv4 entry:
For mask < 32, we will add 'le 32' to cover all prefix masks to be sent by the T0
For mask = 32, we will not add any 'le mask'
For an ipv6 entry:
For mask < 128, we will add 'le 128' to cover all prefix masks to be sent by the T0
For mask = 128, we will not add any 'le mask'
```
**- How I did it**
I changed the prefix-list entry generation function. I also introduced a test for the changed function.
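A minimal sketch of the new rule (the prefix-list name PL_EXAMPLE is a placeholder; this is not the exact generation code):
```python
def allow_list_entry(prefix, seq, ipv6=False):
    """Render one 'permit' prefix-list line following the rule described above."""
    max_len = 128 if ipv6 else 32
    mask_len = int(prefix.split("/")[1])
    family = "ipv6" if ipv6 else "ip"
    entry = "%s prefix-list PL_EXAMPLE seq %d permit %s" % (family, seq, prefix)
    # Masks shorter than the maximum get "le 32" / "le 128" so that any
    # more-specific prefix sent by the T0 is still covered; host routes
    # (/32 or /128) get no 'le' clause at all.
    if mask_len < max_len:
        entry += " le %d" % max_len
    return entry

# allow_list_entry("10.20.0.0/16", 30)        -> "ip prefix-list ... permit 10.20.0.0/16 le 32"
# allow_list_entry("fc01:10::/64", 30, True)  -> "ipv6 prefix-list ... permit fc01:10::/64 le 128"
```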
**- How to verify it**
1. Build an image and put it on your dut.
2. Create a file test_schema.conf with the test configuration
```
{
    "BGP_ALLOWED_PREFIXES": {
        "DEPLOYMENT_ID|0|1010:1010": {
            "prefixes_v4": [
                "10.20.0.0/16",
                "10.50.1.0/29"
            ],
            "prefixes_v6": [
                "fc01:10::/64",
                "fc02:20::/64"
            ]
        },
        "DEPLOYMENT_ID|0": {
            "prefixes_v4": [
                "10.20.0.0/16",
                "10.50.1.0/29"
            ],
            "prefixes_v6": [
                "fc01:10::/64",
                "fc02:20::/64"
            ]
        }
    }
}
```
3. Apply the configuration by command
```
sonic-cfggen -j test_schema.conf --write-to-db
```
4. Check that your bgp configuration has the following prefix-list entries:
```
admin@str-s6100-acs-1:~$ show runningconfiguration bgp | grep PL_ALLOW
ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V4 seq 10 deny 0.0.0.0/0 le 17
ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V4 seq 20 permit 127.0.0.1/32
ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V4 seq 30 permit 10.20.0.0/16 le 32
ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V4 seq 40 permit 10.50.1.0/29 le 32
ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V4 seq 10 deny 0.0.0.0/0 le 17
ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V4 seq 20 permit 127.0.0.1/32
ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V4 seq 30 permit 10.20.0.0/16 le 32
ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V4 seq 40 permit 10.50.1.0/29 le 32
ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V6 seq 10 deny ::/0 le 59
ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V6 seq 20 deny ::/0 ge 65
ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V6 seq 30 permit fc01:10::/64 le 128
ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V6 seq 40 permit fc02:20::/64 le 128
ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V6 seq 10 deny ::/0 le 59
ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V6 seq 20 deny ::/0 ge 65
ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V6 seq 30 permit fc01:10::/64 le 128
ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V6 seq 40 permit fc02:20::/64 le 128
```
Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>
**- Why I did it**
On teamd docker restart, the swss and syncd dockers need to be restarted as there are dependent resources present.
**- How I did it**
Add teamd as a dependent service of swss.
Updated the docker-wait script to handle the service and its dependent services separately.
Handle the warm-restart case for the dependent services.
**- How to verify it**
Verified the following scenarios with the following testbed:
VM1 ----------------------------[DUT 6100] -----------------------VM2, with continuous ping traffic between the VMs
1. Stop teamd docker alone
> swss, syncd dockers seen going away
> LAG reference count error messages seen for a while till the swss docker stops.
> Dockers come back up.
2. Enable WR mode for teamd. Stop teamd docker alone
> swss, syncd dockers not removed.
> LAG reference count error messages not seen.
> Repeated the stop-teamd-docker test - same result, no effect on swss/syncd.
3. Stop swss docker.
> swss, teamd, syncd go down - dockers come back correctly, interfaces up
4. Enable WR mode for swss. Stop swss docker
> swss goes down without affecting the syncd/teamd dockers.
5. Config reload
> No reference counter errors seen; dockers come back correctly, with interfaces up
6. Warm reboot, observations below
> swss docker goes down first
> teamd + syncd go down at the end of the WR process.
> Dockers come back up fine.
> Ping traffic between the VMs was NOT hit
7. Fast reboot, observations below
> teamd goes down first ( **confirmed swss doesn't exit here** )
> swss goes down next
> syncd goes away at the end of the FR process
> Dockers come back up fine.
> There is a traffic hit, as expected for fast-reboot
8. Verified on a multi-ASIC platform the tests above, other than the WR/FB scenarios
The issue was that we were relying on the port_alias_asic_map dictionary,
but that dictionary can't be used since the alias name format has changed.
Fix the port alias mapping as needed.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
With python 2.7, importing the yaml module was resulting in a huge heap memory allocation per process. As an interim fix, move the yaml import into the function that actually uses this module. This helps reduce the memory footprint of the pmon docker, as it doesn't use the APIs that need yaml processing.
This issue is not seen when importing yaml with python3; it needs to be analyzed further, hence putting this fix in 201911, where we continue to use python2.7.
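A minimal illustration of the interim fix; the function and file names here are illustrative, not the actual ones changed:
```python
# Module level: do NOT "import yaml" here; under python2.7 the import alone
# was observed to allocate a large amount of heap memory in every process.

def load_platform_yaml(path):
    # Import yaml lazily, only in the code path that actually parses YAML,
    # so processes that never call this function avoid that memory cost.
    import yaml
    with open(path) as f:
        return yaml.safe_load(f)
```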
**- Why I did it**
FRR introduced [next hop tracking](http://docs.frrouting.org/projects/dev-guide/en/latest/next-hop-tracking.html) functionality.
That functionality requires resolving BGP neighbors before setting up the BGP connection (or an explicit ebgp-multihop command). Sometimes (BGP MONITORS) our neighbors are not directly connected and the sessions are IBGP. In this case the current configuration prevents FRR from establishing BGP connections; the reason would be "waiting for NHT". To fix that we need to either add static routes for each not-directly-connected ibgp neighbor, or enable the command `ip nht resolve-via-default`.
**- How I did it**
Put `ip nht resolve-via-default` into the config
**- How to verify it**
Build an image. Enable a BGP_MONITORS entry and check that the session is Established or Connecting in FRR.
Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* Calculate ECMP hash seed based on ASIC ID on multi ASIC platform. Each ASIC will have a unique ECMP hash seed value.
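A hedged sketch of the idea; the real template/orchestration code and the base seed value may differ:
```python
def ecmp_hash_seed_for_asic(asic_id, base_seed=0):
    """Derive a unique ECMP hash seed for each ASIC from its numeric ASIC ID."""
    # Offsetting the base seed by the ASIC ID ensures that every ASIC on a
    # multi-ASIC platform gets its own hash seed, so flows are not polarized
    # when they traverse several ASICs in the same system.
    return base_seed + int(asic_id)
```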
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
**- Why I did it**
BGP_MONITORS sessions don't have corresponding DEVICE_NEIGHBOR_METADATA CONFIG_DB entries in the minigraphs. Prevent bgpcfgd from waiting on such entries for BGP_MONITORS sessions.
**- How I did it**
Set the constructor argument to False, which means: don't wait for device neighbor metadata info for BGP_MONITORS.
**- How to verify it**
Build an image, install it on your device, and use a minigraph with BGP_MONITORS sessions. Check that the sessions are populated in the config.
implements a new feature: "BGP Allow list."
This feature allows us to control which IP prefixes are going to be advertised via ebgp from the routes received from EBGP neighbors.
* Fix generate_l2_config: don't override hostname because sonic-cfggen may not read
from Redis. Fix test_l2switch_template test case to test preset l2
feature
* Improve test script: compare json files with sort_keys
* Revert changes on sample_output
* Remove members field in VLAN section. Fix test assertTrue statement.
Fix load minigraph on 201911 branch. (#1124)
Fixed config load_minigraph not working for multi-ASIC platforms.
(#1123)
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
[Namespace]: Fix SAI_ID key used in cpfcIfTable and
csqIfQosGroupStatsTable implementation (#138)
Implementation changes for CiscoBgp4MIB (#158)
[ciscoSwitchQosMIB]: Remove invocation of update_data function
during (#161)
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
implements a new feature: "BGP Allow list."
This feature allows us to control which IP prefixes are going to be advertised via ebgp from the routes received from EBGP neighbors.
When stopping the swss, pmon or bgp containers, log messages like the following can be seen:
```
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,061 ERRO pool dependent-startup event buffer overflowed, discarding event 34
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,063 ERRO pool dependent-startup event buffer overflowed, discarding event 35
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,064 ERRO pool dependent-startup event buffer overflowed, discarding event 36
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,066 ERRO pool dependent-startup event buffer overflowed, discarding event 37
```
This is due to the number of programs in the container managed by supervisor, all generating events at the same time. The default event queue buffer size in supervisor is 10. This patch increases that value in all containers in order to eliminate these errors. As more programs are added to the containers, we may need to further adjust these values. I increased all buffer sizes to 25 except for containers with more programs or templated supervisor.conf files which allow for a variable number of programs. In these cases I increased the buffer size to 50. One final exception is the swss container, where the buffer fills up to ~50, so I increased this buffer to 100.
Resolves https://github.com/Azure/sonic-buildimage/issues/5241
Revert "Revert " [201911]show interface counters for multi ASIC devices
(#1104)""
Revert "Revert "Pfcstat (#1097)""
[show] Fix 'show int neighbor expected' (#1106)
Update argument for docker exec it->i (#1118)
Update to make config load/reload backward compatible. (#1115)
Handling deletion of Port Channel before deletion of its members
(#1062)
Skip default route present in ASIC-DB but not in APP-DB. (#1107)
[CLI][PFCWD][Multi-ASIC] Added multi ASIC support to 'pfcwd' CLI
(#1102)
[201911] Multi asic platform config interface portchannel, show
transceiver (#1087)
[drop counters] Fix configuration for counters with lowercase
names (#1103)
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
[xcvrd] Don't log unnecessary messages upon empty transceiver change
event (#53)
[thermalctld] Optimize the thermal policy loop to make it execute
every 60 seconds (#77)
[thermalctld] Fix issue: fan status should not be True when fan is
absent (#92)
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Avoid adding loopback interface (ip link add) when setting nat zone on
loopback interface (#1434)
[acl] Remove Ethertype from L3V6 qualifiers (#1433)
Sflow fixes during DEL processing (#1427)
Fix#3971 by skipping create-only SAI attributes when modifying
buffer pools or profiles in orchagent (#1430)
Fix issue: bufferorch only pass the first attribute to sai when
setting attribute (#1442)
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Fix SubscriberStateTable::hasCachedData formula for a timing risk
(#379)
Add restapi DB (#386)
Fix swss::exec return value (#368)
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Printing both snapshot and current counter sets will make it easier to pinpoint
which message type(s) is/are not being relayed. This PR prints both counter sets.
Also, this PR defines gnu11 as a C standard to compile with in order to avoid
making changes when porting to 201811 branch.
Signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
When BGP routes are missing, DHCP packets get relayed over the mgmt
interface. This results in dhcpmon alerting that DHCP packets are
not being relayed. This PR includes the mgmt interface as an uplink
device, so if a DHCP packet gets relayed over the mgmt interface,
the regular dhcpmon alert will not be issued. Instead, dhcpmon will
check the mgmt interface counts and issue a separate alert regarding
packets travelling through the mgmt network.
In addition, this PR includes the following enhancements:
1. Add SIGUSR1 handler that prints out current packet counts
2. Increase the alert grace window to 3 minutes from the current 2 minutes
3. Time is now computed more accurately
4. Print vlan name before counters
Signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
update the environment variable in the teardown (#1101)
Fix for show interface portchannel now working on 201911 (#1105)
Revert "Pfcstat (#1097)"
Revert " [201911]show interface counters for multi ASIC devices
(#1104)"
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
update the environment variable in the teardown (#1101)
Fix for show interface portchannel now working on 201911 (#1105)
[201911]show interface counters for multi ASIC devices (#1104)
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Parse quagga output without knowledge about hostname, so robust
against hostname changes or mismatch (#124)
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Add ip_prefix len based on proxy_arp status (#1096)
[sonic-cfggen][QoS][multi ASIC] Multi ASIC QoS and Buffer config
generation support, merge from master (#1095)
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Update transceiver info DB key names (#146)
[Multiasic]: Provide namespace support for ipNetToMediaPhysAddress (#129)
[LLDP]: Modify OID index of LLDPRemTableUpdater MIB (#155)
[Namespace]: Simplify sync_d functions to use higher order (#154)
[Namespace]: Fix interface counters in RFC 1213 (#145)
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* Add sonic_interface.py in sonic-py-common for SONiC interface utilities, to keep the SONIC PREFIX naming convention in one place in py-common, so that all modules/applications use the functions defined here.
[201911] support show ip(v6) bgp summary for multi asic platform (#1083)
Fix bad cherry-pick and compilation error.
Signed-off-by: Ubuntu <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>
Fix error when running 'show lldp table' or 'show lldp neighbor'
through SSH command. (#1067)
[201911]: Multi asic show interface support (#1070)
[counterpoll] Add new FC group for port buffer drop counters
(#1024)
[201911] show interface portchannel support for Multi ASIC
(#1071)
Fix a typo in mellanox_buffer_migrator (#1090)
Fix pfcwd stats crash with invalid queue name (#1077)
[PFCWD] Fix issue with "pfcwd show stats" command during SONiC init
(#1018)
enable watchdog before running platform specific reboot plugin
(#1037)
Add namespace of the process in the coredump filename. (#1091)
[portsorch] add buffer drop FC group
[bitmap_vnet] Fix VNET route priority issue (#1421)
[vnet] Maintain the reference count of the nexthop when creating a
vn… (#1414)
[intfsorch] Retrieve Port object before setting NAT zone on router
interfaces. (#1372)
Update the meta code to support DNAT Pool changes (#616)
[syncd] Fix notification on shutdown request (#637)
Advance the submodule head of SAI (#641)
Add new line in sai_meta_log_syncd fprintf call (#649)
Fix Warmboot Issue when upgraded Image SAI return Switch Internal
OID not accounted in previous image. (#654)
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
The following changes are done.
- Multi-ASIC platforms have 2 Loopback interfaces, Loopback0 and Loopback4096. IPinIP decap entries need to be added for both of them. Update the ipinip.json.j2 template to add decap entries for Loopback4096.
- Add a corresponding unit test
sonic_device_util.py has been deprecated in favor of the sonic-py-common package. All related functionality has been moved there.
This is a backport of https://github.com/Azure/sonic-buildimage/pull/5130 to the 201911 branch
Add watchdogutil to control the hw watchdog (#945)
[db_migrator] Support migrating database regarding buffer
configuration for all Mellanox switches (#1053)
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* [201911] skip_thermalctld for VS platform as it is
not supported
root@vlab-01:/# supervisorctl status
dependent-startup EXITED Aug 18 06:22 AM
rsyslogd RUNNING pid 18, uptime 0:12:26
start EXITED Aug 18 06:22 AM
supervisor-proc-exit-listener RUNNING pid 13, uptime 0:12:27
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* [201911] After PR https://github.com/Azure/sonic-buildimage/pull/5204,
this function is no longer needed, so cleaning it up.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
The NUM_ASIC environment variable was added so that it could be used by other utilities.
It is not being used by any other utility or docker, hence removing the addition of the NUM_ASIC environment variable.
Also, the environment variable was added by writing the variable value to the /etc/environ file.
Upon each reboot, this file gets updated with the NUM_ASIC value, but the existing value was not removed.
This causes multiple lines to be appended to the /etc/environ file upon each reboot.
Signed-off-by: SuvarnaMeenakshi <sumeenak@microsoft.com>
[sonic-swsss] Fix the issue of field "next_hop_ip" not getting updated
in state DB in ERSPAN Mirror (#1375)
[vlanmgr] Support Jumbo Frame By Default (#1393)
[fec] added logic that put port down before applying fec
configuration (#1399)
[fec] Get FEC mode when port is already admin down (#1403)
Refine getDbId() calling to fix build after swss-common change
(#1245)
Update the name from get_npu_device_id() to get_asic_device_id() and move it from device_info.py to the new file multi_asic.py in sonic-py-common, so that it references the correct valid constants.
a) We should use get_platform() with the new sonic-py-common package.
b) In 201911 the DB Connector is still using the db-id based constructor,
as the following PR https://github.com/Azure/sonic-buildimage/pull/4549
is not cherry-picked yet. So revert the change to use db id instead of
db_name for now.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* Bring up FPGA ports and test it
* Bring up those ports in neighbors dict
* Revert delete of a line
* Add test
* change code comment
* Change test name
* Revert submodule update
Ignore IPv6 link-local and multicast entries as Vnet routes (#1401)
[201911]Handled both REDIRECT and REDIRECT_ACTION ACL rules in ACL (#1397)
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
As part of consolidating all common Python-based functionality into the new sonic-py-common package, this pull request:
1. Redirects all Python applications/scripts in sonic-buildimage repo which previously imported sonic_device_util or sonic_daemon_base to instead import sonic-py-common, which was added to the 201911 branch in https://github.com/Azure/sonic-buildimage/pull/5063
2. Replaces all calls to `sonic_device_util.get_platform_info()` to instead call `sonic_py_common.get_platform()` and removes any calls to `sonic_device_util.get_machine_info()` which are no longer necessary (i.e., those which were only used to pass the results to `sonic_device_util.get_platform_info()`).
3. Removes unused imports to the now-deprecated sonic-daemon-base package and sonic_device_util.py module
This is a step toward resolving https://github.com/Azure/sonic-buildimage/issues/4999
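As an illustration of the call-site change described in item 2; the exact module path callers import from may vary:
```python
# Before: platform identifier obtained via sonic_device_util
# from sonic_device_util import get_machine_info, get_platform_info
# platform = get_platform_info(get_machine_info())

# After: a single call into the consolidated sonic-py-common package
from sonic_py_common import device_info

platform = device_info.get_platform()
```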
As part of this commit and the previous commit ff6cb6c402,
the sonic-utilities submodule for 201911 has been updated to take the following
changes:
Add support for QSFP-DD cables on 'show' command (#989)
[show] Fix for 'trunk' PortChannel reported as 'routed' port (#1002)
Enable HW watchdog before fast-reboot (#977)
[filter-fdb] Check VLAN Presence When Filter FDB (#957) (#975)
[filter-fdb] Fix For Vlan Defined With No CIDR (#976)
[show/config]: combine feature and container feature cli (#1015)
Applications running in the host OS can read the platform identifier from /host/machine.conf. When loading configuration, sonic-config-engine *needs* to read the platform identifier from machine.conf, as it is responsible for populating the value in Config DB.
When an application is running inside a Docker container, the machine.conf file is not accessible, as the /host directory is not mounted. So we need to retrieve the platform identifier from Config DB if get_platform() is called from inside a Docker
container. However, we can't simply check that we're running in a Docker container because the host OS of the SONiC virtual switch is running inside a Docker container. So I refactored `get_platform()` to:
1. Read from the `PLATFORM` environment variable if it exists (which is defined in a virtual switch Docker container)
2. Read from machine.conf if possible (works in the host OS of a standard SONiC image, critical for sonic-config-engine at boot)
3. Read the value from Config DB (needed for Docker containers running in SONiC, as machine.conf is not accessible to them)
- Also fix typo in daemon_base.py
- Also changes to align `get_hwsku()` with `get_platform()`
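A condensed sketch of the lookup order described above; the machine.conf keys are the usual ONIE/Aboot ones and the Config DB step is only stubbed out here, so this is not the exact sonic-py-common code:
```python
import os

MACHINE_CONF = "/host/machine.conf"

def _platform_from_machine_conf():
    # Minimal machine.conf parse: look for the onie_platform= or aboot_platform= key.
    if not os.path.isfile(MACHINE_CONF):
        return None
    with open(MACHINE_CONF) as f:
        for line in f:
            key, _, value = line.strip().partition("=")
            if key in ("onie_platform", "aboot_platform"):
                return value
    return None

def get_platform():
    """Return the platform identifier using the three sources in priority order."""
    # 1. Virtual switch Docker containers define PLATFORM in the environment.
    platform = os.environ.get("PLATFORM")
    if platform:
        return platform
    # 2. Host OS (and sonic-config-engine at boot): read /host/machine.conf.
    platform = _platform_from_machine_conf()
    if platform:
        return platform
    # 3. Docker containers on a running switch: fall back to Config DB, where
    #    sonic-config-engine has already populated the value (lookup omitted here).
    return None
```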
To clarify error messages in case the IP address for Loopback is already set. It doesn't make sense to call a correct IP address ambiguous in this case.
Consolidate common SONiC Python-language functionality into one shared package (sonic-py-common) and eliminate duplicate code.
The package currently includes four modules:
- daemon_base
- device_info
- logger
- task_base
NOTE: This is a combination of all changes from https://github.com/Azure/sonic-buildimage/pull/5003, https://github.com/Azure/sonic-buildimage/pull/5049 and some changes from https://github.com/Azure/sonic-buildimage/pull/5043 backported to align with the 201911 branch. As part of the 201911 port, I am not installing the Python 3 package in the base image or in the VS container, because we do not have pip3 installed, and we do not intend to migrate to Python 3 in 201911.
Revert "Refine getDbId() calling to fix build after swss-common change (#1245)"
This should fix the VS build.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
This is a fix for the compilation error also on 201911.
[schema]: Add a new table "NAT_DNAT_POOL_TABLE" to hold the DNAT Pool
entries.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Update the meta code to support DNAT Pool changes (#616)
[syncd] Fix notification on shutdown request (#637)
Advance the submodule head of SAI (#641)
Signed-off-by: Stephen Sun <stephens@mellanox.com>
[201911] Update nat entries to use nat_type to support DNAT Pool
changes. (#1297)
* [daemon_base] fix to not reregister signal handler
-src/sonic-daemon-base/sonic_daemon_base/daemon_base.py
Problem:
Currently all daemons inherit from the daemon_base class, and for
signal-handling functionality they register the signal_handler() by
overriding the signal_handler() in daemon_base with their own
implementation.
But some sonic_platform instances can also invoke the daemon_base
constructor while trying to instantiate the common utilities,
for example:
platform_chassis = sonic_platform.platform.Platform().get_chassis()
This causes re-registration of the signal_handler, which makes the
base-class signal_handler() be invoked when the daemon gets a signal,
whereas the daemon's own signal_handler should have been invoked.
Fix:
We register the signal_handler only once; if a signal_handler has
already been registered, we do not re-register it.
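A minimal sketch of the guard; the flag name and class layout are illustrative:
```python
import signal

class DaemonBase(object):
    # Class-level flag: remembers whether a signal handler was already installed
    # by the first DaemonBase instantiation (the daemon itself), so that later
    # instantiations (e.g. via sonic_platform.platform.Platform()) do not
    # silently replace it with the base-class handler.
    _signal_handler_registered = False

    def __init__(self):
        if not DaemonBase._signal_handler_registered:
            signal.signal(signal.SIGHUP, self.signal_handler)
            signal.signal(signal.SIGINT, self.signal_handler)
            signal.signal(signal.SIGTERM, self.signal_handler)
            DaemonBase._signal_handler_registered = True

    def signal_handler(self, sig, frame):
        # Default handler; daemons override this method with their own logic.
        pass
```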
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
* [daemon_base] fix to not reregister signal handler
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>