Commit Graph

290 Commits

Author SHA1 Message Date
lguohan
339d2aa6c8 [mgmt ip]: mvrf ip rule priority change to 32765 (#5754)
Fix Azure/SONiC#551

When eth0 IP address is configured, an ip rule is getting added for eth0 IP address through the interfaces.j2 template.

This eth0 ip rule creates an issue when VRF (data VRF or management VRF) is also created in the system.
When any VRF (data VRF or management VRF) is created, a new rule is getting added automatically by kernel as "1000: from all lookup [l3mdev-table]".
This l3mdev IP rule is never getting deleted even if VRF is deleted.

Once if this l3mdev IP rule is added, if user configures IP address for the eth0 interface, interfaces.j2 adds an eth0 IP rule as "1000:from 100.104.47.74 lookup default ". Priority 1000 is automatically chosen by kernel and hence this rule gets higher priority than the already existing rule "1001:from all lookup local ".

This results in an issue "ping from console to eth0 IP does not work once if VRF is created" as explained in Issue 551.
More details and possible solutions are explained as comments in the Issue551.

This PR is to resolve the issue by always fixing the low priority 32765 for the IP rule that is created for the eth0 IP address.
Tested with various combinations of VRF creation, deletion and IP address configuration along with ping from console to eth0 IP address.

Co-authored-by: Kannan KVS <kannan_kvs@dell.com>
2020-11-01 10:41:44 -08:00
Abhishek Dosi
65cb10714c Revert "[mgmt ip]: mvrf ip rule priority change to 32765 (#5754)"
This reverts commit 28366cd0ce.
2020-11-01 10:37:16 -08:00
Renuka Manavalan
ecd10b9d10 Load config after subscribe (#5740)
- Why I did it
The update_all_feature_states can run in the range of 20+ seconds to one minute. With load of AAA & Tacacs preceding it, any DB updates in AAA/TACACS during the long running feature updates would get missed. To avoid, switch the order.

- How I did it
Do a load after after updating all feature states.

- How to verify it
Not a easy one
Have a script that
restart hostcfgd
sleep 2s
run redis-cli/config command to update AAA/TACACS table

Run the script above and watch the file /etc/pam.d/common-auth-sonic for a minute.

- When it repro:
The updates will not reflect in /etc/pam.d/common-auth-sonic
2020-11-01 10:27:10 -08:00
abdosi
0fad6bdc7f [monit] Adding patch to enhance syslog error message generation for monit alert action when status is failed. (#5720)
Why/How I did:

Make sure first error syslog is triggered based on FAULT TOLERANCE condition.

Added support of repeat clause with alert action. This is used as trigger
for generation of periodic syslog error messages if error is persistent

Updated the monit conf files with repeat every x cycles for the alert action
2020-11-01 10:27:10 -08:00
lguohan
28366cd0ce [mgmt ip]: mvrf ip rule priority change to 32765 (#5754)
Fix Azure/SONiC#551

When eth0 IP address is configured, an ip rule is getting added for eth0 IP address through the interfaces.j2 template.

This eth0 ip rule creates an issue when VRF (data VRF or management VRF) is also created in the system.
When any VRF (data VRF or management VRF) is created, a new rule is getting added automatically by kernel as "1000: from all lookup [l3mdev-table]".
This l3mdev IP rule is never getting deleted even if VRF is deleted.

Once if this l3mdev IP rule is added, if user configures IP address for the eth0 interface, interfaces.j2 adds an eth0 IP rule as "1000:from 100.104.47.74 lookup default ". Priority 1000 is automatically chosen by kernel and hence this rule gets higher priority than the already existing rule "1001:from all lookup local ".

This results in an issue "ping from console to eth0 IP does not work once if VRF is created" as explained in Issue 551.
More details and possible solutions are explained as comments in the Issue551.

This PR is to resolve the issue by always fixing the low priority 32765 for the IP rule that is created for the eth0 IP address.
Tested with various combinations of VRF creation, deletion and IP address configuration along with ping from console to eth0 IP address.

Co-authored-by: Kannan KVS <kannan_kvs@dell.com>
2020-11-01 10:27:10 -08:00
pavel-shirshov
2eec3b3254 [bgpcfgd]: Dynamic BBR support (#5626)
**- Why I did it**
To introduce dynamic support of BBR functionality into bgpcfgd.
BBR is adding  `neighbor PEER_GROUP allowas-in 1' for all BGP peer-groups which points to T0
Now we can add and remove this configuration based on CONFIG_DB entry 

**- How I did it**
I introduced a new CONFIG_DB entry:
 - table name: "BGP_BBR"
 - key value: "all". Currently only "all" is supported, which means that all peer-groups which points to T0s will be updated
 - data value: a dictionary: {"status": "status_value"}, where status_value could be either "enabled" or "disabled"

Initially, when bgpcfgd starts, it reads initial BBR status values from the [constants.yml](https://github.com/Azure/sonic-buildimage/pull/5626/files#diff-e6f2fe13a6c276dc2f3b27a5bef79886f9c103194be4fcb28ce57375edf2c23cR34). Then you can control BBR status by changing "BGP_BBR" table in the CONFIG_DB (see examples below).

bgpcfgd knows what peer-groups to change fron [constants.yml](https://github.com/Azure/sonic-buildimage/pull/5626/files#diff-e6f2fe13a6c276dc2f3b27a5bef79886f9c103194be4fcb28ce57375edf2c23cR39). The dictionary contains peer-group names as keys, and a list of address-families as values. So when bgpcfgd got a request to change the BBR state, it changes the state only for peer-groups listed in the constants.yml dictionary (and only for address families from the peer-group value).

**- How to verify it**
Initially, when we start SONiC FRR has BBR enabled for PEER_V4 and PEER_V6:
```
admin@str-s6100-acs-1:~$ vtysh -c 'show run' | egrep 'PEER_V.? allowas'
  neighbor PEER_V4 allowas-in 1
  neighbor PEER_V6 allowas-in 1
```

Then we apply following configuration to the db:
```
admin@str-s6100-acs-1:~$ cat disable.json                
{
        "BGP_BBR": {
            "all": {
                "status": "disabled"
            }
        }
}


admin@str-s6100-acs-1:~$ sonic-cfggen -j disable.json -w 
```
The log output are:
```
Oct 14 18:40:22.450322 str-s6100-acs-1 DEBUG bgp#bgpcfgd: Received message : '('all', 'SET', (('status', 'disabled'),))'
Oct 14 18:40:22.450620 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-f', '/tmp/tmpmWTiuq']'.
Oct 14 18:40:22.681084 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V4 soft in']'.
Oct 14 18:40:22.904626 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V6 soft in']'.
```

Check FRR configuraiton and see that no allowas parameters are there:
```
admin@str-s6100-acs-1:~$ vtysh -c 'show run' | egrep 'PEER_V.? allowas' 
admin@str-s6100-acs-1:~$
```

Then we apply enabling configuration back:
```
admin@str-s6100-acs-1:~$ cat enable.json 
{
        "BGP_BBR": {
            "all": {
                "status": "enabled"
            }
        }
}

admin@str-s6100-acs-1:~$ sonic-cfggen -j enable.json -w 
```
The log output:
```
Oct 14 18:40:41.074720 str-s6100-acs-1 DEBUG bgp#bgpcfgd: Received message : '('all', 'SET', (('status', 'enabled'),))'
Oct 14 18:40:41.074720 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-f', '/tmp/tmpDD6SKv']'.
Oct 14 18:40:41.587257 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V4 soft in']'.
Oct 14 18:40:42.042967 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V6 soft in']'.
```


Check FRR configuraiton and see that the BBR configuration is back:
```
admin@str-s6100-acs-1:~$ vtysh -c 'show run' | egrep 'PEER_V.? allowas'
  neighbor PEER_V4 allowas-in 1
  neighbor PEER_V6 allowas-in 1
```

*** The test coverage ***
Below is the test coverage
```
---------- coverage: platform linux2, python 2.7.12-final-0 ----------
Name                             Stmts   Miss  Cover
----------------------------------------------------
bgpcfgd/__init__.py                  0      0   100%
bgpcfgd/__main__.py                  3      3     0%
bgpcfgd/config.py                   78     41    47%
bgpcfgd/directory.py                63     34    46%
bgpcfgd/log.py                      15      3    80%
bgpcfgd/main.py                     51     51     0%
bgpcfgd/manager.py                  41     23    44%
bgpcfgd/managers_allow_list.py     385     21    95%
bgpcfgd/managers_bbr.py             76      0   100%
bgpcfgd/managers_bgp.py            193    193     0%
bgpcfgd/managers_db.py               9      9     0%
bgpcfgd/managers_intf.py            33     33     0%
bgpcfgd/managers_setsrc.py          45     45     0%
bgpcfgd/runner.py                   39     39     0%
bgpcfgd/template.py                 64     11    83%
bgpcfgd/utils.py                    32     24    25%
bgpcfgd/vars.py                      1      0   100%
----------------------------------------------------
TOTAL                             1128    530    53%
```

**- Which release branch to backport (provide reason below if selected)**

- [ ] 201811
- [x] 201911
- [x] 202006
2020-10-30 08:58:27 -07:00
pavel-shirshov
bee6c87f90 [bgpcfgd]: Change prefix-list generation for "Allow prefix" feature (#5639)
**- Why I did it**
I was asked to change "Allow list" prefix-list generation rule.
Previously we generated the rules using following method:
``` 
For each {prefix}/{masklen} we would generate the prefix-rule
permit {prefix}/{masklen} ge {masklen}+1
Example:
Prefix 1.2.3.4/24 would have following prefix-list entry generated
permit 1.2.3.4/24 ge 23
```
But we discovered the old rule doesn't work for all cases we have.

So we introduced the new rule:
```
For ipv4 entry,  
For mask  < 32 , we will add ‘le 32’ to cover all  prefix masks to be sent by T0  
For mask =32 , we will not add any ‘le mask’ 
For ipv6 entry, we will add le 128 to cover all the prefix mask to be sent by T0  
For mask < 128 , we will add ‘le 128’ to cover all prefix masks to be sent by T0 
For mask = 128 , we will not add any ‘le mask’ 
```    

**- How I did it**
I change prefix-list entry generation function. Also I introduced a test for the changed function.

**- How to verify it**
1. Build an image and put it on your dut.

2. Create a file test_schema.conf with the test configuration
```
{
    "BGP_ALLOWED_PREFIXES": {
        "DEPLOYMENT_ID|0|1010:1010": {
            "prefixes_v4": [
                "10.20.0.0/16",
                "10.50.1.0/29"
            ],
            "prefixes_v6": [
                "fc01:10::/64",
                "fc02:20::/64"
            ]
        },
        "DEPLOYMENT_ID|0": {
            "prefixes_v4": [
                "10.20.0.0/16",
                "10.50.1.0/29"
            ],
            "prefixes_v6": [
                "fc01:10::/64",
                "fc02:20::/64"
            ]
        }
    }
}
```

3. Apply the configuration by command 
```
sonic-cfggen -j test_schema.conf --write-to-db
```

4. Check that your bgp configuration has following prefix-list entries:
```
admin@str-s6100-acs-1:~$ show runningconfiguration bgp | grep PL_ALLOW
ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V4 seq 10 deny 0.0.0.0/0 le 17
ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V4 seq 20 permit 127.0.0.1/32
ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V4 seq 30 permit 10.20.0.0/16 le 32
ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V4 seq 40 permit 10.50.1.0/29 le 32
ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V4 seq 10 deny 0.0.0.0/0 le 17
ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V4 seq 20 permit 127.0.0.1/32
ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V4 seq 30 permit 10.20.0.0/16 le 32
ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V4 seq 40 permit 10.50.1.0/29 le 32
ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V6 seq 10 deny ::/0 le 59
ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V6 seq 20 deny ::/0 ge 65
ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V6 seq 30 permit fc01:10::/64 le 128
ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V6 seq 40 permit fc02:20::/64 le 128
ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V6 seq 10 deny ::/0 le 59
ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V6 seq 20 deny ::/0 ge 65
ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V6 seq 30 permit fc01:10::/64 le 128
ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V6 seq 40 permit fc02:20::/64 le 128

``` 

Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>
2020-10-30 08:56:52 -07:00
bingwang-ms
7a015eacc1 Fix 'NoSuchProcess' exception in process_checker (#5716)
The psutil library used in process_checker create a cache for each
process when calling process_iter. So, there is some possibility that
one process exists when calling process_iter, but not exists when
calling cmdline, which will raise a NoSuchProcess exception. This commit
fix the issue.

Signed-off-by: bingwang <bingwang@microsoft.com>
2020-10-30 08:56:10 -07:00
judyjoseph
5a802533b5
Fix to remove the import of APIClient (#5724) 2020-10-27 08:32:37 -07:00
judyjoseph
963bd7fdc4 [docker-teamd]: Add teamd as a depedent service to swss (#5628)
**- Why I did it**
On teamd docker restart, the swss and syncd needs to be restarted as there are dependent resources present.

**- How I did it**
Add the teamd as a dependent service for swss
Updated the docker-wait script to handle service and dependent services separately.
Handle the case of warm-restart for the dependent service

**- How to verify it**

Verified the following scenario's with the following testbed
VM1 ----------------------------[DUT 6100] -----------------------VM2,  ping traffic continuous between VMs

1. Stop teamd docker alone
      >  swss, syncd dockers seen going away
      >  The LAG reference count error messages seen for a while till swss docker stops.
      >  Dockers back up.

2. Enable WR mode for teamd. Stop teamd docker alone
      >  swss, syncd dockers not removed.
      >  The LAG reference count error messages not seen
      >  Repeated stop teamd docker test - same result, no effect on swss/syncd.

3. Stop swss docker.
      >  swss, teamd, syncd goes off - dockers comes back correctly, interfaces up

4. Enable WR mode for swss . Stop swss docker
      >  swss goes off not affecting syncd/teamd dockers.

5. Config reload
      > no reference counter error seen, dockers comes back correctly, with interfaces up

6. Warm reboot, observations below
	 > swss docker goes off first
	 > teamd + syncd goes off to the end of WR process.
 	 > dockers comes back up fine.
	 > ping traffic between VM's was NOT HIT

7. Fast reboot, observations below
	 > teamd goes off first ( **confirmed swss don't exit here** )
	 > swss goes off next
	 > syncd goes away at the end of the FR process
	 > dockers comes back up fine.
	 > there is a traffic HIT as per fast-reboot

8. Verified in multi-asic platform, the tests above other than WR/FB scenarios
2020-10-23 15:49:23 -07:00
yozhao101
d8ae2a0019 [hostcfgd] Enable/disable the container service only when the feature state was changed. (#5689)
**- Why I did it**
If we ran the CLI commands `sudo config feature autorestart snmp disabled/enabled` or `sudo config feature autorestart swss disabled/enabled`, then SNMP container will be stopped and started. This behavior was not expected since we updated the `auto_restart` field not update `state` field in `FEATURE` table. The reason behind this issue is that either `state` field or `auto_restart` field was updated, the function `update_feature_state(...)` will be invoked which then starts snmp.timer service.
The snmp.timer service will first stop snmp.service and later start snmp.service. 

In order to solve this issue, the function `update_feature_state(...)` will be only invoked if `state` field in `FEATURE` table was
updated.

**- How I did it**
When the demon `hostcfgd` was activated, all the values of `state` field in `FEATURE` table of each container will be
cached. Each time the function `feature_state_handler(...)` is invoked, it will determine whether the `state` field of a
container was changed or not. If it was changed, function `update_feature_state(...)` will be invoked and the cached
value will also be updated. Otherwise, nothing will be done.

**- How to verify it**
We can run the CLI commands `sudo config feature autorestart snmp disabled/enabled` or `sudo config feature autorestart swss disabled/enabled` to check whether SNMP container is stopped and started. We also can run the CLI commands  `sudo config feature state snmp disabled/enabled` or `sudo config feature state swss disabled/enabled` to check whether the container is stopped and restarted.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2020-10-23 15:45:04 -07:00
Joe LeVeque
4dde7d00cf [caclmgrd] Prevent unnecessary iptables updates (#5312)
When a large number of changes occur to the ACL table of Config DB, caclmgrd will get flooded with notifications, and previously, it would regenerate and apply the iptables rules for each change, which is unnecessary, as the iptables rules should only get applied once after the last change notification is received. If the ACL table contains a large number of control plane ACL rules, this could cause a large delay in caclmgrd getting the rules applied.

This patch causes caclmgrd to delay updating the iptables rules until it has not received a change notification for at least 0.5 seconds.
2020-10-21 12:15:04 -07:00
abdosi
c9e0b06009 Optimze ACL Table/Rule notification handling (#5621)
* Optimze ACL Table/Rule notifcation handling
to loop pop() until empty to consume all the data in a batch

This wau we prevent multiple call to iptable updates

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Address review comments

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2020-10-14 08:08:23 -07:00
abdosi
ccebd006b5 Optimized caclmgrd Notification handling. Previously (#5560)
any event happening on ACL Rule Table (eg DATAACL rules
programmed) caused control plane default action to be triggered.

Now Control Plance ACTION will be trigger only

a) ACL Rule beloging to Control ACL Table

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2020-10-08 11:57:04 -07:00
pavel-shirshov
437ad95646 [bgp] Add 'allow list' manager feature (#5513)
implements a new feature: "BGP Allow list."

This feature allows us to control which IP prefixes are going to be advertised via ebgp from the routes received from EBGP neighbors.
2020-10-06 11:15:19 -07:00
Ying Xie
bea968bb2b [rc.local] separate configuration migration and grub installation logic (#5528)
To address issue #5525

Explicitly control the grub installation requirement when it is needed.
We have scenario where configuration migration happened but grub
installation is not required.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2020-10-04 19:41:50 +00:00
Abhishek Dosi
04725bc030 Revert "[bgp] Add 'allow list' manager feature (#5309)"
This reverts commit b5d33b39de.
2020-09-29 15:39:04 +00:00
Tamer Ahmed
2cc98b4bac [platform] Add Support For Environment Variable File (#5010)
* [platform] Add Support For Environment Variable

This PR adds the ability to read environment file from /etc/sonic.
the file contains immutable SONiC config attributes such as platform,
hwsku, version, device_type. The aim is to minimize calls being made
into sonic-cfggen during boot time.

singed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2020-09-28 21:14:39 +00:00
pavel-shirshov
b5d33b39de [bgp] Add 'allow list' manager feature (#5309)
implements a new feature: "BGP Allow list."

This feature allows us to control which IP prefixes are going to be advertised via ebgp from the routes received from EBGP neighbors.
2020-09-28 16:20:27 +00:00
bingwang-ms
0fabc906d1 Fix exception when attempting to write a datetime to db (#5467)
redis-py 3.0 used in master branch only accepts user data as bytes,
strings or numbers (ints, longs and floats). Attempting to specify a key
or a value as any other type will raise a DataError exception.
This PR address the issue bt converting datetime to str
2020-09-28 16:18:24 +00:00
judyjoseph
cff716f7a5 [Multi-Asic] Forward SNMP requests received on front panel interface to SNMP agent in host. (#5420)
* [Multi-Asic] Forward SNMP requests destined to loopback IP, and coming in through the front panel interface
             present in the network namespace, to SNMP agent running in the linux host.

* Updates based on comments

* Further updates in docker_image_ctl.j2 and caclmgrd

* Change the variable for net config file.

* Updated the comments in the code.

* No need to clean up the exising NAT rules if present, which could be created by some other process.

* Delete our rule first and add it back, to take care of caclmgrd restart.
Another benefit is that we delete only our rules, rather than earlier approach of "iptables -F" which cleans up all rules.

* Keeping the original logic to clean the NAT entries, to revist when NAT feature added in namespace.

* Missing updates to log_info call.
2020-09-28 16:14:07 +00:00
yozhao101
7580c846ad
[201911][Monit] Unmonitor processes in disabled containers (#5462)
We want to let Monit to unmonitor the processes in containers which are disabled in `FEATURE` table such that
Monit will not generate false alerting messages into the syslog.

- Backport of https://github.com/Azure/sonic-buildimage/pull/5153 to the 201911 branch

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2020-09-25 00:30:41 -07:00
abdosi
73bd647e44 Enhanced Feature Table state enable/disable for multi-asic platforms. (#5358)
* Enhanced Feature Table state enable/disbale for multi-asic platforms.
In Multi-asic for some features we can service per asic so we need to
get list of all services.

Also updated logic to return if any one of systemctl command return failure
and make sure syslog of feature getting enable/disable only come when
all commads are sucessful.

Moved the service list get api from sonic-util to sonic-py-common

Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>

* Make sure to retun None for both service list in case of error.

Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>

* Return empty list as fail condition

Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>

* Address Review Comments.

Made init_cfg.json.j2 knowledegable of Feature
service is global scope or per asic scope

Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>

* Fix merge conflict

* Address Review Comment.

Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>

Co-authored-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>
2020-09-22 11:38:19 -07:00
Renuka Manavalan
7d6e5083ce [monit] Periodically monitor route consistency (#5085)
* Add route_check to mont.

* Switched to units of cycles per comments

* Added comments per Joe's comments.

* Added more comments per Royal's comments.
2020-09-19 15:47:53 -07:00
Blueve
64e04f8542 [conf] append nos-config-part for s6100 (#5234)
* [conf] append nos-config-part for s6100

* modify rc.local

Signed-off-by: Guohan Lu <lguohan@gmail.com>

* Update rc.local

Co-authored-by: Blueve <jika@microsoft.com>
Co-authored-by: Guohan Lu <lguohan@gmail.com>
Co-authored-by: Ying Xie <yxieca@users.noreply.github.com>
2020-09-19 14:14:32 -07:00
noaOrMlnx
d4f6e080cb Change update_feature_state call to pass False as default if feature has no 'has_timer' field (#5260)
* Pass False as default if feature has no timer field

* Update hostcfgd to fit the new changes merged

New changes can be found in PR:5248
2020-09-19 14:07:53 -07:00
abdosi
e43521ab64 [Multi-Asic] Fix for multi-asic where we should allow docker local (#5364)
communication on docker eth0 ip . Without this TCP Connection to Redis
does not happen in namespace.

Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>

Co-authored-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>
2020-09-19 14:04:56 -07:00
Joe LeVeque
05e5807b3f [process-reboot-cause] Use Logger class from sonic-py-common package (#5384)
Eliminate duplicate logging code by importing Logger class from sonic-py-common package.
2020-09-19 13:59:59 -07:00
Joe LeVeque
930526f6f8 [procdockerstatsd] Inherit DaemonBase class from sonic-py-common package (#5372)
Eliminate duplicate logging code by inheriting from DaemonBase class in sonic-py-common package.
2020-09-19 13:55:06 -07:00
Joe LeVeque
a957ac6402 [caclmgrd] Inherit DaemonBase class from sonic-py-common package (#5373)
Eliminate duplicate logging code by inheriting from DaemonBase class in sonic-py-common package.
2020-09-19 13:54:25 -07:00
Abhishek Dosi
13d44a4faf Removed DB specific get api's from Selectable class (PR #378)
on sonic-swss-common

With the change as part of #378 caclmgrd need to be updated
to use new client side Get API to access namespace.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2020-09-03 16:45:33 -07:00
Tamer Ahmed
826aaf51f6 [hostcfgd] Fix Boolean String Evaluation (#5248)
New attribute 'has_timer' introduced to init_cfg.json does not evaluate
as Bool, rather it evaluates as string. This PR fixes this issue. Also,
this PR fixes an issue when there is system config unit (snmp, telemetry) that
has no installation config (WantedBy=, RequiredBy=, Also=, Alias=) settings
in the [Install] section. In the latter case, the .service should not be enabled.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2020-08-27 08:04:29 -07:00
Tamer Ahmed
dd3e7a6fa8 [hostcfgd] Handle Both Service And Timer Units (#5228)
Commit e484ae9dd introduced systemd .timer unit to hostcfgd.
However, when stopping service that has timer, there is possibility that
timer is not running and the service would not be stopped. This PR
address this situation by handling both .timer and .service units.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2020-08-22 09:29:54 -07:00
Tamer Ahmed
9decadfed2 [services] Fix Delay Start of SNMP And Telemetry (#5211)
SNMP and Telemetry services are not critical to switch startup.
They also cause fast-reboot not to meet timing requirements.
In order to delay start those service are associated with systemd
timer units, however when hostcfgd initiate service start, it start
the service and not the timer. This PR fixes this issue by
starting the timer associated with systemd unit.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2020-08-20 16:26:08 -07:00
abdosi
f785a0a270 [caclmgrd] Add support for multi-ASIC platforms (#5022)
* Support for Control Plane ACL's for Multi-asic Platforms.
Following changes were done:
 1) Moved from using blocking listen() on Config DB to the select() model
 via python-swsscommon since we have to wait on event from multiple
 config db's
 2) Since  python-swsscommon is not available on host added libswsscommon and python-swsscommon
    and dependent packages in the base image (host enviroment)
 3) Made iptables programmed in all namespace using ip netns exec

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Address Review Comments

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Fix Review Comments

* Fix Comments

* Added Change for Multi-asic to have iptables
rules to accept internal docker tcp/udp traffic
needed for syslog and redis-tcp connection.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Fix Review Comments

* Added more comments on logic.

* Fixed all warning/errors reported by http://pep8online.com/
other than line > 80 characters.

* Fix Comment
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Verified with swsscommon package. Fix issue for single asic platforms.

* Moved to new python package

* Address Review Comments.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Address Review Comments.
2020-08-20 16:01:12 -07:00
Joe LeVeque
309a098b21
[201911][Python] Migrate applications/scripts to import sonic-py-common package (#5132)
As part of consolidating all common Python-based functionality into the new sonic-py-common package, this pull request:
1. Redirects all Python applications/scripts in sonic-buildimage repo which previously imported sonic_device_util or sonic_daemon_base to instead import sonic-py-common, which was added to the 201911 branch in https://github.com/Azure/sonic-buildimage/pull/5063
2. Replaces all calls to `sonic_device_util.get_platform_info()` to instead call `sonic_py_common.get_platform()` and removes any calls to `sonic_device_util.get_machine_info()` which are no longer necessary (i.e., those which were only used to pass the results to `sonic_device_util.get_platform_info()`.
3. Removes unused imports to the now-deprecated sonic-daemon-base package and sonic_device_util.py module

This is a step toward resolving https://github.com/Azure/sonic-buildimage/issues/4999
2020-08-13 16:35:53 -07:00
lguohan
78c803851c [build]: combine feature and container feature table (#5081)
1. remove container feature table
2. do not generate feature entry if the feature is not included
   in the image
3. rename ENABLE_* to INCLUDE_* for better clarity
4. rename feature status to feature state
5. [submodule]: update sonic-utilities

* 9700e45 2020-08-03 | [show/config]: combine feature and container feature cli (#1015) (HEAD, origin/master, origin/HEAD) [lguohan]
* c9d3550 2020-08-03 | [tests]: fix drops_group_test failure on second run (#1023) [lguohan]
* dfaae69 2020-08-03 | [lldpshow]: Fix input device is not a TTY error (#1016) [Arun Saravanan Balachandran]
* 216688e 2020-08-02 | [tests]: rename sonic-utilitie-tests to tests (#1022) [lguohan]

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2020-08-09 11:55:40 -07:00
Sujin Kang
ff6cb6c402 Add disabling HW watchdog during boot for fast-reboot and warm-reboot (#4927)
* Add disabling HW watchdog during boot for fast-reboot and warm-reboot case

* typo
2020-08-09 11:25:31 -07:00
rkdevi27
5ddfc13a75 [baseimage]: /host unmount timeout issue during reboot. (#5032)
Fix for the host unmount issue through PR https://github.com/Azure/sonic-buildimage/pull/4558 and https://github.com/Azure/sonic-buildimage/pull/4865 creates the timeout of syslog.socket closure during reboot since the journald socket closure has been included in syslog.socket

Removed the journal socket closure. The host unmount is fixed with just stopping the services which gets restarted only after /var/log unmount and not causing the unmount issues.
2020-08-09 10:38:33 -07:00
rkdevi27
652aa3b072 [baseimage]: /host unmount failed in VM during reboot (#4865)
Added a check further to make the services to stop appropriately before unmount.

Fix #4651
2020-08-09 10:37:12 -07:00
rkdevi27
f1bbda19f0 Fix "/host unmount failure" during reboot (#4558) 2020-08-09 10:34:02 -07:00
Joe LeVeque
c96c3cd311 [caclmgrd] Always restart service upon process termination (#5065) 2020-07-31 17:23:48 -07:00
madhanmellanox
130aeb4cc1 [caclmgrd] Log error message if IPv4 ACL table contains IPv6 rule and vice-versa (#4498)
* Defect 2082949: Handling Control Plane ACLs so that IPv4 rules and IPv6 rules are not added to the same ACL table

* Previous code review comments of coming up with functions for is_ipv4_rule and is_ipv6_rule is addressed and also raising Exceptions instead of simply aborting when the conflict occurs is handled

* Addressed code review comment to replace duplicate code with already existing functions

* removed raising Exception when rule conflict in Control plane ACLs are found

* added code to remove the rule_props if it is conflicting ACL table versioning rule

* addressed review comment to add ignoring rule in the error statement

Co-authored-by: Madhan Babu <madhan@arc-build-server.mtr.labs.mlnx>
2020-07-26 11:16:30 -07:00
Joe LeVeque
4a2db8e216 [caclmgrd] remove default DROP rule on FORWARD chain (#5034) 2020-07-26 11:07:42 -07:00
Joe LeVeque
3f3fcd3253 [caclmgrd] Filter DHCP packets based on dest port only (#4995) 2020-07-21 10:13:17 +00:00
Joe LeVeque
52e45e823e
[201911][sudoers] Add sonic_installer list to read-only commands (#4997)
`sonic_installer list` is a read-only command. Specify it as such in the sudoers file.

This will also ensure the new `show boot` command, which calls `sudo sonic_installer list` under the hood doesn't fail due to permissions.
2020-07-17 20:13:42 -07:00
Joe LeVeque
0559b7d3b6 [caclmgrd] Improve code reuse (#4931)
Improve code reuse in `generate_block_ip2me_traffic_iptables_commands()` function.
2020-07-11 09:48:10 -07:00
abdosi
4869fa7173 [sonic-buildimage] Changes to make network specific sysctl common for both host and docker namespace (#4838)
* [sonic-buildimage] Changes to make network specific sysctl
common for both host and docker namespace (in multi-npu).

This change is triggered with issue found in multi-npu platforms
where in docker namespace
net.ipv6.conf.all.forwarding was 0 (should be 1) because of
which RS/RA message were triggered and link-local router were learnt.

Beside this there were some other sysctl.net.ipv6* params whose value
in docker namespace is not same as host namespace.

So to make we are always in sync in host and docker namespace
created common file that list all sysctl.net.* params and used
both by host and docker namespace. Any change will get applied
to both namespace.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Address Review Comments and made sure to invoke augtool
only one and do string concatenation of all set commands

* Address Review Comments.
2020-07-05 15:32:30 -07:00
arlakshm
b6b1f3fac8 syslog changes Multi ASIC platforms (#4738)
Add changes for syslog support for containers running in namespaces on multi ASIC platforms.
On Multi ASIC platforms

Rsyslog service is only running on the host. There is no rsyslog service running in each namespace.
On multi ASIC platforms the rsyslog service on the host will be listening on the docker0 ip address instead of loopback address.
The rsyslog.conf on the containers is modified to have omfwd target ip to be docker0 ipaddress instead of loopback ip

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
2020-07-05 15:19:22 -07:00
Joe LeVeque
0768bf7733 [hostcfgd] Synchronize all feature statuses once upon start (#4714)
- Ensure all features (services) are in the configured state when hostcfgd starts
- Better functionalization of code
- Also replace calls to deprecated `has_key()` method in `tacacs_server_handler()` and `tacacs_global_handler()` with `in` keyword.

This PR depends on https://github.com/Azure/sonic-utilities/pull/944, otherwise `config load_minigraph` will fail when trying to restart disabled services.
2020-06-28 07:28:33 -07:00