Commit Graph

800 Commits

Author SHA1 Message Date
Joe LeVeque
9e7e092610
[Monit process_checker] Convert to Python 3 (#5836)
Convert process_checker script to Python 3
2020-11-07 12:46:23 -08:00
lguohan
e6796da141
[init_cfg.json.j2]: only enable gbsyncd feature for vs platform (#5815)
currently only vs platform has gdbsyncd feature built

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2020-11-07 00:46:18 -08:00
Stepan Blyshchak
9bc693ce6e
[hostcfgd] If feature state entry not in the cache, add a default state (#5777)
Our use case is to register new features in runtime. The previous change which introduced the cache broke this capability and caused hostcfgd crash.

Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>
2020-11-06 10:24:31 -08:00
Joe LeVeque
13ff7b38d5
[docker-wait-any] Convert to Python 3, install dependency in host OS (#5784)
- Convert docker-wait-any script to Python 3
- Install Python 3 Docker Engine API in host OS
2020-11-05 11:23:00 -08:00
Joe LeVeque
d8045987a6
[core_uploader.py] Convert to Python 3; Use logger from sonic-py-common for uniform logging (#5790)
- Convert core_uploader.py script to Python 3
- Use logger from sonic-py-common for uniform logging
- Reorganize imports alphabetically per PEP8 standard
- Two blank lines precede functions per PEP8 standard
- Remove unnecessary global variable declarations
2020-11-05 11:19:26 -08:00
Joe LeVeque
522a071ffb
[core_cleanup.py] Convert to Python 3; Fix bug; Improve code reuse (#5781)
- Convert to Python 3
- Fix bug: `CORE_FILE_DIR` previously was set to `os.path.basename(__file__)`, which would resolve to the script name. Fix this by hardcoding to `/var/core/` instead
- Remove locally-define logging functions; use Logger class from sonic-py-common instead
2020-11-05 10:01:12 -08:00
Joe LeVeque
d3262d10f7
[generate_asic_config_checksum.py] Convert to Python 3 (#5783)
- Convert script to Python 3
    - Need to open file in binary mode before hashing due to new string data type in Python 3 being unicode by default. This should probably have been done regardless.
- Reorganize imports alphabetically
- When running the script, don't explicitly call `python`. Instead let the program loader use the interpreter specified in the shebang (which is now `python3`).
2020-11-04 15:06:44 -08:00
Lawrence Lee
10ab46f7a0
Revert "[docker-base]: Rate limit priority INFO and lower in syslog" (#5763)
* This was a temporary fix for orchagent spamming log messages and causing rate limiting, leading to critical messages being dropped for the syslog. No longer needed since Azure/sonic-sairedis#680 was merged.
2020-11-02 08:49:40 -08:00
Blueve
698b5544c9
[openssh] Introduce custom openssh-server package for supporting reverse console SSH (#5717)
* Build and install openssh from source
* Copy openssh deb package to dest folder
* Update make rule
* Update sonic debian extension
* Append empty line before EOF
* Update openssh patch
* Add openssh-server to base image dependency
* Fix indent type
* Fix comments
* Use commit id instead of tag id and add comment

Signed-off-by: Jing Kan jika@microsoft.com
2020-11-02 10:31:15 +08:00
lguohan
c8a00eda95
[mgmt ip]: mvrf ip rule priority change to 32765 (#5754)
Fix Azure/SONiC#551

When eth0 IP address is configured, an ip rule is getting added for eth0 IP address through the interfaces.j2 template. 

This eth0 ip rule creates an issue when VRF (data VRF or management VRF) is also created in the system.
When any VRF (data VRF or management VRF) is created, a new rule is getting added automatically by kernel as "1000: from all lookup [l3mdev-table]".
This l3mdev IP rule is never getting deleted even if VRF is deleted.

Once if this l3mdev IP rule is added, if user configures IP address for the eth0 interface, interfaces.j2 adds an eth0 IP rule as "1000:from 100.104.47.74 lookup default ". Priority 1000 is automatically chosen by kernel and hence this rule gets higher priority than the already existing rule "1001:from all lookup local ".

This results in an issue "ping from console to eth0 IP does not work once if VRF is created" as explained in Issue 551.
More details and possible solutions are explained as comments in the Issue551.

This PR is to resolve the issue by always fixing the low priority 32765 for the IP rule that is created for the eth0 IP address.
Tested with various combinations of VRF creation, deletion and IP address configuration along with ping from console to eth0 IP address.

Co-authored-by: Kannan KVS <kannan_kvs@dell.com>
2020-10-31 20:45:59 -07:00
abdosi
dddf96933c
[monit] Adding patch to enhance syslog error message generation for monit alert action when status is failed. (#5720)
Why/How I did:

Make sure first error syslog is triggered based on FAULT TOLERANCE condition.

Added support of repeat clause with alert action. This is used as trigger
for generation of periodic syslog error messages if error is persistent

Updated the monit conf files with repeat every x cycles for the alert action
2020-10-31 17:29:49 -07:00
Renuka Manavalan
8d8aadb615
Load config after subscribe (#5740)
- Why I did it
The update_all_feature_states can run in the range of 20+ seconds to one minute. With load of AAA & Tacacs preceding it, any DB updates in AAA/TACACS during the long running feature updates would get missed. To avoid, switch the order.

- How I did it
Do a load after after updating all feature states.

- How to verify it
Not a easy one
Have a script that
restart hostcfgd
sleep 2s
run redis-cli/config command to update AAA/TACACS table

Run the script above and watch the file /etc/pam.d/common-auth-sonic for a minute.

- When it repro:
The updates will not reflect in /etc/pam.d/common-auth-sonic
2020-10-31 16:38:32 -07:00
Joe LeVeque
6333bb73b0
Explicitly call pip2 rather than pip in locations where both pip2 and pip3 are installed (#5747)
As part of the transition from Python 2 to Python 3, we are installing both pip2 and pip3 in the slave and config-engine containers. This PR replaces calls to `pip` in these containers with an explicit call to `pip2` to ensure the proper version of pip is executed, no matter which version of pip is aliased to `pip`, as we no longer rely on that alias.

Also some other pip-related cleanup
2020-10-30 09:43:14 -07:00
Joe LeVeque
e111204206
[caclmgrd] Convert to Python 3; Add to sonic-host-services package (#5739)
To consolidate host services and install via packages instead of file-by-file, also as part of migrating all of SONiC to Python 3, as Python 2 is no longer supported, convert caclmgrd to Python 3 and add to sonic-host-services package
2020-10-29 16:29:12 -07:00
Shi Su
5ee5c13f32
Enable synchronous mode by default and add in minigraph parser (#5735) 2020-10-29 09:15:12 -07:00
judyjoseph
6088bd59de
[multi-ASIC] BGP internal neighbor table support (#5520)
* Initial commit for BGP internal neighbor table support.
  > Add new template named "internal" for the internal BGP sessions
  > Add a new table in database "BGP_INTERNAL_NEIGHBOR"
  > The internal BGP sessions will be stored in this new table "BGP_INTERNAL_NEIGHBOR"

* Changes in template generation tests with the introduction of internal neighbor template files.
2020-10-28 16:41:27 -07:00
lguohan
07748a939f
[gbsyncd]: add gbsyncd to FEATURE table (#5683)
remove syncd from critical process list because
gbsyncd process will exit for platform without
gearbox.

closes #5623

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2020-10-27 11:40:23 -07:00
bingwang-ms
36c52cca2b
Fix 'NoSuchProcess' exception in process_checker (#5716)
The psutil library used in process_checker create a cache for each
process when calling process_iter. So, there is some possibility that
one process exists when calling process_iter, but not exists when
calling cmdline, which will raise a NoSuchProcess exception. This commit
fix the issue.

Signed-off-by: bingwang <bingwang@microsoft.com>
2020-10-27 09:25:35 +08:00
Joe LeVeque
9e34003136
[sonic-config-engine] Clean up dependencies, pin versions; install Python 3 package in Buster container (#5656)
To clean up the image build procedure, and let setuptools/pip[3] implicitly install Python dependencies. Also use ipaddress package instead of ipaddr.
2020-10-26 13:48:50 -07:00
Shi Su
67408c85aa
[synchronous-mode] Add template file for synchronous mode (#5644)
The orchagent and syncd need to have the same default synchronous mode configuration. This PR adds a template file to translate the default value in CONFIG_DB (empty field) to an explicit mode so that the orchagent and syncd could have the same default mode.
2020-10-23 13:08:35 -07:00
Joe LeVeque
3a4435eb53
Add sonic-host-services and sonic-host-services-data packages (#5694)
**- Why I did it**

Install all host services and their data files in package format rather than file-by-file

**- How I did it**

- Create sonic-host-services Python wheel package, currently including procdockerstatsd
  - Also add the framework for unit tests by adding one simple procdockerstatsd test case
- Create sonic-host-services-data Debian package which is responsible for installing the related systemd unit files to control the services in the Python wheel. This package will also be responsible for installing any Jinja2 templates and other data files needed by the host services.
2020-10-23 09:52:29 -07:00
judyjoseph
ace7f24cba
[docker-teamd]: Add teamd as a depedent service to swss (#5628)
**- Why I did it**
On teamd docker restart, the swss and syncd needs to be restarted as there are dependent resources present.

**- How I did it**
Add the teamd as a dependent service for swss
Updated the docker-wait script to handle service and dependent services separately.
Handle the case of warm-restart for the dependent service   

**- How to verify it**

Verified the following scenario's with the following testbed 
VM1 ----------------------------[DUT 6100] -----------------------VM2,  ping traffic continuous between VMs

1. Stop teamd docker alone  
      >  swss, syncd dockers seen going away
      >  The LAG reference count error messages seen for a while till swss docker stops.
      >  Dockers back up.

2. Enable WR mode for teamd. Stop teamd docker alone  
      >  swss, syncd dockers not removed.
      >  The LAG reference count error messages not seen
      >  Repeated stop teamd docker test - same result, no effect on swss/syncd.

3. Stop swss docker. 
      >  swss, teamd, syncd goes off - dockers comes back correctly, interfaces up

4. Enable WR mode for swss . Stop swss docker 
      >  swss goes off not affecting syncd/teamd dockers.

5. Config reload 
      > no reference counter error seen, dockers comes back correctly, with interfaces up

6. Warm reboot, observations below
	 > swss docker goes off first 
	 > teamd + syncd goes off to the end of WR process.
 	 > dockers comes back up fine.
	 > ping traffic between VM's was NOT HIT

7. Fast reboot, observations below
	 > teamd goes off first ( **confirmed swss don't exit here** )
	 > swss goes off next 
	 > syncd goes away at the end of the FR process
	 > dockers comes back up fine.
	 > there is a traffic HIT as per fast-reboot

8. Verified in multi-asic platform, the tests above other than WR/FB scenarios
2020-10-23 00:41:16 -07:00
yozhao101
af97e23686
[hostcfgd] Enable/disable the container service only when the feature state was changed. (#5689)
**- Why I did it**
If we ran the CLI commands `sudo config feature autorestart snmp disabled/enabled` or `sudo config feature autorestart swss disabled/enabled`, then SNMP container will be stopped and started. This behavior was not expected since we updated the `auto_restart` field not update `state` field in `FEATURE` table. The reason behind this issue is that either `state` field or `auto_restart` field was updated, the function `update_feature_state(...)` will be invoked which then starts snmp.timer service.
The snmp.timer service will first stop snmp.service and later start snmp.service. 

In order to solve this issue, the function `update_feature_state(...)` will be only invoked if `state` field in `FEATURE` table was
updated.

**- How I did it**
When the demon `hostcfgd` was activated, all the values of `state` field in `FEATURE` table of each container will be
cached. Each time the function `feature_state_handler(...)` is invoked, it will determine whether the `state` field of a
container was changed or not. If it was changed, function `update_feature_state(...)` will be invoked and the cached
value will also be updated. Otherwise, nothing will be done.

**- How to verify it**
We can run the CLI commands `sudo config feature autorestart snmp disabled/enabled` or `sudo config feature autorestart swss disabled/enabled` to check whether SNMP container is stopped and started. We also can run the CLI commands  `sudo config feature state snmp disabled/enabled` or `sudo config feature state swss disabled/enabled` to check whether the container is stopped and restarted.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2020-10-22 20:01:07 -07:00
pavel-shirshov
c94f93f046
[bgpcfgd]: Dynamic BBR support (#5626)
**- Why I did it**
To introduce dynamic support of BBR functionality into bgpcfgd.
BBR is adding  `neighbor PEER_GROUP allowas-in 1' for all BGP peer-groups which points to T0
Now we can add and remove this configuration based on CONFIG_DB entry 

**- How I did it**
I introduced a new CONFIG_DB entry:
 - table name: "BGP_BBR"
 - key value: "all". Currently only "all" is supported, which means that all peer-groups which points to T0s will be updated
 - data value: a dictionary: {"status": "status_value"}, where status_value could be either "enabled" or "disabled"

Initially, when bgpcfgd starts, it reads initial BBR status values from the [constants.yml](https://github.com/Azure/sonic-buildimage/pull/5626/files#diff-e6f2fe13a6c276dc2f3b27a5bef79886f9c103194be4fcb28ce57375edf2c23cR34). Then you can control BBR status by changing "BGP_BBR" table in the CONFIG_DB (see examples below).

bgpcfgd knows what peer-groups to change fron [constants.yml](https://github.com/Azure/sonic-buildimage/pull/5626/files#diff-e6f2fe13a6c276dc2f3b27a5bef79886f9c103194be4fcb28ce57375edf2c23cR39). The dictionary contains peer-group names as keys, and a list of address-families as values. So when bgpcfgd got a request to change the BBR state, it changes the state only for peer-groups listed in the constants.yml dictionary (and only for address families from the peer-group value).

**- How to verify it**
Initially, when we start SONiC FRR has BBR enabled for PEER_V4 and PEER_V6:
```
admin@str-s6100-acs-1:~$ vtysh -c 'show run' | egrep 'PEER_V.? allowas'
  neighbor PEER_V4 allowas-in 1
  neighbor PEER_V6 allowas-in 1
```

Then we apply following configuration to the db:
```
admin@str-s6100-acs-1:~$ cat disable.json                
{
        "BGP_BBR": {
            "all": {
                "status": "disabled"
            }
        }
}


admin@str-s6100-acs-1:~$ sonic-cfggen -j disable.json -w 
```
The log output are:
```
Oct 14 18:40:22.450322 str-s6100-acs-1 DEBUG bgp#bgpcfgd: Received message : '('all', 'SET', (('status', 'disabled'),))'
Oct 14 18:40:22.450620 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-f', '/tmp/tmpmWTiuq']'.
Oct 14 18:40:22.681084 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V4 soft in']'.
Oct 14 18:40:22.904626 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V6 soft in']'.
```

Check FRR configuraiton and see that no allowas parameters are there:
```
admin@str-s6100-acs-1:~$ vtysh -c 'show run' | egrep 'PEER_V.? allowas' 
admin@str-s6100-acs-1:~$
```

Then we apply enabling configuration back:
```
admin@str-s6100-acs-1:~$ cat enable.json 
{
        "BGP_BBR": {
            "all": {
                "status": "enabled"
            }
        }
}

admin@str-s6100-acs-1:~$ sonic-cfggen -j enable.json -w 
```
The log output:
```
Oct 14 18:40:41.074720 str-s6100-acs-1 DEBUG bgp#bgpcfgd: Received message : '('all', 'SET', (('status', 'enabled'),))'
Oct 14 18:40:41.074720 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-f', '/tmp/tmpDD6SKv']'.
Oct 14 18:40:41.587257 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V4 soft in']'.
Oct 14 18:40:42.042967 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V6 soft in']'.
```


Check FRR configuraiton and see that the BBR configuration is back:
```
admin@str-s6100-acs-1:~$ vtysh -c 'show run' | egrep 'PEER_V.? allowas'
  neighbor PEER_V4 allowas-in 1
  neighbor PEER_V6 allowas-in 1
```

*** The test coverage ***
Below is the test coverage
```
---------- coverage: platform linux2, python 2.7.12-final-0 ----------
Name                             Stmts   Miss  Cover
----------------------------------------------------
bgpcfgd/__init__.py                  0      0   100%
bgpcfgd/__main__.py                  3      3     0%
bgpcfgd/config.py                   78     41    47%
bgpcfgd/directory.py                63     34    46%
bgpcfgd/log.py                      15      3    80%
bgpcfgd/main.py                     51     51     0%
bgpcfgd/manager.py                  41     23    44%
bgpcfgd/managers_allow_list.py     385     21    95%
bgpcfgd/managers_bbr.py             76      0   100%
bgpcfgd/managers_bgp.py            193    193     0%
bgpcfgd/managers_db.py               9      9     0%
bgpcfgd/managers_intf.py            33     33     0%
bgpcfgd/managers_setsrc.py          45     45     0%
bgpcfgd/runner.py                   39     39     0%
bgpcfgd/template.py                 64     11    83%
bgpcfgd/utils.py                    32     24    25%
bgpcfgd/vars.py                      1      0   100%
----------------------------------------------------
TOTAL                             1128    530    53%
```

**- Which release branch to backport (provide reason below if selected)**

- [ ] 201811
- [x] 201911
- [x] 202006
2020-10-22 11:04:21 -07:00
BrynXu
29928c93a1
[chassis]: Use correct path for chassisdb.conf file (#5632)
use correct chassisdb.conf path while bringing up chassis_db service on VoQ modular switch.chassis_db service on VoQ modular switch.

resolves #5631

Signed-off-by: Honggang Xu <hxu@arista.com>
2020-10-21 01:40:04 -07:00
Lawrence Lee
207587d97c
[docker-base]: Rate limit priority INFO and lower in syslog (#5666)
There is currently a bug where messages from swss with priority lower than the current log level are still being counted against the syslog rate limiting threshhold. This leads to rate-limiting in syslog when the rate-limiting conditions have not been met, which causes several sonic-mgmt tests to fail since they are dependent on LogAnalyzer. It also omits potentially useful information from the syslog. Only rate-limiting messages of level INFO and lower allows these tests to pass successfully.

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2020-10-20 11:52:46 -07:00
pavel-shirshov
d19d1dd569
[bgpcfgd]: Change prefix-list generation for "Allow prefix" feature (#5639)
**- Why I did it**
I was asked to change "Allow list" prefix-list generation rule.
Previously we generated the rules using following method:
``` 
For each {prefix}/{masklen} we would generate the prefix-rule
permit {prefix}/{masklen} ge {masklen}+1
Example:
Prefix 1.2.3.4/24 would have following prefix-list entry generated
permit 1.2.3.4/24 ge 23
```
But we discovered the old rule doesn't work for all cases we have.

So we introduced the new rule:
```
For ipv4 entry,  
For mask  < 32 , we will add ‘le 32’ to cover all  prefix masks to be sent by T0  
For mask =32 , we will not add any ‘le mask’ 
For ipv6 entry, we will add le 128 to cover all the prefix mask to be sent by T0  
For mask < 128 , we will add ‘le 128’ to cover all prefix masks to be sent by T0 
For mask = 128 , we will not add any ‘le mask’ 
```    

**- How I did it**
I change prefix-list entry generation function. Also I introduced a test for the changed function.

**- How to verify it**
1. Build an image and put it on your dut.

2. Create a file test_schema.conf with the test configuration
```
{
    "BGP_ALLOWED_PREFIXES": {
        "DEPLOYMENT_ID|0|1010:1010": {
            "prefixes_v4": [
                "10.20.0.0/16",
                "10.50.1.0/29"
            ],
            "prefixes_v6": [
                "fc01:10::/64",
                "fc02:20::/64"
            ]
        },
        "DEPLOYMENT_ID|0": {
            "prefixes_v4": [
                "10.20.0.0/16",
                "10.50.1.0/29"
            ],
            "prefixes_v6": [
                "fc01:10::/64",
                "fc02:20::/64"
            ]
        }
    }
}
```

3. Apply the configuration by command 
```
sonic-cfggen -j test_schema.conf --write-to-db
```

4. Check that your bgp configuration has following prefix-list entries:
```
admin@str-s6100-acs-1:~$ show runningconfiguration bgp | grep PL_ALLOW
ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V4 seq 10 deny 0.0.0.0/0 le 17
ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V4 seq 20 permit 127.0.0.1/32
ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V4 seq 30 permit 10.20.0.0/16 le 32
ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V4 seq 40 permit 10.50.1.0/29 le 32
ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V4 seq 10 deny 0.0.0.0/0 le 17
ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V4 seq 20 permit 127.0.0.1/32
ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V4 seq 30 permit 10.20.0.0/16 le 32
ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V4 seq 40 permit 10.50.1.0/29 le 32
ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V6 seq 10 deny ::/0 le 59
ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V6 seq 20 deny ::/0 ge 65
ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V6 seq 30 permit fc01:10::/64 le 128
ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V6 seq 40 permit fc02:20::/64 le 128
ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V6 seq 10 deny ::/0 le 59
ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V6 seq 20 deny ::/0 ge 65
ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V6 seq 30 permit fc01:10::/64 le 128
ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V6 seq 40 permit fc02:20::/64 le 128

``` 

Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>
2020-10-20 00:38:09 -07:00
Joe LeVeque
edf4971b16
[caclmgrd] Prevent unnecessary iptables updates (#5312)
When a large number of changes occur to the ACL table of Config DB, caclmgrd will get flooded with notifications, and previously, it would regenerate and apply the iptables rules for each change, which is unnecessary, as the iptables rules should only get applied once after the last change notification is received. If the ACL table contains a large number of control plane ACL rules, this could cause a large delay in caclmgrd getting the rules applied.

This patch causes caclmgrd to delay updating the iptables rules until it has not received a change notification for at least 0.5 seconds.
2020-10-19 11:11:30 -07:00
Joe LeVeque
678b66359d
[procdockerstatsd] Convert to Python 3 (#5657)
Make procdockerstatsd Python 3-compliant and set interpreter to python3 in shebang. Also some other cleanup to improve code reuse.
2020-10-19 09:46:02 -07:00
Rajkumar-Marvell
5708e32ccf
Set sock rx Buf size to 3MB. (#5566)
* Set sock rx Buf size to 3MB.
2020-10-15 14:40:59 -07:00
BrynXu
a2e3d2fcea
[ChassisDB]: bring up ChassisDB service (#5283)
bring up chassisdb service on sonic switch according to the design in
Distributed Forwarding in VoQ Arch HLD

Signed-off-by: Honggang Xu <hxu@arista.com>

**- Why I did it**
To bring up new ChassisDB service in sonic as designed in ['Distributed forwarding in a VOQ architecture HLD' ](90c1289eaf/doc/chassis/architecture.md). 

**- How I did it**
Implement the section 2.3.1 Global DB Organization of the VOQ architecture HLD.

**- How to verify it**
ChassisDB service won't start without chassisdb.conf file on the existing platforms.
ChassisDB service is accessible with global.conf file in the distributed arichitecture.

Signed-off-by: Honggang Xu <hxu@arista.com>
2020-10-14 15:15:24 -07:00
Joe LeVeque
88c1d66c27
[python-click] No longer build our own package, let pip/setuptools install vanilla (#5549)
We were building our own python-click package because we needed features/bug fixes available as of version 7.0.0, but the most recent version available from Debian was in the 6.x range.

"Click" is needed for building/testing and installing sonic-utilities. Now that we are building sonic-utilities as a wheel, with Click specified as a dependency in the setup.py file, setuptools will install a more recent version of Click in the sonic-slave-buster container when building the package, and pip will install a more recent version of Click in the host OS of SONiC when installing the sonic-utilities package. Also, we don't need to worry about installing the Python 2 or 3 version of the package, as the proper one will be installed as necessary.
2020-10-14 10:16:35 -07:00
abdosi
9094e2176f
Optimze ACL Table/Rule notification handling (#5621)
* Optimze ACL Table/Rule notifcation handling
to loop pop() until empty to consume all the data in a batch

This wau we prevent multiple call to iptable updates

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Address review comments

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2020-10-14 08:05:33 -07:00
Junchao-Mellanox
1c97a03b81
[system-health] Add support for monitoring system health (#4835)
* system health first commit

* system health daemon first commit

* Finish healthd

* Changes due to lower layer logic change

* Get ASIC temperature from TEMPERATURE_INFO table

* Add system health make rule and service files

* fix bugs found during manual test

* Change make file to install system-health library to host

* Set system LED to blink on bootup time

* Caught exceptions in system health checker to make it more robust

* fix issue that fan/psu presence will always be true

* fix issue for external checker

* move system-health service to right after rc-local service

* Set system-health service start after database service

* Get system up time via /proc/uptime

* Provide more information in stat for CLI to use

* fix typo

* Set default category to External for external checker

* If external checker reported OK, save it to stat too

* Trim string for external checker output

* fix issue: PSU voltage check always return OK

* Add unit test cases for system health library

* Fix LGTM warnings

* fix demo comments: 1. get boot up timeout from monit configuration file; 2. set system led in library instead of daemon

* Remove boot_timeout configuration because it will get from monit config file

* Fix argument miss

* fix unit test failure

* fix issue: summary status is not correct

* Fix format issues found in code review

* rename th to threshold to make it clearer

* Fix review comment: 1. add a .dep file for system health; 2. deprecated daemon_base and uses sonic-py-common instead

* Fix unit test failure

* Fix LGTM alert

* Fix LGTM alert

* Fix review comments

* Fix review comment

* 1. Add relevant comments for system health; 2. rename external_checker to user_define_checker

* Ignore check for unknown service type

* Fix unit test issue

* Rename user define checker to user defined checker

* Rename user_define_checkers to user_defined_checkers for configuration file

* Renmae file user_define_checker.py -> user_defined_checker.py

* Fix typo

* Adjust import order for config.py

Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com>

* Adjust import order for src/system-health/health_checker/hardware_checker.py

Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com>

* Adjust import order for src/system-health/scripts/healthd

Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com>

* Adjust import orders in src/system-health/tests/test_system_health.py

* Fix typo

* Add new line after import

* If system health configuration file not exist, healthd should exit

* Fix indent and enable pytest coverage

* Fix typo

* Fix typo

* Remove global logger and use log functions inherited from super class

* Change info level logger to notice level

Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com>
2020-10-12 11:12:49 +03:00
abdosi
01fceb6f79
Optimized caclmgrd Notification handling. Previously (#5560)
any event happening on ACL Rule Table (eg DATAACL rules
programmed) caused control plane default action to be triggered.

Now Control Plance ACTION will be trigger only

a) ACL Rule beloging to Control ACL Table

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2020-10-08 11:31:09 -07:00
jon-nokia
d03de95e81
[build]: fix pip installation for sonic utilities whl package (#5498)
The problem was proxy was missing on "pip install". This is to fix the build behind the proxy.

Signed-off-by: Jon Goldberg <jon.goldberg@nokia.com>
2020-10-06 15:47:50 -07:00
Ying Xie
ec0153008a
[rc.local] separate configuration migration and grub installation logic (#5528)
To address issue #5525

Explicitly control the grub installation requirement when it is needed.
We have scenario where configuration migration happened but grub
installation is not required.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2020-10-03 23:00:39 -07:00
pavel-shirshov
ffae82f8be
[bgp] Add 'allow list' manager feature (#5513)
implements a new feature: "BGP Allow list."

This feature allows us to control which IP prefixes are going to be advertised via ebgp from the routes received from EBGP neighbors.
2020-10-02 10:06:04 -07:00
anish-n
e15e6a8313
[config-reload]: Add logic to clean up FG_ROUTE state db table during reload (#5518)
Cleanup FG_ROUTE state db table during reload
2020-10-02 09:25:29 -07:00
Tamer Ahmed
110f7b7817 [cfggen] Build Python 2 And Python 3 Wheel Packages
This builds Python 2&3 wheel packages for sonic-cfggen script.

singed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2020-09-30 07:07:43 -07:00
Volodymyr Boiko
d71a4efe3b
[sonic-platform-common] Install Python 3 package in host OS and PMon container (#5461)
Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2020-09-29 13:57:54 -07:00
Guohan Lu
e412338743 Revert "[bgp] Add 'allow list' manager feature (#5309)"
This reverts commit 6eed0820c8.
2020-09-28 22:00:29 -07:00
pavel-shirshov
6eed0820c8
[bgp] Add 'allow list' manager feature (#5309)
implements a new feature: "BGP Allow list."

This feature allows us to control which IP prefixes are going to be advertised via ebgp from the routes received from EBGP neighbors.
2020-09-27 10:47:43 -07:00
judyjoseph
4006ce711f
[Multi-Asic] Forward SNMP requests received on front panel interface to SNMP agent in host. (#5420)
* [Multi-Asic] Forward SNMP requests destined to loopback IP, and coming in through the front panel interface
             present in the network namespace, to SNMP agent running in the linux host.

* Updates based on comments

* Further updates in docker_image_ctl.j2 and caclmgrd

* Change the variable for net config file.

* Updated the comments in the code.

* No need to clean up the exising NAT rules if present, which could be created by some other process.

* Delete our rule first and add it back, to take care of caclmgrd restart.
Another benefit is that we delete only our rules, rather than earlier approach of "iptables -F" which cleans up all rules.

* Keeping the original logic to clean the NAT entries, to revist when NAT feature added in namespace.

* Missing updates to log_info call.
2020-09-26 12:14:30 -07:00
Syd Logan
0311a4a037
Add gearbox phy device files and a new physyncd docker to support VS gearbox phy feature (#4851)
* buildimage: Add gearbox phy device files and a new physyncd docker to support VS gearbox phy feature

* scripts and configuration needed to support a second syncd docker (physyncd)
* physyncd supports gearbox device and phy SAI APIs and runs multiple instances of syncd, one per phy in the device
* support for VS target (sonic-sairedis vslib has been extended to support a virtual BCM81724 gearbox PHY).

HLD is located at b817a12fd8/doc/gearbox/gearbox_mgr_design.md

**- Why I did it**

This work is part of the gearbox phy joint effort between Microsoft and Broadcom, and is based
on multi-switch support in sonic-sairedis.

**- How I did it**

Overall feature was implemented across several projects. The collective pull requests (some in late stages of review at this point):

https://github.com/Azure/sonic-utilities/pull/931 - CLI (merged)
https://github.com/Azure/sonic-swss-common/pull/347 - Minor changes (merged)
https://github.com/Azure/sonic-swss/pull/1321 - gearsyncd, config parsers, changes to orchargent to create gearbox phy on supported systems
https://github.com/Azure/sonic-sairedis/pull/624 - physyncd, virtual BCM81724 gearbox phy added to vslib

**- How to verify it**

In a vslib build:

root@sonic:/home/admin# show gearbox interfaces status
  PHY Id    Interface        MAC Lanes    MAC Lane Speed        PHY Lanes    PHY Lane Speed    Line Lanes    Line Lane Speed    Oper    Admin
--------  -----------  ---------------  ----------------  ---------------  ----------------  ------------  -----------------  ------  -------
       1   Ethernet48  121,122,123,124               25G  200,201,202,203               25G       204,205                50G    down     down
       1   Ethernet49  125,126,127,128               25G  206,207,208,209               25G       210,211                50G    down     down
       1   Ethernet50      69,70,71,72               25G  212,213,214,215               25G           216               100G    down     down

In addition, docker ps | grep phy should show a physyncd docker running.

  Signed-off-by: syd.logan@broadcom.com
2020-09-25 08:32:44 -07:00
bingwang-ms
584e2223dc
Fix exception when attempting to write a datetime to db (#5467)
redis-py 3.0 used in master branch only accepts user data as bytes,
strings or numbers (ints, longs and floats). Attempting to specify a key
or a value as any other type will raise a DataError exception.
This PR address the issue bt converting datetime to str
2020-09-25 20:19:18 +08:00
yozhao101
13cec4c486
[Monit] Unmonitor the processes in containers which are disabled. (#5153)
We want to let Monit to unmonitor the processes in containers which are disabled in `FEATURE` table such that
Monit will not generate false alerting messages into the syslog.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2020-09-25 00:28:28 -07:00
Venkatesan Mahalingam
418e437d79
[caclmgrd] Add support to allow/deny any IP/IPv6 protocol packets coming to CPU based on source IP (#4591)
Add support to allow/deny packets coming to CPU based on source IP, regardless of destination port
2020-09-23 09:55:09 -07:00
abdosi
0483255e82
Fix the build issue when port2cable lenth define in (#5437)
buffer_default_*.j2 because of which internal cable length never gets
define and cause failure in test case test_multinpu_cfggen.py

Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>

Co-authored-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>
2020-09-23 08:07:09 -07:00
abdosi
75e4258508
Enhanced Feature Table state enable/disable for multi-asic platforms. (#5358)
* Enhanced Feature Table state enable/disbale for multi-asic platforms.
In Multi-asic for some features we can service per asic so we need to
get list of all services.

Also updated logic to return if any one of systemctl command return failure
and make sure syslog of feature getting enable/disable only come when
all commads are sucessful.

Moved the service list get api from sonic-util to sonic-py-common

Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>

* Make sure to retun None for both service list in case of error.

Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>

* Return empty list as fail condition

Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>

* Address Review Comments.

Made init_cfg.json.j2 knowledegable of Feature
service is global scope or per asic scope

Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>

* Fix merge conflict

* Address Review Comment.

Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>

Co-authored-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>
2020-09-22 08:34:02 -07:00
abdosi
a7f4bfa96d
Enabling ipv6 support on docker container network. This is needed (#5418)
for ipv6 communication between container and host in multi-asic
platforms. Address is assign is private address space of fd::/80
with prefix len selected as 80 so that last 48 bits can be
container mac address and and you prevent NDP neighbor cache
invalidation issues in the Docker layer.

Ref: https://docs.docker.com/config/daemon/ipv6/
Ref:https://medium.com/@skleeschulte/how-to-enable-ipv6-for-docker-containers-on-ubuntu-18-04-c68394a219a2

Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>

Co-authored-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>
2020-09-22 08:32:17 -07:00
Volodymyr Boiko
97aee026de
[logrotate] create separate logrotate.d config for update-alternatives (#5382)
To fix the following error when running
`logrotate /etc/logrotate.conf` :
```
error: dpkg:10 duplicate log entry for /var/log/alternatives.log
error: found error in file dpkg, skipping
```
update-alternatives is provided with dedicated logrotate config in newer dpkg package versions (probably starting from buster)

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2020-09-22 01:23:42 -07:00
Joe LeVeque
3987cbd80a
[sonic-utilities] Build and install as a Python wheel package (#5409)
We are moving toward building all Python packages for SONiC as wheel packages rather than Debian packages. This will also allow us to more easily transition to Python 3.

Python files are now packaged in "sonic-utilities" Pyhton wheel. Data files are now packaged in "sonic-utilities-data" Debian package.

**- How I did it**
- Build and install sonic-utilities as a Python package
- Remove explicit installation of wheel dependencies, as these will now get installed implicitly by pip when installing sonic-utilities as a wheel
- Build and install new sonic-utilities-data package to install data files required by sonic-utilities applications
- Update all references to sonic-utilities scripts/entrypoints to either reference the new /usr/local/bin/ location or remove absolute path entirely where applicable

Submodule updates:

* src/sonic-utilities aa27dd9...2244d7b (5):
  > Support building sonic-utilities as a Python wheel package instead of a Debian package (#1122)
  > [consutil] Display remote device name in show command (#1120)
  > [vrf] fix check state_db error when vrf moving (#1119)
  > [consutil] Fix issue where the ConfigDBConnector's reference is missing (#1117)
  > Update to make config load/reload backward compatible. (#1115)

* src/sonic-ztp dd025bc...911d622 (1):
  > Update paths to reflect new sonic-utilities install location, /usr/local/bin/ (#19)
2020-09-20 20:16:42 -07:00
Tamer Ahmed
2de3afaf35
[swss] Enhance ARP Update to Call Sonic Cfggen Once (#5398)
This PR limited the number of calls to sonic-cfggen to one call
per iteration instead of current 3 calls per iteration.

The PR also installs jq on host for future scripts if needed.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2020-09-18 18:44:23 -07:00
abdosi
d12e9cbbc6
[Multi-Asic] Fix for multi-asic where we should allow docker local (#5364)
communication on docker eth0 ip . Without this TCP Connection to Redis
does not happen in namespace.

Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>

Co-authored-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>
2020-09-16 11:32:35 -07:00
Stepan Blyshchak
6de9390bb0
[build] Add a parameter to specify sonic version during build (#5278)
Introduced a new build parameter 'SONIC_IMAGE_VERSION' that allows build
system users to build SONiC image with a specific version string. If
'SONIC_IMAGE_VERSION' was not passed by the user, SONIC_IMAGE_VERSION will be
set to the output of functions.sh:sonic_get_version function.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2020-09-16 10:47:26 -07:00
Joe LeVeque
c7186a2d39
[process-reboot-cause] Use Logger class from sonic-py-common package (#5384)
Eliminate duplicate logging code by importing Logger class from sonic-py-common package.
2020-09-16 10:35:19 -07:00
Samuel Angebault
9bf4b0a93e
[baseimage]: Change the loopback mask from /8 to /16 (#5353)
As per the VOQ HLDs, internal networking between the linecards and supervisor is required within a chassis.
Allocating 127.X/16 subnets for private communication within a chassis is a good candidate.
It doesn't require any external IP allocation as well as ensure that the traffic will not leave the chassis.

References:
https://github.com/Azure/SONiC/pull/622
https://github.com/Azure/SONiC/pull/639

**- How I did it**

Changed the `interfaces.j2` file to add `127.0.0.1/16` as the `lo` ip address.
Then once the interface is up, the post-up command removes the `127.0.0.1/8` ip address.
The order in which the netmask change is made matters for `127.0.0.1` to be reachable at all times.

**- How to verify it**

```
root@sonic:~# ip address show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/16 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
```

Co-authored-by: Baptiste Covolato <baptiste@arista.com>
2020-09-15 15:29:48 -07:00
Petro Bratash
558ec53aa6
Fix bug with pcie-check.service (#5368)
* Change STATE_DB key (PCIE_STATUS|PCIE_DEVICES -> PCIE_DEVICES)

Signed-off-by: Petro Bratash <petrox.bratash@intel.com>

* [pcie-check.service] Add dependency on database.service

Signed-off-by: Petro Bratash <petrox.bratash@intel.com>
2020-09-15 15:21:31 -07:00
Joe LeVeque
1ac146dd97
[caclmgrd] Inherit DaemonBase class from sonic-py-common package (#5373)
Eliminate duplicate logging code by inheriting from DaemonBase class in sonic-py-common package.
2020-09-15 13:34:41 -07:00
Joe LeVeque
3a901eeae0
[procdockerstatsd] Inherit DaemonBase class from sonic-py-common package (#5372)
Eliminate duplicate logging code by inheriting from DaemonBase class in sonic-py-common package.
2020-09-14 16:36:37 -07:00
noaOrMlnx
353003f6ee
Change update_feature_state call to pass False as default if feature has no 'has_timer' field (#5260)
* Pass False as default if feature has no timer field

* Update hostcfgd to fit the new changes merged

New changes can be found in PR:5248
2020-09-14 11:28:24 -07:00
Samuel Angebault
0b4191fe2a
[Arista] Updating driver submodules (#5352)
- Merge chassis codebase upstream
 - Add support for Otterlake supervisor
 - Add support for NorthFace and Camp chassis
 - Add support for Eldridge, Dragonfly and Brooks fabrics
 - Add support for Clearwater2 and Clearwater2Ms linecards
 - Add new arista Cli to power on/off cards
 - Add new arista show Cli to inspect supervisor, chassis, fabrics and linecards
2020-09-10 01:34:38 -07:00
shi-su
339cfbf9af
Remove the configuration of synchronous mode from init_cfg.json (#5308)
Remove the configuration of synchronous mode from init_cfg.json
2020-09-10 01:26:10 -07:00
Blueve
01fb32fa08
[conf] append nos-config-part for s6100 (#5234)
* [conf] append nos-config-part for s6100

* modify rc.local

Signed-off-by: Guohan Lu <lguohan@gmail.com>

* Update rc.local

Co-authored-by: Blueve <jika@microsoft.com>
Co-authored-by: Guohan Lu <lguohan@gmail.com>
Co-authored-by: Ying Xie <yxieca@users.noreply.github.com>
2020-09-08 12:29:02 -07:00
arheneus@marvell.com
f136fd0623
[ebtbles] Replace binary config file to text config file for ebtables (#5252)
Issue: Binary ebtables config file is CPU arch dependent
Fix: Load the text config during firsttime boot and
     Generate the binary persistent atomic file

Signed-off-by: Antony Rheneus <arheneus@marvell.com>
2020-09-03 17:27:07 -07:00
Tamer Ahmed
fdb9d028e9
[redis] Add redis Group And Grant Read/Write Access to Members (#5289)
sonic-cfggen is now using Unix Domain Socket for Redis DB. The socket
is created using root account. Subsequently, services that are started
as admin fails to start. This PR creates redis group and add admin
user to redis group. It also grants read/write access on redis.sock
for redis group members.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2020-09-02 23:40:22 -07:00
abdosi
dd908c2ee2
[sonic-swsscommon] submodule update with commit's (#5300)
[schema] Make schema header support C project (#373)
Removed DB specific get api's from Selectable class (#378)

With the change as part of #378 caclmgrd need to be updated
to use new client side Get API to access namespace.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2020-09-02 18:09:03 -07:00
Joe LeVeque
07b9d7f44d
[pcie-check] Make pcie-check.sh executable (#5256)
The pcie-check.sh script was added in https://github.com/Azure/sonic-buildimage/pull/4771, but was not given executable permission. Therefore, we would see messages like:

```
Aug 26 22:54:05.536248 sonic ERR systemd[664]: pcie-check.service: Failed to execute command: Permission denied
Aug 26 22:54:05.536386 sonic ERR systemd[664]: pcie-check.service: Failed at step EXEC spawning /usr/bin/pcie-check.sh: Permission denied
Aug 26 22:54:05.536600 sonic WARNING systemd[1]: pcie-check.service: Failed with result 'exit-code'.
```
2020-08-29 10:29:42 -07:00
Stepan Blyshchak
b31050d60e
[services][mgmt-framework] delay mgmt-framework service on boot (#5226)
management framework provides management plane services like rest and
CLI which is not needed right after boot, instead by delaying this
service we give some more CPU for data plane and control plane services
on fast/warm boot.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2020-08-27 21:53:58 +03:00
Tamer Ahmed
7d3ec60b1f
[hostcfgd] Fix Boolean String Evaluation (#5248)
New attribute 'has_timer' introduced to init_cfg.json does not evaluate
as Bool, rather it evaluates as string. This PR fixes this issue. Also,
this PR fixes an issue when there is system config unit (snmp, telemetry) that
has no installation config (WantedBy=, RequiredBy=, Also=, Alias=) settings
in the [Install] section. In the latter case, the .service should not be enabled.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2020-08-27 06:50:03 -07:00
shi-su
f3feb56c8a
Add switch for synchronous mode (#5237)
Add a master switch so that the sync/async mode can be configured.
Example usage of the switch:
1.  Configure mode while building an image
    `make ENABLE_SYNCHRONOUS_MODE=y <target>`
2. Configure when the device is running 
    Change CONFIG_DB with `sonic-cfggen -a '{"DEVICE_METADATA":{"localhost": {"synchronous_mode": "enable"}}}' --write-to-db`
    Restart swss with `systemctl restart swss`
2020-08-24 14:04:10 -07:00
Baptiste Covolato
cd486a82a4
[arista/aboot]: Zero out 1st MB before repartitioning (#5220)
The first partition starting point was changed to be 1M as part of this
commit: 6ba2f97f1e. On systems that are misaligned before conversion
(partition start is the first sector), the relica partition that is
left in the first MB can cause problems in Aboot and result in corruption
of the filesystem on the new aligned partition.

Zeroing this old relica makes sure that there is nothing left of the old
partition lying around. There won't be any risk of having Aboot corrupt
the new filesystem because of the old relica.

Signed-off-by: Baptiste Covolato <baptiste@arista.com>
2020-08-22 18:46:30 -07:00
nirenjan
bb57ccecd4
[sonic-host-service]: Add SONiC Host Services infrastructure (#4840)
- Why I did it

When SONiC is configured with the management framework and/or telemetry services, the applications running inside those containers need to access some functionality on the host system. The following is a non-exhaustive list of such functionality:

Image management
Configuration save and load
ZTP enable/disable and status
Show tech support
- How I did it

The host service is a Python process that listens for requests via D-Bus. It will then service those requests and send a response back to the requestor.

This PR only introduces the host service infrastructure. Applications that need access to the host services must add applets that will register on D-Bus endpoints to service the appropriate functionality.

- How to verify it

- Description for the changelog

Add SONiC Host Service for container to execute select commands in host

Signed-off-by: Nirenjan Krishnan <Nirenjan.Krishnan@dell.com>
2020-08-21 15:34:14 -07:00
Tamer Ahmed
90cbb4d78c
[hostcfgd] Handle Both Service And Timer Units (#5228)
Commit e484ae9dd introduced systemd .timer unit to hostcfgd.
However, when stopping service that has timer, there is possibility that
timer is not running and the service would not be stopped. This PR
address this situation by handling both .timer and .service units.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2020-08-21 09:51:41 -07:00
abdosi
1a805e7409
Fix unwanted python exception in syslog during database container (#5227)
startup when doing redis PING since database_config.json getting
generated from jinja2 template is still not ready.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2020-08-21 07:33:19 -07:00
abdosi
74d8b4a6be
[caclmgrd] Add support for multi-ASIC platforms (#5022)
* Support for Control Plane ACL's for Multi-asic Platforms.
Following changes were done:
 1) Moved from using blocking listen() on Config DB to the select() model
 via python-swsscommon since we have to wait on event from multiple
 config db's
 2) Since  python-swsscommon is not available on host added libswsscommon and python-swsscommon
    and dependent packages in the base image (host enviroment)
 3) Made iptables programmed in all namespace using ip netns exec

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Address Review Comments

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Fix Review Comments

* Fix Comments

* Added Change for Multi-asic to have iptables
rules to accept internal docker tcp/udp traffic
needed for syslog and redis-tcp connection.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Fix Review Comments

* Added more comments on logic.

* Fixed all warning/errors reported by http://pep8online.com/
other than line > 80 characters.

* Fix Comment
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Verified with swsscommon package. Fix issue for single asic platforms.

* Moved to new python package

* Address Review Comments.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Address Review Comments.
2020-08-20 15:11:42 -07:00
Tamer Ahmed
e484ae9dda
[services] Fix Delay Start of SNMP And Telemetry (#5211)
SNMP and Telemetry services are not critical to switch startup.
They also cause fast-reboot not to meet timing requirements.
In order to delay start those service are associated with systemd
timer units, however when hostcfgd initiate service start, it start
the service and not the timer. This PR fixes this issue by
starting the timer associated with systemd unit.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2020-08-19 19:27:59 -07:00
Tamer Ahmed
dfc0617283
[interfaces] Reduce Calls to SONiC Cfggen (#5174)
Calls to sonic-cfggen is CPU expensive. This PR reduces calls to
sonic-cfggen to one call during startup when running interfaces-
config.

singed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2020-08-17 15:46:52 -07:00
Vaibhav Hemant Dixit
9fdbaf0196
BGP Service script path and error fix (#5183)
* BGP service script path update and error fix

Co-authored-by: Vaibhav Hemant Dixit <vadixit@microsoft.com>
2020-08-15 12:09:10 -07:00
Vaibhav Hemant Dixit
b193af2d7f
Make RADV service script executable (#5186)
Co-authored-by: Vaibhav Hemant Dixit <vadixit@microsoft.com>
2020-08-15 12:08:09 -07:00
Prince Sunny
3fee094760
Enable restapi, update sonic-restapi (#5169)
* Enable restapi if included in image
* [Submodule update] sonic-restapi
2020-08-13 11:29:49 -07:00
Vaibhav Hemant Dixit
31d7f27b4b
Move RADV fastboot handling to a service script (#5108)
* New /usr/local/bin/ script to start/stop radv service
2020-08-11 13:13:13 -07:00
Vaibhav Hemant Dixit
ca462d669e
Fix fast-reboot handling for swss script (#5070)
* Fast and warm reboot checks for SWSS start and stop path
2020-08-10 14:48:30 -07:00
Joe LeVeque
2b5e418e2e
Remove sonic-daemon-base package (#5131)
sonic-daemon-base package has been deprecated in favor of the sonic-py-common package. All related functionality has been moved there.
2020-08-09 21:27:36 -07:00
isabelmsft
19a3452ddc
[Kubernetes Setup] Remove flannel, kube-proxy images (#5098)
Removes installation of kube-proxy (117 MB) and flannel (53 MB) images from Kubernetes-enabled devices. These images are tested to be unnecessary for our use case, as we do not rely on ClusterIPs for Kubernetes Services or a CNI for pod networking.
2020-08-06 18:23:27 -05:00
lguohan
082c26a27d
[build]: combine feature and container feature table (#5081)
1. remove container feature table
2. do not generate feature entry if the feature is not included
   in the image
3. rename ENABLE_* to INCLUDE_* for better clarity
4. rename feature status to feature state
5. [submodule]: update sonic-utilities

* 9700e45 2020-08-03 | [show/config]: combine feature and container feature cli (#1015) (HEAD, origin/master, origin/HEAD) [lguohan]
* c9d3550 2020-08-03 | [tests]: fix drops_group_test failure on second run (#1023) [lguohan]
* dfaae69 2020-08-03 | [lldpshow]: Fix input device is not a TTY error (#1016) [Arun Saravanan Balachandran]
* 216688e 2020-08-02 | [tests]: rename sonic-utilitie-tests to tests (#1022) [lguohan]

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2020-08-05 13:23:12 -07:00
Renuka Manavalan
312771dc3e
[monit] Periodically monitor route consistency (#5085)
* Add route_check to mont.

* Switched to units of cycles per comments

* Added comments per Joe's comments.

* Added more comments per Royal's comments.
2020-08-04 10:33:13 -07:00
Joe LeVeque
3b89e5d467
[Python] Migrate applications/scripts to import sonic-py-common package (#5043)
As part of consolidating all common Python-based functionality into the new sonic-py-common package, this pull request:
1. Redirects all Python applications/scripts in sonic-buildimage repo which previously imported sonic_device_util or sonic_daemon_base to instead import sonic-py-common, which was added in https://github.com/Azure/sonic-buildimage/pull/5003
2. Replaces all calls to `sonic_device_util.get_platform_info()` to instead call `sonic_py_common.get_platform()` and removes any calls to `sonic_device_util.get_machine_info()` which are no longer necessary (i.e., those which were only used to pass the results to `sonic_device_util.get_platform_info()`.
3. Removes unused imports to the now-deprecated sonic-daemon-base package and sonic_device_util.py module

This is the next step toward resolving https://github.com/Azure/sonic-buildimage/issues/4999

Also reverted my previous change in which device_info.get_platform() would first try obtaining the platform ID string from Config DB and fall back to gathering it from machine.conf upon failure because this function is called by sonic-cfggen before the data is in the DB, in which case, the db_connect() call will hang indefinitely, which was not the behavior I expected. As of now, the function will always reference machine.conf.
2020-08-03 11:43:12 -07:00
Tamer Ahmed
7872b4e196
[platform] Add Support For Environment Variable File (#5010)
* [platform] Add Support For Environment Variable

This PR adds the ability to read environment file from /etc/sonic.
the file contains immutable SONiC config attributes such as platform,
hwsku, version, device_type. The aim is to minimize calls being made
into sonic-cfggen during boot time.

singed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2020-07-31 17:59:09 -07:00
abdosi
ec435b955c
Changes to add template support for copp.json. (#5053)
* Changes to add template support for copp.json.
This is needed so that we can install differnt type of
Traps based on Device Role (Tor/Leaf/Mgmt/etc...).

Initial use case is to install DHCP/DHCPv6 tarp only
for tor router.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Fixed based on review comments.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Fixed based on review comment.
2020-07-31 14:14:21 -07:00
Nazarii Hnydyn
fe387289b1
[Mellanox] Update MFT to 4.15.0-104 (#5075)
* [Mellanox] Update MFT to 4.15.0-104.
* [Mellanox] Remove build system W/A.

Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
2020-07-30 22:59:57 -07:00
BrynXu
311045f01f
[vs]: support virtual-chassis setup in vs docker (#4709)
virtual-chassis test uses multiple vs instances to simulate a
modular switch and a redis-chassis service is required to run on
the vs instance that represents a supervisor card.
This change allows vs docker start redis-chassis service according
to external config file.

**- Why I did it**
To support virtual-chassis setup, so that we can test distributed forwarding feature in virtual sonic environment, see `Distributed forwarding in a VOQ architecture HLD` pull request at https://github.com/Azure/SONiC/pull/622

**- How I did it**
The sonic-vs start.sh is enhanced to start new redis_chassis service if external chassis config file found. The config file doesn't exist in current vs environment, start.sh will behave like before. 

**- How to verify it**
The swss/test still pass. The chassis_db service is verified in virtual-chassis topology and tests which are in following PRs.

Signed-off-by: Honggang Xu <hxu@arista.com>
(cherry picked from commit c1d45cf81ce3238be2dcbccae98c0780944981ce)

Co-authored-by: Honggang Xu <hxu@arista.com>
2020-07-29 14:20:31 -07:00
Joe LeVeque
b2344f6f78
[caclmgrd] Always restart service upon process termination (#5065) 2020-07-29 10:12:38 -07:00
Samuel Angebault
ddf9fdde72
[Arista] Add secure fast-reboot support in boot0 (#4994) 2020-07-27 18:56:25 -07:00
Mahesh Maddikayala
ee4197e9f8
+ Modified buffer config template to include internal ASIC with 5m ca… (#4959)
+ Modified buffer config template to include internal ASIC with 5m cable length
+ added unit test to verify the changes.
2020-07-27 10:57:07 -07:00
Joe LeVeque
c0d1616f89
Introduce sonic-py-common package (#5003)
Consolidate common SONiC Python-language functionality into one shared package (sonic-py-common) and eliminate duplicate code.

The package currently includes three modules:

- daemon_base
- device_info
- logger
2020-07-26 23:15:41 -07:00
rkdevi27
26050ffef8
[baseimage]: /host unmount timeout issue during reboot. (#5032)
Fix for the host unmount issue through PR https://github.com/Azure/sonic-buildimage/pull/4558 and https://github.com/Azure/sonic-buildimage/pull/4865 creates the timeout of syslog.socket closure during reboot since the journald socket closure has been included in syslog.socket

Removed the journal socket closure. The host unmount is fixed with just stopping the services which gets restarted only after /var/log unmount and not causing the unmount issues.
2020-07-25 01:27:58 -07:00
Joe LeVeque
1587889b7a
[caclmgrd] remove default DROP rule on FORWARD chain (#5034) 2020-07-24 11:59:46 -07:00
Kebo Liu
f3091c91a6
[Mellanox] remove code which instructs hw-mgmt to skip mlsw_minimal probing in fast-boot flow (#5011) 2020-07-22 12:21:11 +03:00
Joe LeVeque
43b5832e0c
[sudoers] Add sonic-installer list to read-only commands (#4996)
`sonic-installer list` is a read-only command. Specify it as such in the sudoers file.

This will also ensure the new `show boot` command, which calls `sudo sonic-installer list` under the hood doesn't fail due to permissions.
2020-07-20 11:23:05 -07:00
Stepan Blyshchak
5d753b8d43
[services] remove swss from WantedBy for nat service (#4991)
Otherwise, it may cause issues for warm restarts, warm reboot.
Warm restart of swss will start nat which is not expected for warm
restart. Also it is observed that during warm-reboot script execution
nat container gets started after it was killed. This causes removal of
nat dump generated by nat previously:

A check [ -f /host/warmboot/nat/nat_entries.dump ] || echo "NAT dump
does not exists" was added right before kexec:

```
Fri Jul 17 10:47:16 UTC 2020 Prepare MLNX ASIC to fastfast-reboot:
install new FW if required
Fri Jul 17 10:47:18 UTC 2020 Pausing orchagent ...
Fri Jul 17 10:47:18 UTC 2020 Stopping nat ...
Fri Jul 17 10:47:18 UTC 2020 Stopped nat ...
Fri Jul 17 10:47:18 UTC 2020 Stopping radv ...
Fri Jul 17 10:47:19 UTC 2020 Stopping bgp ...
Fri Jul 17 10:47:19 UTC 2020 Stopped bgp ...
Fri Jul 17 10:47:21 UTC 2020 Initialize pre-shutdown ...
Fri Jul 17 10:47:21 UTC 2020 Requesting pre-shutdown ...
Fri Jul 17 10:47:22 UTC 2020 Waiting for pre-shutdown ...
Fri Jul 17 10:47:24 UTC 2020 Pre-shutdown succeeded ...
Fri Jul 17 10:47:24 UTC 2020 Backing up database ...
Fri Jul 17 10:47:25 UTC 2020 Stopping teamd ...
Fri Jul 17 10:47:25 UTC 2020 Stopped teamd ...
Fri Jul 17 10:47:25 UTC 2020 Stopping syncd ...
Fri Jul 17 10:47:35 UTC 2020 Stopped syncd ...
Fri Jul 17 10:47:35 UTC 2020 Stopping all remaining containers ...
Warning: Stopping telemetry.service, but it can still be activated by:
  telemetry.timer
Fri Jul 17 10:47:37 UTC 2020 Stopped all remaining containers ...
NAT dump does not exists
Fri Jul 17 10:47:39 UTC 2020 Rebooting with /sbin/kexec -e to
SONiC-OS-201911.140-08245093 ...
```

With this change, executed warm-reboot 10 times without hitting this
issue, while without this change the issue is easily reproducible almost
every warm-reboot run.

Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
2020-07-19 21:50:26 -07:00
Samuel Angebault
d8a79bc71e
[secureboot] Fix some installation behavior for secureboot (#4980) 2020-07-17 15:07:12 -07:00
Joe LeVeque
d6925499f1
[caclmgrd] Filter DHCP packets based on dest port only (#4995) 2020-07-17 11:16:19 -07:00
yozhao101
83738fca2f
[dockers] Default container autorestart feature to "enabled" for all except database (#4853)
Set the default auto_restart state to "enabled" in init_cfg.json for all containers except database

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2020-07-15 11:49:14 -07:00
heidinet2007
de51b9e424
BGP warm reboot script to service (#3992)
* [sonic-buildimage] Move BGP warm reboot scripts into BGP service /usr/local/bin

* Revert "[sonic-buildimage] Move BGP warm reboot scripts into BGP service /usr/local/bin"

This reverts commit d16d163fc4.

* [sonic-buildimage] Move BGP warm reboot script to BGP service

* [sonic-buildimage] Move BGP warm reboot script to BGP service

* [sonic-buildimage] Move BGP warm reboot script to BGP service
- access DB correctly

* Address code review comments, also change file mode of bgp.sh (+x)

* Address code review comments, also change the file mode of bgp.sh (+x)

* BGP warm reboot script to service, also handle fast boot as indicated by flag saved in StateDB

* BGP warm reboot script to service, code review comments on space alignment

* BGP warm reboot script to service: remove uncesseary space

* BGP warm reboot script to service: replace tab with space

* Code review comments: -) use new multi-db api -) add ignore error from zebra in case it's not configured

* Integrate with multi-ASIC changes committed recently

Co-authored-by: heidi.ou@alibaba-inc.com <heidi.ou@alibaba-inc.com>
2020-07-15 11:46:23 -07:00
madhanmellanox
ade634090d
[caclmgrd] Log error message if IPv4 ACL table contains IPv6 rule and vice-versa (#4498)
* Defect 2082949: Handling Control Plane ACLs so that IPv4 rules and IPv6 rules are not added to the same ACL table

* Previous code review comments of coming up with functions for is_ipv4_rule and is_ipv6_rule is addressed and also raising Exceptions instead of simply aborting when the conflict occurs is handled

* Addressed code review comment to replace duplicate code with already existing functions

* removed raising Exception when rule conflict in Control plane ACLs are found

* added code to remove the rule_props if it is conflicting ACL table versioning rule

* addressed review comment to add ignoring rule in the error statement

Co-authored-by: Madhan Babu <madhan@arc-build-server.mtr.labs.mlnx>
2020-07-15 20:24:44 +03:00
rkdevi27
df740b3653
[baseimage]: /host unmount failed in VM during reboot (#4865)
Added a check further to make the services to stop appropriately before unmount.

Fix #4651
2020-07-14 15:34:19 -07:00
Sujin Kang
bf45e11d27
Add pcie-check service to check PCIe devices at boot (#4771)
* PCIe Monitor service

* Add rescan to pcie-mon.service when it fails to get all pcie devices

* space

* Clean up

* review comments

* update the pcie status in state db

* update the failed pcie status once at the end

* Update the pcie_status in STATE_DB and rename the service

* Add log to exit the service if the configuration file doesn't exist.

* fix the build failure

* Redo the pcie rescan for pcie-check failed case.

* review comments

* review comments

* review comments
2020-07-13 14:15:09 -07:00
Sujin Kang
b4452edb8a Add disabling HW watchdog during boot for fast-reboot and warm-reboot (#4927)
* Add disabling HW watchdog during boot for fast-reboot and warm-reboot case

* typo
2020-07-12 18:08:52 +00:00
Joe LeVeque
2731571dc9 [caclmgrd] Improve code reuse (#4931)
Improve code reuse in `generate_block_ip2me_traffic_iptables_commands()` function.
2020-07-12 18:08:52 +00:00
arlakshm
a46f4c96e7 Add support for bcmsh and bcmcmd utlitites in multi ASIC devices (#4926)
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
This PR has changes to support accessing the bcmsh and bcmcmd utilities on multi ASIC devices
Changes done
- move the link of /var/run/sswsyncd from docker-syncd-brcm.mk to docker_image_ctl.j2
- update the bcmsh and bcmcmd scripts to take -n [ASIC_ID] as an argument on multi ASIC platforms
2020-07-12 18:08:52 +00:00
Venkatesan Mahalingam
7d003c3518 [TACACS+]: Add support to specify source address for TACACS+ (#4610)
This pull request was cherry picked from "#1238" to resolve the conflicts.

- Why I did it
Add support to specify source address for TACACS+
- How I did it
Add patches for libpam-tacplus and libnss-tacplus. The patches parse the new option 'src_ip' and store the converted addrinfo. Then the addrinfo is used for TACACS+ connection.
Add a attribute 'src_ip' for table "TACPLUS|global" in configDB
Add some code to adapt to the attribute 'src_ip'.
- How to verify it
Config command for source address PR in sonic-utilities
config tacacs src_ip <ip_address>

- Description for the changelog
Add patches to specify source address for the TACACS+ outgoing packets.

- A picture of a cute animal (not mandatory but encouraged)

**UT logs: **

UT_tacacs_source_intf.txt
2020-07-12 18:08:51 +00:00
abdosi
fc6bcff52b [sonic-buildimage] Changes to make network specific sysctl common for both host and docker namespace (#4838)
* [sonic-buildimage] Changes to make network specific sysctl
common for both host and docker namespace (in multi-npu).

This change is triggered with issue found in multi-npu platforms
where in docker namespace
net.ipv6.conf.all.forwarding was 0 (should be 1) because of
which RS/RA message were triggered and link-local router were learnt.

Beside this there were some other sysctl.net.ipv6* params whose value
in docker namespace is not same as host namespace.

So to make we are always in sync in host and docker namespace
created common file that list all sysctl.net.* params and used
both by host and docker namespace. Any change will get applied
to both namespace.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Address Review Comments and made sure to invoke augtool
only one and do string concatenation of all set commands

* Address Review Comments.
2020-07-12 18:08:51 +00:00
arlakshm
a8b99f77f3 syslog changes Multi ASIC platforms (#4738)
Add changes for syslog support for containers running in namespaces on multi ASIC platforms.
On Multi ASIC platforms

Rsyslog service is only running on the host. There is no rsyslog service running in each namespace.
On multi ASIC platforms the rsyslog service on the host will be listening on the docker0 ip address instead of loopback address.
The rsyslog.conf on the containers is modified to have omfwd target ip to be docker0 ipaddress instead of loopback ip

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
2020-07-12 18:08:51 +00:00
abdosi
15440b6e43
Changes to make default route programming correct in multi-npu platforms (#4774)
* Changes to make default route programming
correct in multi-asic platform where frr is not running
in host namespace. Change is to set correct administrative distance.
Also make NAMESPACE* enviroment variable available for all dockers
so that it can be used when needed.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Fix review comments

* Review comment to check to add default route
only if default route exist and delete is successful.
2020-06-29 11:38:46 -07:00
SuvarnaMeenakshi
ab2177b4a9
[systemd-generator]: Fix dependency update for multi-asic platform (#4820)
* [systemd-generator]: Fix the code to make sure that dependencies
of host services are generated correctly for multi-asic platforms.
Add code to make sure that systemd timer files are also modified
to add the correct service dependency for multi-asic platforms.

Signed-off-by: SuvarnaMeenakshi <sumeenak@microsoft.com>

* [systemd-generator]: Minor fix, remove debug code and
remove unused variable.
2020-06-29 09:39:23 -07:00
Praveen Chaudhary
07930c39ba
[build] Add essential PY PKGs on host for sonic-utilities/config/config_mgmt.py (#4740)
Add essential PY PKGs on host by installing them in sonic_debian_extension.j2

Signed-off-by: Praveen Chaudhary pchaudhary@linkedin.com
2020-06-28 11:03:48 -07:00
Qi Luo
6849a0351c
[redis] Install vanilla redis packages for Buster and Stretch; upgrade Buster to 6.0.5 (#4732)
upgrade redis server to 5:6.0.5-1~bpo10+1
2020-06-27 01:17:20 -07:00
yozhao101
4fa81b4f8d
[dockers] Update critical_processes file syntax (#4831)
**- Why I did it**
Initially, the critical_processes file contains either the name of critical process or the name of group.
For example, the critical_processes file in the dhcp_relay container contains a single group name
`isc-dhcp-relay`. When testing the autorestart feature of each container, we need get all the critical
processes and test whether a  container can be restarted correctly if one of its critical processes is
killed. However, it will be difficult to differentiate whether the names in the critical_processes file are
the critical processes or group names. At the same time, changing the syntax in this file will separate the individual process from the groups and also makes it clear to the user.

Right now the critical_processes file contains two different kind of entries. One is "program:xxx" which indicates a critical process. Another is "group:xxx" which indicates a group of critical processes
managed by supervisord using the name "xxx". At the same time, I also updated the logic to
parse the file critical_processes in supervisor-proc-event-listener script.

**- How to verify it**
We can first enable the autorestart feature of a specified container for example `dhcp_relay` by running the comman `sudo config container feature autorestart dhcp_relay enabled` on DUT. Then we can select a critical process from the command `docker top dhcp_relay` and use the command `sudo kill -SIGKILL <pid>` to kill that critical process. Final step is to check whether the container is restarted correctly or not.
2020-06-25 21:18:21 -07:00
Qi Luo
719c8e68c8
[secureboot] only remove exec bit in secureboot (#4836)
Address issue #4832
2020-06-25 10:07:50 -07:00
Joe LeVeque
63d2efbe03
[build][systemd] Mask disabled services by default (#4721)
When building the SONiC image, used systemd to mask all services which are set to "disabled" in init_cfg.json.

This PR depends on https://github.com/Azure/sonic-utilities/pull/944, otherwise `config load_minigraph will fail when trying to restart disabled services.
2020-06-24 15:25:16 -07:00
Samuel Angebault
f7d43173a2 [secureboot] only remove exec bit in secureboot
Address issue #4832
2020-06-23 11:34:07 -07:00
Samuel Angebault
67987e9c0e
[secureboot] Add secureboot support for Arista devices (#4741)
* Add secureboot support in boot0
* Initramfs changes for secureboot on Aboot devices
* Do not compress squashfs and gz in fs.zip
It doesn't make much sense to do so since these files are already
compressed.
Also not compressing the squashfs has the advantage of making it
mountable via a loop device.
* Add loopoffset parameter to initramfs-tools
2020-06-22 09:30:31 -07:00
Kebo Liu
2b568ec136
Add with_i2cdev for mst start to have I2C device loaded properly (#4790) 2020-06-21 16:27:05 +03:00
Joe LeVeque
4d2d95e8e6
[hostcfgd] Synchronize all feature statuses once upon start (#4714)
- Ensure all features (services) are in the configured state when hostcfgd starts
- Better functionalization of code
- Also replace calls to deprecated `has_key()` method in `tacacs_server_handler()` and `tacacs_global_handler()` with `in` keyword.

This PR depends on https://github.com/Azure/sonic-utilities/pull/944, otherwise `config load_minigraph` will fail when trying to restart disabled services.
2020-06-20 12:09:29 -07:00
padmanarayana
95e3cda5da
[DELL]: FTOS to SONiC fast conversion fixes (#4807)
While migrating to SONiC 20181130, identified a couple of issues:
1. union-mount needs /host/machine.conf parameters for vendor specific checks : however, in case of migration, the /host/machine.conf is extracted from ONIE only in https://github.com/Azure/sonic-buildimage/blob/master/files/image_config/platform/rc.local#L127. 
2. Since grub.cfg is updated to have net.ifnames=0 biosdevname=0, 70-persistent-net.rules changes are no longer required.
2020-06-19 11:02:08 -07:00
Joe LeVeque
1f8a78cef1
[build] No longer install Python 'click-default-group' package (#4811)
All dependencies upon the Python 'click-default-group' package have been removed from sonic-utilities as of https://github.com/Azure/sonic-utilities/pull/903. The submodule was updated to include this patch as of https://github.com/Azure/sonic-buildimage/pull/4601, therefore we no longer need to install this package in the SONiC image.
2020-06-19 10:54:10 -07:00
Joe LeVeque
6960477cc2
[caclmgrd] Don't limit connection tracking to TCP (#4796)
Don't limit iptables connection tracking to TCP protocol; allow connection tracking for all protocols. This allows services like NTP, which is UDP-based, to receive replies from an NTP server even if the port is blocked, as long as it is in reply to a request sent from the device itself.
2020-06-18 00:18:20 -07:00
abdosi
30d7ce0004
[build] Ensure /usr/lib/systemd/system/ directory exists before referencing (#4788)
* Fix the Build on 201911 (Stretch) where the directory
/usr/lib/systemd/system/ does not exist so creating
manually. Change should not harm Master (buster) where
the directory is created by Linux

* Fix as per review comments
2020-06-17 09:16:58 -07:00
xumia
76a395cdbf
[secure boot] Support rw files allowlist (#4585)
* Support rw files allowlist for Sonic Secure Boot
* Improve the performance
* fix bug
* Move the config description into a md file
* Change to use a simple way to remove the blank line
* Support chmod a-x in rw folder
* Change function name
* Change some unnecessary words
2020-06-13 00:10:13 -07:00
Renuka Manavalan
edeb40ffcf
[k8s]: switching to Flannel from Calico. (#4768)
Switching to Flannel from Calico which brings down the image size by around 500+MB.
2020-06-12 18:06:08 -07:00
Joe LeVeque
4e482c16ba
[build] Enable telemetry service by default (#4760)
**- Why I did it**
To ensure telemetry service is enabled by default after installing a fresh SONiC image

**- How I did it**
Set telemetry feature status to "enabled" when generating init_cfg.json file
2020-06-12 16:20:31 -07:00
Ying Xie
ae7bf3db52
[ntp] disable ntp long jump (#4748)
Found another syncd timing issue related to clock going backwards.
To be safe disable the ntp long jump.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2020-06-11 13:01:21 -07:00
yozhao101
4ea2e5e6dc
[docker-syncd] Add timeout to force stop syncd container (#4617)
**- Why I did it**
When I tested auto-restart feature of swss container by manually killing one of critical processes in it, swss will be stopped. Then syncd container as the peer container should also be
stopped as expected. However, I found sometimes syncd container can be stopped, sometimes
it can not be stopped. The reason why syncd container can not be stopped is the process
(/usr/local/bin/syncd.sh stop) to execute the stop() function will be stuck between the lines 164 –167. Systemd will wait for 90 seconds and then kill this process.

164 # wait until syncd quit gracefully
165 while docker top syncd$DEV | grep -q /usr/bin/syncd; do
166 sleep 0.1
167 done

The first thing I did is to profile how long this while loop will spin if syncd container can be
normally stopped after swss container is stopped. The result is 5 seconds or 6 seconds. If syncd
container can be normally stopped, two messages will be written into syslog:

str-a7050-acs-3 NOTICE syncd#dsserve: child /usr/bin/syncd exited status: 134
str-a7050-acs-3 INFO syncd#supervisord: syncd [5] child /usr/bin/syncd exited status: 134

The second thing I did was to add a timer in the condition of while loop to ensure this while loop will be forced to exit after 20 seconds:

After that, the testing result is that syncd container can be normally stopped if swss is stopped
first. One more thing I want to mention is that if syncd container is stopped during 5 seconds or 6 seconds, then the two log messages can be still seen in syslog. However, if the execution 
time of while loop is longer than 20 seconds and is forced to exit, although syncd container can be stopped, I did not see these two messages in syslog. Further, although I observed the auto-restart feature of swss container can work correctly right now, I can not make sure the issue which syncd container can not stopped will occur in future.

**- How I did it**
I added a timer around the while loop in stop() function. This while loop will exit after spinning
20 seconds.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2020-06-04 15:17:28 -07:00
Joe LeVeque
7b8037770d
[caclmgrd] Get first VLAN host IP address via next() (#4685)
I found that with IPv4Network types, calling list(ip_ntwrk.hosts()) is reliable. However, when doing the same with an IPv6Network, I found that the conversion to a list can hang indefinitely. This appears to me to be a bug in the ipaddress.IPv6Network implementation. However, I could not find any other reports on the web.

This patch changes the behavior to call next() on the ip_ntwrk.hosts() generator instead, which returns the IP address of the first host.
2020-06-02 02:11:21 -07:00
Joe LeVeque
eff8a89523
[hostcfgd] Get service enable/disable feature working (#4676)
Fix hostcfgd so that changes to the "FEATURE" table in ConfigDB are properly handled. Three changes here:

1. Fix indenting such that the handling of each key actually occurs in the for key in status_data.keys(): loop
2. Add calls to sudo systemctl mask and sudo systemctl unmask as appropriate to ensure changes persist across reboots
3. Substitute returns with continues so that even if one service fails, we still try to handle the others

Note that the masking is persistent, even if the configuration is not saved. We may want to consider only calling systemctl enable/disable in hostcfgd when the DB table changes, and only call systemctl mask/unmask upon calling config save.
2020-06-02 02:07:22 -07:00
Joe LeVeque
1e369b0998
[systemd] Relocate all SONiC unit files to /usr/lib/systemd/system (#4673)
This will allow us to disable services and have it persist across reboots by using the `systemctl mask` operation
2020-05-30 13:46:44 -07:00
Qi Luo
65e7a84509
[baseimage]: Build and install redis-dump-load Python 3 package in host image (#4661)
Fix #4656
2020-05-30 05:52:27 -07:00
Samuel Angebault
d35a8a3800
[arista]: Add SmartsvilleDDBK and SmartsvilleBkMs (#4662)
Co-authored-by: Boyang Yu <byu@arista.com>
2020-05-28 14:59:00 -07:00
taocy
4cd36175ce arm arch: 1. install required libraries; 2. umount /proc after dockerfs. 2020-05-25 13:15:19 +00:00
taocy
ea2dd9541d change image apt source list from stretch to buster for arm 2020-05-25 13:15:19 +00:00
Praveen Chaudhary
0ccdd70671
[sonic-yang-mgmt]: sonic-yang-mgmt package for configuration validation. (#3861)
**- What I did**

#### wheel package Makefiles

- wheel package Makefiles for sonic-yang-mgmt package.

#### libyang Python APIs:
- python APIs based on libyang
- functions to load/merge yang models and Yang data files
- function to validate data trees based on Yang models
- functions to merge yang data files/trees
- add/set/delete node in schema and data trees
- find data/schema nodes from xpath from the Yang data/schema tree in memory
- find dependencies
- dump the data tree in json/xml

#### Extension of libyang Python APIs:
-- Cropping input config based on Yang Model.
-- Translate input config based on Yang Model.
-- rev Translate input config based on Yang Model.
-- Find xpath of port, portleaf and a yang list.
-- Find if node is key of a list while deletion if yes, then delete the parent.

Signed-off-by: Praveen Chaudhary pchaudhary@linkedin.com
Signed-off-by: Ping Mao pmao@linkedin.com
2020-05-21 16:27:57 -07:00
simonJi2018
0b6253baa1
[platform/nephos] Optimize the code to reduce changes due to the kernel upgrade (#4332)
- bug fix : Fixed an issue which the nps ko file was not loaded due to the wrong service file name
- Optimize the code to reduce changes due to the kernel upgrade
- Remove nephos ko file loaded in swss.service.j2 because it has loaded at syncd.service.j2
2020-05-21 02:21:07 -07:00
anand-kumar-subramanian
34586032dc
[mgmt-framework] removed requires dependency on swss (#4548)
fixes #4473
2020-05-20 20:47:09 -07:00
Joe LeVeque
bce42a7595
[caclmgrd] Allow more ICMP types (#4625) 2020-05-20 17:45:07 -07:00
abdosi
a44fc07e78
Changes to support config-setup service for multi-npu (#4609)
* Changes to support config-setup service for multi-npu
platforms. For Multi-npu we are not supporting as of
now config initializtion and ZTP. It will support creating
config db from minigraph or using  config db from previous
file system

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Address Review Comments.

* Address Review comments

* Address Review Comments of using pyhton based config load_minigraph/
config save/config reload from shell scripts so that we don't duplicate
code. Also while running from shell we will skip stop/start services
done by those commands.

* Updated to use python command so no code duplication.
2020-05-20 16:32:33 -07:00
rkdevi27
32f58b5864
Fix "/host unmount failure" during reboot (#4558) 2020-05-20 11:18:11 -07:00
Ying Xie
cdfb1ced44
[ntp] enable/disable NTP long jump according to reboot type (#4577)
* [ntp] enable/disable NTP long jump according to reboot type

- Enable NTP long jump after cold reboot.
- Disable NTP long jump after warrm/fast reboot.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>

* fix typo

* further refactoring

* use sonic-db-cli instead
2020-05-20 10:57:21 -07:00
rajendra-dendukuri
9c7105b5f3
Install swsssdk-py3 in the base Debian image for python3 based apps (#4542)
Signed-off-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>
2020-05-19 11:15:05 -07:00