The service crash when the platform boots due to missing waits.
/usr/bin/database.sh tries to operate on a missing socket and fails.
We now wait for the chassis database to be ready the same way we do database.
Barefoot platform vendors' sonic_platform packages import the Python 'thrift' library. Previously, our custom-built package was being installed in the PMon container and host OS. However, we are only building a Python 2 version of that package, which was only intended for use with saithrift.
Fixes#6077
**- Why I did it**
Align style with slightly modified PEP8 standards (extend maximum line length to 120 chars). This will also help in the transition to Python 3, where it is more strict about whitespace, plus it helps unify style among the SONiC codebase. Will tackle other directories in separate PRs.
**- How I did it**
Using `autopep8 --in-place --max-line-length 120` and some manual tweaks.
Made changes so that Lldp docker start using py3 of sonic-db-syncd
submodule update sonic-db-syncd
5cc29a1b32d8d1f4dfbc967bfea2727c50a49c76 (HEAD -> master, origin/master, origin/HEAD) Changes to convert sonic-dbsyncd from python 2 to 3
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
**- Why I did it**
We were building a custom version of Supervisor because I had added patches to prevent hangs and crashes if the system clock ever rolled backward. Those changes were merged into the upstream Supervisor repo as of version 3.4.0 (http://supervisord.org/changes.html#id9), therefore, we should be able to simply install the vanilla package via pip. This will also allow us to easily move to Python 3, as Python 3 support was added in version 4.0.0.
**- How I did it**
- Remove Makefiles and patches for building supervisor package from source
- Install Python 3 supervisor package version 4.2.1 in Buster base container
- Also install Python 3 version of supervisord-dependent-startup in Buster base container
- Debian package installed binary in `/usr/bin/`, but pip package installs in `/usr/local/bin/`, so rather than update all absolute paths, I changed all references to simply call `supervisord` and let the system PATH find the executable to prevent future need for changes just in case we ever need to switch back to build a Debian package, then we won't need to modify these again.
- Install Python 2 supervisor package >= 3.4.0 in Stretch and Jessie base containers
Fixed TSA bugs:
1. TSA didn't advertise Loopback ipv6 address
2. TSA and TSB changed BGP dynamic and BGP monitors sessions
**- How to verify it**
Build an image and run on your DUT.
```
admin@str-s6100-acs-1:~$ TSA
System Mode: Normal -> Maintenance
admin@str-s6100-acs-1:~$ vtysh -c 'show bgp ipv4 neighbors 10.0.0.1 advertised-routes'
BGP table version is 6, local router ID is 10.1.0.32, vrf id 0
Default local pref 100, local AS 64601
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 10.1.0.32/32 0.0.0.0 0 32768 i
Total number of prefixes 1
admin@str-s6100-acs-1:~$ vtysh -c 'show bgp ipv6 neighbors fc00::a advertised-routes'
BGP table version is 6, local router ID is 10.1.0.32, vrf id 0
Default local pref 100, local AS 64601
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> fc00:1::/64 :: 0 32768 i
Total number of prefixes 1
admin@str-s6100-acs-1:~$ TSB
System Mode: Maintenance -> Normal
```
Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>
This change introduces PDDF which is described here: https://github.com/Azure/SONiC/pull/536
Most of the platform bring up effort goes in developing the platform device drivers, SONiC platform APIs and validating them. Typically each platform vendor writes their own drivers and platform APIs which is very tailor made to that platform. This involves writing code, building, installing it on the target platform devices and testing. Many of the details of the platform are hard coded into these drivers, from the HW spec. They go through this cycle repetitively till everything works fine, and is validated before upstreaming the code.
PDDF aims to make this platform driver and platform APIs development process much simpler by providing a data driven development framework. This is enabled by:
JSON descriptor files for platform data
Generic data-driven drivers for various devices
Generic SONiC platform APIs
Vendor specific extensions for customisation and extensibility
Signed-off-by: Fuzail Khan <fuzail.khan@broadcom.com>
1. Update SSL ca certificates for secure download [arm specific]
2. Using redis-tools from blob sonic-storage for docker-base-stretch
Signed-off-by: Sabareesh Kumar Anandan <sanandan@marvell.com>
Treat devices that are ToRRouters (ToRRouters and BackEndToRRouters) the same when rendering templates
Except for BackEndToRRouters belonging to a storage cluster, since these devices have extra sub-interfaces created
Treat devices that are LeafRouters (LeafRouters and BackEndLeafRouters) the same when rendering templates
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
*Why I did it
Provide access to the ixnetwork-open-traffic-generator pypi package
*How I did it
Added a pip install entry for the package in the Dockerfile.j2
*How to verify it
pip show ixnetwork-open-traffic-generator
*Description for the changelog
Install of ixnetwork-open-traffic-generator pypi package is required for proposed rdma tests.
- Why I did it
Update the routine is_bgp_session_internal() by checking the BGP_INTERNAL_NEIGHBOR table.
Additionally to address the review comment #5520 (comment)
Add timer settings as will in the internal session templates and keep it minimal as these sessions which will always be up.
Updates to the internal tests data + add all of it to template tests.
- How I did it
Updated the APIs and the template files.
- How to verify it
Verified the internal BGP sessions are displayed correctly with show commands with this API is_bgp_session_internal()
lldpmgrd, bgpcfgd, and bgpmon are reported error status not running due
to recent change of shebang to use `Python3`. Modifying the argument of
`process_checker` to follow this change.
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
* Fix some spelling error and add addition check in port_index_mapper.py script.
* Remove unneeded pyroute2 python module from sFlow container
Signed-off-by: Garrick He <garrick_he@dell.com>
Fix#5812
LLDP conf Jinja2 Template does not verify IPv4 address and can use IPv6 version. This issue does not effect control LLDP daemon. Issue can be reproduced via `test_snmp_lldp` test. LLDP conf Jinja2 Template selects first item from the list of mgmt interfaces.
TESTBED_1 LLDP conf
```
# cat /etc/lldpd.conf
configure ports eth0 lldp portidsubtype local eth0
configure system ip management pattern FC00:3::32
configure system hostname dut-1
```
TESTBED_2 LLDP conf
```
# cat /etc/lldpd.conf
configure ports eth0 lldp portidsubtype local eth0
configure system ip management pattern 10.22.24.61
configure system hostname dut-2
```
TESTBED_1 MGMT_INTERFACE
```
$ redis-cli -n 4 keys "*" | grep MGMT_INTERFACE
MGMT_INTERFACE|eth0|10.22.24.53/23
MGMT_INTERFACE|eth0|FC00:3::32/64
```
TESTBED_2 MGMT_INTERFACE
```
$ redis-cli -n 4 keys "*" | grep MGMT_INTERFACE
MGMT_INTERFACE|eth0|FC00:3::32/64
MGMT_INTERFACE|eth0|10.22.24.61/23
```
Signed-off-by: Petro Bratash <petrox.bratash@intel.com>
- Convert restore_nat_entries.py script to Python 3
- Use logger from sonic-py-common for uniform logging
- Reorganize imports alphabetically per PEP8 standard
- Two blank lines precede functions per PEP8 standard
**- Why I did it**
A memory issue was discovered during system test for scaling. The issue is documented here: https://docs.pyroute2.org/ipdb.html
> One of the major issues with IPDB is its memory footprint. It proved not to be suitable for environments with thousands of routes or neighbours. Being a design issue, it could not be fixed, so a new module was started, NDB, that aims to replace IPDB. IPDB is still more feature rich, but NDB is already more fast and stable.
**- How I did it**
- Rewrote the port_index_mapper.py script to use dB events.
- Convert to Python 3
**- Why I did it**
As part of moving all SONiC code from Python 2 (no longer supported) to Python 3
**- How I did it**
- Convert enable_counters.py script to Python 3
- Reorganize imports per PEP8 standard
- Two blank lines precede functions per PEP8 standard
* Convert bgpcfgd to python3
Convert bgpmon to python3
Fix some issues in bgpmon
* Add python3-swsscommon as depends
* Install dependencies
* reorder deps
Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>
Recent changes to dependencies caused the 'enum34' package to cease being installed for Python 2 in the PMon container. This broke Arista platforms, where the Arista sonic_platform package imports 'enum'. This is because on Arista devices, the sonic_platform wheel is not installed in the container. Instead, the installation directory is mounted from the host OS. However, this method doesn't ensure all dependencies are installed in the container.
Prevent intermittent build failures when building Sonic for the ARM platform architecture due to version upgrades of the redis-tools and redis-server packages.
Modify select Dockerfile templates to download the redis-tools and redis-server packages from sonicstorage rather than from debian.org.
This PR has been made possible by the inclusion of ARM versions of redis-tools and redis-server into sonicstorage as described in Issue# 5701
* This was a temporary fix for orchagent spamming log messages and causing rate limiting, leading to critical messages being dropped for the syslog. No longer needed since Azure/sonic-sairedis#680 was merged.
On Arista platforms, sonic_platform packages are not installed in the PMon container, but are rather mounted into the container from the host OS. Therefore, pip show sonic_platform will fail in the PMon container. This change will first check if we can import sonic_platform. If this fails, it will then fall back to checking if the package is installed. If both fail, it will attempt to install the package.
Why/How I did:
Make sure first error syslog is triggered based on FAULT TOLERANCE condition.
Added support of repeat clause with alert action. This is used as trigger
for generation of periodic syslog error messages if error is persistent
Updated the monit conf files with repeat every x cycles for the alert action
Increase startretires value from default of 10 to 50 to prevent supervisor from placing thermalctld in FATAL state during regression testing. Also ensures supervisord tries hard to get thermalctld running in production, as thermalctld is critical to prevent device from overheating.
As part of the transition from Python 2 to Python 3, we are installing both pip2 and pip3 in the slave and config-engine containers. This PR replaces calls to `pip` in these containers with an explicit call to `pip2` to ensure the proper version of pip is executed, no matter which version of pip is aliased to `pip`, as we no longer rely on that alias.
Also some other pip-related cleanup
Upgrading pip3 after pip2 caused the pip command to be aliased to the pip3 command. However, since we are still transitioning from Python 2 to Python 3, most pip commands in the codebase are expecting pip to alias to pip2. The proper solution here is to explicitly call pip2 and pip3, and no longer call pip, however this will require extensive changes and testing, so to quickly fix this issue, we upgraded pip2 after pip3 to ensure that pip2 is installed after pip3.
* Initial commit for BGP internal neighbor table support.
> Add new template named "internal" for the internal BGP sessions
> Add a new table in database "BGP_INTERNAL_NEIGHBOR"
> The internal BGP sessions will be stored in this new table "BGP_INTERNAL_NEIGHBOR"
* Changes in template generation tests with the introduction of internal neighbor template files.
* Fix for LLDP advertisments being sent with wrong information.
Since lldpd is starting before lldpmgr, some advertisment packets might sent with default value, mac address as Port ID.
This fix hold the packets from being sent by the lldpd until all interfaces are well configured by the lldpmgrd.
Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
* Fix comments
* Fix unit-test output caused a failure during build
* Add 'run_cmd' function and use it
* Resume lldpd even if port init timeout reached
The orchagent and syncd need to have the same default synchronous mode configuration. This PR adds a template file to translate the default value in CONFIG_DB (empty field) to an explicit mode so that the orchagent and syncd could have the same default mode.
**- Why I did it**
To introduce dynamic support of BBR functionality into bgpcfgd.
BBR is adding `neighbor PEER_GROUP allowas-in 1' for all BGP peer-groups which points to T0
Now we can add and remove this configuration based on CONFIG_DB entry
**- How I did it**
I introduced a new CONFIG_DB entry:
- table name: "BGP_BBR"
- key value: "all". Currently only "all" is supported, which means that all peer-groups which points to T0s will be updated
- data value: a dictionary: {"status": "status_value"}, where status_value could be either "enabled" or "disabled"
Initially, when bgpcfgd starts, it reads initial BBR status values from the [constants.yml](https://github.com/Azure/sonic-buildimage/pull/5626/files#diff-e6f2fe13a6c276dc2f3b27a5bef79886f9c103194be4fcb28ce57375edf2c23cR34). Then you can control BBR status by changing "BGP_BBR" table in the CONFIG_DB (see examples below).
bgpcfgd knows what peer-groups to change fron [constants.yml](https://github.com/Azure/sonic-buildimage/pull/5626/files#diff-e6f2fe13a6c276dc2f3b27a5bef79886f9c103194be4fcb28ce57375edf2c23cR39). The dictionary contains peer-group names as keys, and a list of address-families as values. So when bgpcfgd got a request to change the BBR state, it changes the state only for peer-groups listed in the constants.yml dictionary (and only for address families from the peer-group value).
**- How to verify it**
Initially, when we start SONiC FRR has BBR enabled for PEER_V4 and PEER_V6:
```
admin@str-s6100-acs-1:~$ vtysh -c 'show run' | egrep 'PEER_V.? allowas'
neighbor PEER_V4 allowas-in 1
neighbor PEER_V6 allowas-in 1
```
Then we apply following configuration to the db:
```
admin@str-s6100-acs-1:~$ cat disable.json
{
"BGP_BBR": {
"all": {
"status": "disabled"
}
}
}
admin@str-s6100-acs-1:~$ sonic-cfggen -j disable.json -w
```
The log output are:
```
Oct 14 18:40:22.450322 str-s6100-acs-1 DEBUG bgp#bgpcfgd: Received message : '('all', 'SET', (('status', 'disabled'),))'
Oct 14 18:40:22.450620 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-f', '/tmp/tmpmWTiuq']'.
Oct 14 18:40:22.681084 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V4 soft in']'.
Oct 14 18:40:22.904626 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V6 soft in']'.
```
Check FRR configuraiton and see that no allowas parameters are there:
```
admin@str-s6100-acs-1:~$ vtysh -c 'show run' | egrep 'PEER_V.? allowas'
admin@str-s6100-acs-1:~$
```
Then we apply enabling configuration back:
```
admin@str-s6100-acs-1:~$ cat enable.json
{
"BGP_BBR": {
"all": {
"status": "enabled"
}
}
}
admin@str-s6100-acs-1:~$ sonic-cfggen -j enable.json -w
```
The log output:
```
Oct 14 18:40:41.074720 str-s6100-acs-1 DEBUG bgp#bgpcfgd: Received message : '('all', 'SET', (('status', 'enabled'),))'
Oct 14 18:40:41.074720 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-f', '/tmp/tmpDD6SKv']'.
Oct 14 18:40:41.587257 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V4 soft in']'.
Oct 14 18:40:42.042967 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V6 soft in']'.
```
Check FRR configuraiton and see that the BBR configuration is back:
```
admin@str-s6100-acs-1:~$ vtysh -c 'show run' | egrep 'PEER_V.? allowas'
neighbor PEER_V4 allowas-in 1
neighbor PEER_V6 allowas-in 1
```
*** The test coverage ***
Below is the test coverage
```
---------- coverage: platform linux2, python 2.7.12-final-0 ----------
Name Stmts Miss Cover
----------------------------------------------------
bgpcfgd/__init__.py 0 0 100%
bgpcfgd/__main__.py 3 3 0%
bgpcfgd/config.py 78 41 47%
bgpcfgd/directory.py 63 34 46%
bgpcfgd/log.py 15 3 80%
bgpcfgd/main.py 51 51 0%
bgpcfgd/manager.py 41 23 44%
bgpcfgd/managers_allow_list.py 385 21 95%
bgpcfgd/managers_bbr.py 76 0 100%
bgpcfgd/managers_bgp.py 193 193 0%
bgpcfgd/managers_db.py 9 9 0%
bgpcfgd/managers_intf.py 33 33 0%
bgpcfgd/managers_setsrc.py 45 45 0%
bgpcfgd/runner.py 39 39 0%
bgpcfgd/template.py 64 11 83%
bgpcfgd/utils.py 32 24 25%
bgpcfgd/vars.py 1 0 100%
----------------------------------------------------
TOTAL 1128 530 53%
```
**- Which release branch to backport (provide reason below if selected)**
- [ ] 201811
- [x] 201911
- [x] 202006
use correct chassisdb.conf path while bringing up chassis_db service on VoQ modular switch.chassis_db service on VoQ modular switch.
resolves#5631
Signed-off-by: Honggang Xu <hxu@arista.com>
There is currently a bug where messages from swss with priority lower than the current log level are still being counted against the syslog rate limiting threshhold. This leads to rate-limiting in syslog when the rate-limiting conditions have not been met, which causes several sonic-mgmt tests to fail since they are dependent on LogAnalyzer. It also omits potentially useful information from the syslog. Only rate-limiting messages of level INFO and lower allows these tests to pass successfully.
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
bring up chassisdb service on sonic switch according to the design in
Distributed Forwarding in VoQ Arch HLD
Signed-off-by: Honggang Xu <hxu@arista.com>
**- Why I did it**
To bring up new ChassisDB service in sonic as designed in ['Distributed forwarding in a VOQ architecture HLD' ](90c1289eaf/doc/chassis/architecture.md).
**- How I did it**
Implement the section 2.3.1 Global DB Organization of the VOQ architecture HLD.
**- How to verify it**
ChassisDB service won't start without chassisdb.conf file on the existing platforms.
ChassisDB service is accessible with global.conf file in the distributed arichitecture.
Signed-off-by: Honggang Xu <hxu@arista.com>
We were building our own python-click package because we needed features/bug fixes available as of version 7.0.0, but the most recent version available from Debian was in the 6.x range.
"Click" is needed for building/testing and installing sonic-utilities. Now that we are building sonic-utilities as a wheel, with Click specified as a dependency in the setup.py file, setuptools will install a more recent version of Click in the sonic-slave-buster container when building the package, and pip will install a more recent version of Click in the host OS of SONiC when installing the sonic-utilities package. Also, we don't need to worry about installing the Python 2 or 3 version of the package, as the proper one will be installed as necessary.
**- Why I did it**
FRR introduced [next hop tracking](http://docs.frrouting.org/projects/dev-guide/en/latest/next-hop-tracking.html) functionality.
That functionality requires resolving BGP neighbors before setting BGP connection (or explicit ebgp-multihop command). Sometimes (BGP MONITORS) our neighbors are not directly connected and sessions are IBGP. In this case current configuration prevents FRR to establish BGP connections. Reason would be "waiting for NHT". To fix that we need either add static routes for each not-directly connected ibgp neighbor, or enable command `ip nht resolve-via-default`
**- How I did it**
Put `ip nht resolve-via-default` into the config
**- How to verify it**
Build an image. Enable BGP_MONITOR entry and check that entry is Established or Connecting in FRR
Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>