- Why I did it
The l2switch.j2 template does not include all fields for PORT. This could be incompatible with the 201911 image or later.
- How I did it
Update l2switch.j2 template and add a unit test.
Fix 259 alerts reported by the LGTM tool:
- 245 for Unused import
- 7 for Testing equality to None
- 5 for Duplicate key in dict literal
- 1 for Module is imported more than once
- 1 for Unused local variable
**- Why I did it**
We were building a custom version of Supervisor because I had added patches to prevent hangs and crashes if the system clock ever rolled backward. Those changes were merged into the upstream Supervisor repo as of version 3.4.0 (http://supervisord.org/changes.html#id9), therefore, we should be able to simply install the vanilla package via pip. This will also allow us to easily move to Python 3, as Python 3 support was added in version 4.0.0.
**- How I did it**
- Remove Makefiles and patches for building supervisor package from source
- Install Python 3 supervisor package version 4.2.1 in Buster base container
- Also install Python 3 version of supervisord-dependent-startup in Buster base container
- Debian package installed binary in `/usr/bin/`, but pip package installs in `/usr/local/bin/`, so rather than update all absolute paths, I changed all references to simply call `supervisord` and let the system PATH find the executable to prevent future need for changes just in case we ever need to switch back to build a Debian package, then we won't need to modify these again.
- Install Python 2 supervisor package >= 3.4.0 in Stretch and Jessie base containers
* Create new `PEER_SWITCH` table in config DB with info from minigraph
* Add `subtype` field to `DEVICE_METADATA` table and set value to `DualToR` if device is in a dual ToR setup
Treat devices that are ToRRouters (ToRRouters and BackEndToRRouters) the same when rendering templates
Except for BackEndToRRouters belonging to a storage cluster, since these devices have extra sub-interfaces created
Treat devices that are LeafRouters (LeafRouters and BackEndLeafRouters) the same when rendering templates
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
When forced mgmt routes are present, the issue fixed as part of #5754 is not complete.
Added a preference(priority) field to forced mgmt route ip rules
Python 3 is more strict with `__slots__`. As per the [documentation](https://docs.python.org/3/reference/datamodel.html#notes-on-using-slots):
> \_\_slots\_\_ are implemented at the class level by creating descriptors (Implementing Descriptors) for each variable name. As a result, class attributes cannot be used to set default values for instance variables defined by \_\_slots\_\_; otherwise, the class attribute would overwrite the descriptor assignment.
This was apparently missed when making sonic-config-engine compliant with Python 3, and errors like the following would be seen:
```
tests/acl_loader_test.py:10: in <module>
from acl_loader.main import *
acl_loader/main.py:8: in <module>
import openconfig_acl
/usr/local/lib/python3.7/dist-packages/openconfig_acl.py:24: in <module>
class yc_state_openconfig_acl__acl_state(PybindBase):
E ValueError: '_pybind_generated_by' in __slots__ conflicts with class variable
```
Take tunnel info from `<TunnelInterface>` tag in minigraph, and create tables in config_DB:
```
"TUNNEL": {
"MUX_TUNNEL_0": {
"tunnel_type": "IPINIP",
"dst_ip": "26.1.1.10",
"dscp_mode": "uniform",
"encap_ecn_mode": "standard",
"ecn_mode": "copy_from_outer",
"ttl_mode": "pipe"
}
}
```
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
Fix#5812
LLDP conf Jinja2 Template does not verify IPv4 address and can use IPv6 version. This issue does not effect control LLDP daemon. Issue can be reproduced via `test_snmp_lldp` test. LLDP conf Jinja2 Template selects first item from the list of mgmt interfaces.
TESTBED_1 LLDP conf
```
# cat /etc/lldpd.conf
configure ports eth0 lldp portidsubtype local eth0
configure system ip management pattern FC00:3::32
configure system hostname dut-1
```
TESTBED_2 LLDP conf
```
# cat /etc/lldpd.conf
configure ports eth0 lldp portidsubtype local eth0
configure system ip management pattern 10.22.24.61
configure system hostname dut-2
```
TESTBED_1 MGMT_INTERFACE
```
$ redis-cli -n 4 keys "*" | grep MGMT_INTERFACE
MGMT_INTERFACE|eth0|10.22.24.53/23
MGMT_INTERFACE|eth0|FC00:3::32/64
```
TESTBED_2 MGMT_INTERFACE
```
$ redis-cli -n 4 keys "*" | grep MGMT_INTERFACE
MGMT_INTERFACE|eth0|FC00:3::32/64
MGMT_INTERFACE|eth0|10.22.24.61/23
```
Signed-off-by: Petro Bratash <petrox.bratash@intel.com>
FixAzure/SONiC#551
When eth0 IP address is configured, an ip rule is getting added for eth0 IP address through the interfaces.j2 template.
This eth0 ip rule creates an issue when VRF (data VRF or management VRF) is also created in the system.
When any VRF (data VRF or management VRF) is created, a new rule is getting added automatically by kernel as "1000: from all lookup [l3mdev-table]".
This l3mdev IP rule is never getting deleted even if VRF is deleted.
Once if this l3mdev IP rule is added, if user configures IP address for the eth0 interface, interfaces.j2 adds an eth0 IP rule as "1000:from 100.104.47.74 lookup default ". Priority 1000 is automatically chosen by kernel and hence this rule gets higher priority than the already existing rule "1001:from all lookup local ".
This results in an issue "ping from console to eth0 IP does not work once if VRF is created" as explained in Issue 551.
More details and possible solutions are explained as comments in the Issue551.
This PR is to resolve the issue by always fixing the low priority 32765 for the IP rule that is created for the eth0 IP address.
Tested with various combinations of VRF creation, deletion and IP address configuration along with ping from console to eth0 IP address.
Co-authored-by: Kannan KVS <kannan_kvs@dell.com>
* Initial commit for BGP internal neighbor table support.
> Add new template named "internal" for the internal BGP sessions
> Add a new table in database "BGP_INTERNAL_NEIGHBOR"
> The internal BGP sessions will be stored in this new table "BGP_INTERNAL_NEIGHBOR"
* Changes in template generation tests with the introduction of internal neighbor template files.
* Fix for LLDP advertisments being sent with wrong information.
Since lldpd is starting before lldpmgr, some advertisment packets might sent with default value, mac address as Port ID.
This fix hold the packets from being sent by the lldpd until all interfaces are well configured by the lldpmgrd.
Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
* Fix comments
* Fix unit-test output caused a failure during build
* Add 'run_cmd' function and use it
* Resume lldpd even if port init timeout reached
Find LogicalLinks in minigraph and parse the port information. A new field called `mux_cable` is added to each port's entry in the Port table in config DB:
```
PORT|Ethernet0: {
"alias": "Ethernet4/1"
...
"mux_cable": "true"
}
```
If a mux cable is present on a port, the value for `mux_cable` will be `"true"`. If no mux cable is present, the attribute will either be omitted (default behavior) or set to `"false"`.
Issue was because we were relying on port_alias_asic_map dictionary
but that dictionary can't be used as alias name format has changed.
Fix the port alias mapping as what is needed.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
This PR makes two changes:
- Store Jinja2 cache in LOGLEVEL DB instead of STATE DB
- Store bytecode cache encoded in base64
Tested with the following command: "redis-dump -d 3 -k JINJA2_CACHE"
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
**- Why I did it**
FRR introduced [next hop tracking](http://docs.frrouting.org/projects/dev-guide/en/latest/next-hop-tracking.html) functionality.
That functionality requires resolving BGP neighbors before setting BGP connection (or explicit ebgp-multihop command). Sometimes (BGP MONITORS) our neighbors are not directly connected and sessions are IBGP. In this case current configuration prevents FRR to establish BGP connections. Reason would be "waiting for NHT". To fix that we need either add static routes for each not-directly connected ibgp neighbor, or enable command `ip nht resolve-via-default`
**- How I did it**
Put `ip nht resolve-via-default` into the config
**- How to verify it**
Build an image. Enable BGP_MONITOR entry and check that entry is Established or Connecting in FRR
Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>
* Install ndppd during image build, and copy config files to image
* Configure proxy settings based on config DB at container start
* Pipe ndppd output to logger inside container to log output in syslog
* Fix generate_l2_config: don't override hostname because sonic-cfggen may not read from Redis. Fix test_l2switch_template test case to test preset l2 feature.
* Improve test script: compare json files with sort_keys
* Revert changes on sample_output
* Remove members field in VLAN section. Fix test assertTrue statement.
Avoiding recursive update of maps as it consumes stack frames. This
PR introduces iterative version of deep_update method.
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
Rendering templates show different orders when rendered using Python
2 vs Python 3. This PR prepare for Python 3 packaging by creating
new dir 'py2' for Python 2 rendered test cases.
singed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
Jinja2 templates rendered using Python 3 interpreter, are required
to conform with Python 3 new semantics.
singed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
Natural sorting of SONiC config gen output consumes lot of CPU cycles.
The sole use of natsorted was to make test comparison easier and so,
the natsorting logic is now relocated to the test suite. As a result
sonic-cfggen gained nearly 1 sec per call since we no longer import
natsorted module!
singed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
As per the VOQ HLDs, internal networking between the linecards and supervisor is required within a chassis.
Allocating 127.X/16 subnets for private communication within a chassis is a good candidate.
It doesn't require any external IP allocation as well as ensure that the traffic will not leave the chassis.
References:
https://github.com/Azure/SONiC/pull/622https://github.com/Azure/SONiC/pull/639
**- How I did it**
Changed the `interfaces.j2` file to add `127.0.0.1/16` as the `lo` ip address.
Then once the interface is up, the post-up command removes the `127.0.0.1/8` ip address.
The order in which the netmask change is made matters for `127.0.0.1` to be reachable at all times.
**- How to verify it**
```
root@sonic:~# ip address show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/16 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
```
Co-authored-by: Baptiste Covolato <baptiste@arista.com>
When BGP routes are missing, DHCP packets get relayed over mgmt
interface. This results in dhcpmon alerting that DHCP packets are
not being relayed. This is PR include mgmt interface as uplink
device, and so, if DHCP packet gets relayed over mgmt interface,
regular dhcpmon alert will not be issues. Instead, dhcpmon will
check the mgmt interface counts and issue a separate alert regarding
packets travelling through mgmt network.
In addition, this PR includes the following enhancements:
1. Add SIGUSR1 handler that prints out current packet counts
2. Increase alert grace window to 3 minutes from currently 2 minutes
3. Time is now computed more accurately
4. Print vlan name before counters
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
When stopping the swss, pmon or bgp containers, log messages like the following can be seen:
```
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,061 ERRO pool dependent-startup event buffer overflowed, discarding event 34
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,063 ERRO pool dependent-startup event buffer overflowed, discarding event 35
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,064 ERRO pool dependent-startup event buffer overflowed, discarding event 36
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,066 ERRO pool dependent-startup event buffer overflowed, discarding event 37
```
This is due to the number of programs in the container managed by supervisor, all generating events at the same time. The default event queue buffer size in supervisor is 10. This patch increases that value in all containers in order to eliminate these errors. As more programs are added to the containers, we may need to further adjust these values. I increased all buffer sizes to 25 except for containers with more programs or templated supervisor.conf files which allow for a variable number of programs. In these cases I increased the buffer size to 50. One final exception is the swss container, where the buffer fills up to ~50, so I increased this buffer to 100.
Resolves https://github.com/Azure/sonic-buildimage/issues/5241
* [sonic-utilities]update submodule with fix
This PR addresses fixes in sonic-py-common to imitate the behavior inside
sonic-cfggen. Essentially this is a fix for accessing the port-config file.
First check if there is a platform.json file for config generation
and then for legacy port_config.ini.
Also updating the sub-module sonic-utilities.
Fix pfcwd stats crash with invalid queue name (#1077)
[show][bgp]Display the Total number of neighbors in the show ip bgp(v6) summary. (#1079)
[config] Update SONiC Environment Vars When Loading Minigraph (#1073)
Multi asic platform changes for interface, portchannel commands (#878)
Update Command-Reference.md (#1075)
[filter-fdb] Fix Filter FDB With IPv6 Present in Config DB (#1059)
[config] Remove _get_breakout_cfg_file_name helper function (#1069)
[SHOW][BGP] support show ip(v6) bgp summary for multi asic platform (#1064)
[fanshow] Display other fan status, such as Updating (#1014)
Add ip_prefix len based on proxy_arp status (#1046)
Enable the platform specific ssd firmware upgrade during reboot (#954)
[show][cli[show interface portchannel support for Multi ASIC (#1005)
support show interface commands for multi ASIC platforms (#1006)
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
The following changes are done.
- Multi asic platform have 2 Loopback interfaces, Loopback0 and Loopback4096. IPinIP decap entries need to be added for both of them. Update the ipinip.json.j2 template to add decap entries for Loopback4096.
- Add corressponding unit test
This PR enables cfggen to readr/write from Redis DB using pipelines.
Pipelines enables batch read/write from/to Redis DB.
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
- Ignore directories generated by building Python wheel package
- Move all sonic-config-engine ignores from the root .gitignore to src/sonic-config-engine/.gitignore
Argument to write to config-db is not allowed when using template.
This PR allows cfggen to write to redis db when using template
mode.
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
* Add sonic_interface.py in sonic-py-common for sonic interface utilities to keep this SONIC PREFIX naming convention in one place in py-common and all modules/applications use the functions defined here.
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan arlakshm@microsoft.com
The following common APIs are added for multi ASIC
- an API to check if a given port is a internal or external port
- an API to check if a given port-channel is internal or external
- an API to check if a bgp-session is internal or external
- an API to connect to the config and other dbs in the a given namespace
- added common APIs to the sonic_py_common library.
- update the sample port-config.ini with role column and add corresponding test to verify the ports configuration is - generated properly.
Calls to cfggen take considerable time. With batch mode, we will have the ability
to reduce number of calls from services.
Example of the batch mode command:
sonic-cfggen -t template-1.j2 -t template-2.j2,config-db -t template-3.j2,config-db -t template-4.j2,file1 -t template-5.j2,file2 --write-to-db.
template-1.j2 will be rendered to stdout since it is missing the dest part. stdout is default
config-db is a special keyword that will inject the rendered template into internal data structure. The internal data structure gets written to redis-db with --write-to-db switch. In the case the user would like to write to a file named config-db, it could be given as /config-db or ./config-db
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
* Bring up FPGA ports and test it
* Bring up those ports in neighbors dict
* Revert delete of a line
* Add test
* change code comment
* Change test name
* Revert submodule update
As part of consolidating all common Python-based functionality into the new sonic-py-common package, this pull request:
1. Redirects all Python applications/scripts in sonic-buildimage repo which previously imported sonic_device_util or sonic_daemon_base to instead import sonic-py-common, which was added in https://github.com/Azure/sonic-buildimage/pull/5003
2. Replaces all calls to `sonic_device_util.get_platform_info()` to instead call `sonic_py_common.get_platform()` and removes any calls to `sonic_device_util.get_machine_info()` which are no longer necessary (i.e., those which were only used to pass the results to `sonic_device_util.get_platform_info()`.
3. Removes unused imports to the now-deprecated sonic-daemon-base package and sonic_device_util.py module
This is the next step toward resolving https://github.com/Azure/sonic-buildimage/issues/4999
Also reverted my previous change in which device_info.get_platform() would first try obtaining the platform ID string from Config DB and fall back to gathering it from machine.conf upon failure because this function is called by sonic-cfggen before the data is in the DB, in which case, the db_connect() call will hang indefinitely, which was not the behavior I expected. As of now, the function will always reference machine.conf.