**- Why I did it**
On teamd docker restart, the swss and syncd needs to be restarted as there are dependent resources present.
**- How I did it**
Add the teamd as a dependent service for swss
Updated the docker-wait script to handle service and dependent services separately.
Handle the case of warm-restart for the dependent service
**- How to verify it**
Verified the following scenario's with the following testbed
VM1 ----------------------------[DUT 6100] -----------------------VM2, ping traffic continuous between VMs
1. Stop teamd docker alone
> swss, syncd dockers seen going away
> The LAG reference count error messages seen for a while till swss docker stops.
> Dockers back up.
2. Enable WR mode for teamd. Stop teamd docker alone
> swss, syncd dockers not removed.
> The LAG reference count error messages not seen
> Repeated stop teamd docker test - same result, no effect on swss/syncd.
3. Stop swss docker.
> swss, teamd, syncd goes off - dockers comes back correctly, interfaces up
4. Enable WR mode for swss . Stop swss docker
> swss goes off not affecting syncd/teamd dockers.
5. Config reload
> no reference counter error seen, dockers comes back correctly, with interfaces up
6. Warm reboot, observations below
> swss docker goes off first
> teamd + syncd goes off to the end of WR process.
> dockers comes back up fine.
> ping traffic between VM's was NOT HIT
7. Fast reboot, observations below
> teamd goes off first ( **confirmed swss don't exit here** )
> swss goes off next
> syncd goes away at the end of the FR process
> dockers comes back up fine.
> there is a traffic HIT as per fast-reboot
8. Verified in multi-asic platform, the tests above other than WR/FB scenarios
**- Why I did it**
If we ran the CLI commands `sudo config feature autorestart snmp disabled/enabled` or `sudo config feature autorestart swss disabled/enabled`, then SNMP container will be stopped and started. This behavior was not expected since we updated the `auto_restart` field not update `state` field in `FEATURE` table. The reason behind this issue is that either `state` field or `auto_restart` field was updated, the function `update_feature_state(...)` will be invoked which then starts snmp.timer service.
The snmp.timer service will first stop snmp.service and later start snmp.service.
In order to solve this issue, the function `update_feature_state(...)` will be only invoked if `state` field in `FEATURE` table was
updated.
**- How I did it**
When the demon `hostcfgd` was activated, all the values of `state` field in `FEATURE` table of each container will be
cached. Each time the function `feature_state_handler(...)` is invoked, it will determine whether the `state` field of a
container was changed or not. If it was changed, function `update_feature_state(...)` will be invoked and the cached
value will also be updated. Otherwise, nothing will be done.
**- How to verify it**
We can run the CLI commands `sudo config feature autorestart snmp disabled/enabled` or `sudo config feature autorestart swss disabled/enabled` to check whether SNMP container is stopped and started. We also can run the CLI commands `sudo config feature state snmp disabled/enabled` or `sudo config feature state swss disabled/enabled` to check whether the container is stopped and restarted.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Now we are reading base mac, product name from eeprom data, and the data read from eeprom contains multiple "\0" characters at the end, need trim them to make the string clean and display correct.
Issue was because we were relying on port_alias_asic_map dictionary
but that dictionary can't be used as alias name format has changed.
Fix the port alias mapping as what is needed.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
When a large number of changes occur to the ACL table of Config DB, caclmgrd will get flooded with notifications, and previously, it would regenerate and apply the iptables rules for each change, which is unnecessary, as the iptables rules should only get applied once after the last change notification is received. If the ACL table contains a large number of control plane ACL rules, this could cause a large delay in caclmgrd getting the rules applied.
This patch causes caclmgrd to delay updating the iptables rules until it has not received a change notification for at least 0.5 seconds.
With python 2.7, import yaml module was resulting in huge memory allocation in the heap per process. As an interim fix, moving the import yaml to the function which actually uses this module. This helps reduce the memory footprint of pmon docker, as it don't use the API's which need yaml processing.
This issue not seen with importing yaml with python3, Need to be further analyzed, hence putting this fix in 201911 where we continue to use python2.7.
* Optimze ACL Table/Rule notifcation handling
to loop pop() until empty to consume all the data in a batch
This wau we prevent multiple call to iptable updates
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* Address review comments
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
**- Why I did it**
FRR introduced [next hop tracking](http://docs.frrouting.org/projects/dev-guide/en/latest/next-hop-tracking.html) functionality.
That functionality requires resolving BGP neighbors before setting BGP connection (or explicit ebgp-multihop command). Sometimes (BGP MONITORS) our neighbors are not directly connected and sessions are IBGP. In this case current configuration prevents FRR to establish BGP connections. Reason would be "waiting for NHT". To fix that we need either add static routes for each not-directly connected ibgp neighbor, or enable command `ip nht resolve-via-default`
**- How I did it**
Put `ip nht resolve-via-default` into the config
**- How to verify it**
Build an image. Enable BGP_MONITOR entry and check that entry is Established or Connecting in FRR
Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* Calculate ECMP hash seed based on ASIC ID on multi ASIC platform. Each ASIC will have a unique ECMP hash seed value.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
- SN3800 vs Cisco9236 - no link copper or optics - start sending IDLE before PHY_UP for specific OPNs
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
any event happening on ACL Rule Table (eg DATAACL rules
programmed) caused control plane default action to be triggered.
Now Control Plance ACTION will be trigger only
a) ACL Rule beloging to Control ACL Table
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
**- Why I did it**
- Platform API implementation using sonic-cfggen to get platform name and SKU name, which will fail when the database is not available.
- Chassis name is not correctly assigned, it shall be assigned with EEPROM TLV "Product Name", instead of SKU name
- Chassis model is not implemented, it shall be assigned with EEPROM TLV "Part Number"
**- How I did it**
1. Chassis
> - Get platform name from /host/machine.conf
> - Remove get SKU name with sonic-cfggen
> - Get Chassis name and model from EEPROM TLV "Product Name" and "Part Number"
> - Add function to return model
2. EEPROM
> - Add function to return product name and part number
3. Platform
> - Init EEPROM on the host side, so also can get the Chassis name model from EEPROM on the host side.
**- Why I did it**
BGP_MONITORS sessions don't have corresponding DEVICE_NEIGHBOR_METADATA CONFIG_DB entries in the minigraphs. Prevent bgpcfgd to wait on such entries for BGP_MONITORS sessions.
**- How I did it**
Set constructor argument to False that means - don't wait for device neighbors metadata info for BGP_MONITORS
**- How to verify it**
Build an image, write on your device, use a minigraph with BGP_MONITORS sessions. Check that sessions are populated in the config.
implements a new feature: "BGP Allow list."
This feature allows us to control which IP prefixes are going to be advertised via ebgp from the routes received from EBGP neighbors.
To address issue #5525
Explicitly control the grub installation requirement when it is needed.
We have scenario where configuration migration happened but grub
installation is not required.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* Fix generate_l2_config: don't override hostname because sonic-cfggen may not read
from Redis. Fix test_l2switch_template test case to test preset l2
feature
* Improve test script: compare json files with sort_keys
* Revert changes on sample_output
* Remove members field in VLAN section. Fix test assertTrue statement.
Refactor SFP reset, low power get/set API, and plugins with new SDK SX APIs. Previously they were calling SDK SXD APIs which have glibc dependency because of shared memory usage.
Remove implementation "set_power_override", "tx_disable_channel", "tx_disable" which using SXD APIs, once related SDK SX API available, will add them back based on new SDK SX APIs.
* Support for multi-asic platform for swssloglevel command
admin@str-acs-1:~$ swssloglevel
Usage: /usr/bin/swssloglevel -n [0 to 3] [OPTION]...
* Update to use the env file to get the PLATFORM string.