sonic-buildimage

Archived

Author	SHA1	Message	Date
Tamer Ahmed	b5bf5e3bce	[interfaces] Reduce Calls to SONiC Cfggen (#5174 ) Calls to sonic-cfggen are CPU expensive. This PR reduces calls to sonic-cfggen to one call during startup when running interfaces-config. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-12-22 09:51:54 -08:00
Tamer Ahmed	fae4c4bfcc	[swss] Enhance ARP Update to Call Sonic Cfggen Once (#5398 ) This PR limited the number of calls to sonic-cfggen to one call per iteration instead of current 3 calls per iteration. The PR also installs jq on host for future scripts if needed. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-12-22 09:51:54 -08:00
Samuel Angebault	e75c15bfda	[aboot]: Better handle tmpfs management in boot0 (#6268 ) To limit IO and space usage on the flash device the boot0 script makes sure the SWI is in memory. Because SONiC maps /tmp on the flash, some logic is required to make sure of it. However it is possible for some provisioning mechanism to already download the swi in a memory file system. This was not properly handled by the boot0 script. It now properly detects if the image is on a tmpfs or a ramfs and keeps it there if that is the case. - How I did it Check the filesystem on which the SWI pointed to by swipath lies. If this filesystem is a ramfs or a tmpfs the move_swi_to_tmpfs becomes a no-op. Made sure the cleanup logic would not behave unexpectedly. - How to verify it In SONiC: Download the swi under /tmp and make sure it gets moved to /tmp/tmp-swi which gets mounted for that purpose. Make sure /tmp/tmp-swi gets unmounted once the install process is done. Create a new mountpoint under /ram using either ramfs or tmpfs and download the swi there. Install the swi using sonic-installer and make sure the image doesn't get moved by looking at the logs.	2020-12-22 00:07:10 -08:00
abdosi	35fc12c373	Telemetry Certificate Copy Across Image Upgrade. (#6252 ) To copy telemetry certificate during image upgrade from previous image to new image	2020-12-19 08:24:41 -08:00
arlakshm	7f76698b7d	[201911][hostcfgd]: wait updating the feature table till system init is done (#6234 ) - Why I did it The change is done to make sure the system initialization is done before hostcfgd sets the feature states. - How I did it This is a port of PR #6232. Since the systemctl version in 201911 doesn't support "--wait", added a function to check the output of systemctl is-system-running every second, until the system is done booting up. For now this change is only applicable to multi asic platforms; based on the testing, this change will be extended to all platforms in a future PR. Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>	2020-12-18 12:31:35 -08:00
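The polling described in the hostcfgd commit above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name `wait_until_system_ready`, the `max_wait` and `run` parameters, and the choice to treat `initializing` and `starting` as "still booting" are all assumptions made for the example.

```python
import subprocess
import time


def wait_until_system_ready(poll_interval=1.0, max_wait=None, run=subprocess.run):
    """Poll `systemctl is-system-running` until boot appears to be finished.

    Hypothetical helper: `run` is injectable so the loop can be exercised
    without systemd, and `max_wait` (seconds) is an optional safety limit.
    Returns the last state string reported by systemctl.
    """
    waited = 0.0
    while True:
        result = run(
            ["systemctl", "is-system-running"],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
        )
        state = result.stdout.strip()
        # Assumption: these two states mean the boot is still in progress.
        if state not in ("initializing", "starting"):
            return state
        if max_wait is not None and waited >= max_wait:
            return state
        time.sleep(poll_interval)
        waited += poll_interval
```

A caller would invoke this once before touching the feature table, and could log the returned state (for example `degraded`) rather than blocking forever.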
Lawrence Lee	dae5c4c930	[build_templates]: Start SNMP timer after SWSS service (#6195 ) Fixes #5663 - Why I did it It's currently possible for the SNMP timer to conflict with config reload (specifically if the timer triggers while config reload is stopping the SWSS service). config reload triggers SWSS to shutdown, which causes SNMP to shutdown, which conflicts with the SNMP timer causing SNMP to startup. See the linked issue for more details. - How I did it Including the After ordering dependency forces the SNMP timer to wait until SWSS finishes stopping, preventing the conflict. If there is an ordering dependency between two units (e.g. one unit is ordered After another), if one unit is shutting down while the other is starting up, the shutdown will always be ordered before the startup. In this case, that means that the SNMP timer is forced to wait for the SWSS shutdown to complete. Only then can the SNMP timer proceed. See here for more details. It's important to note that the After dependency will not cause SWSS to be started when the SNMP timer fires (assuming that SWSS has not yet been started). The existing Requisite dependency in the SNMP service will also not cause SWSS to be started, instead it will cause the SNMP service to fail if SWSS is not active. Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2020-12-16 19:28:31 -08:00
shlomibitton	959035c854	[NVMe] Add NVMe SSD disc type support to installer.sh script (#6142 ) In order to install a SONiC image on top of a NVMe SSD disc properly with ONIE we must configure it properly on the installer.sh script. Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>	2020-12-16 14:19:07 -08:00
abdosi	3a24e7f31f	[multi-asic] Enhancing monit process checker for multi-asic. (#6100 ) Added Support of process checker for work on multi-asic platforms.	2020-12-04 13:17:35 -08:00
abdosi	1d1898d8e2	Enhanced Feature table to support 'always_enabled' value for state and auto-restart fields. (#6000 ) Added new flag value 'always_enabled' for the state and auto-restart fields of the feature table. init_cfg.json is updated to initialize the state field of the database/swss/syncd/teamd features and the auto-restart field of the database feature as always_enabled. Once the state/auto-restart value is initialized as "always_enabled" it is immutable and cannot be changed via feature config commands (config feature..). PR#Azure/sonic-utilities#1271 hostcfgd will not take any action if the state field value is 'always_enabled'. Since we have the always_enabled value for auto-restart, updated supervisor-proc-exit-listener not to have a special check for database and to always rely on the value from the Feature table.	2020-11-25 10:04:42 -08:00
Rajkumar-Marvell	17045f42d1	Set sock rx Buf size to 3MB. (#5566 ) * Set sock rx Buf size to 3MB.	2020-11-24 11:21:56 -08:00
Prince Sunny	1c2c30fccd	Set preference for forced mgmt routes (#5844 ) When forced mgmt routes are present, the issue fixed as part of #5754 is not complete. Added a preference(priority) field to forced mgmt route ip rules	2020-11-21 09:27:09 -08:00
pavel-shirshov	5f5ec04dda	[bgpcfgd]: Fixes for BBR (#5956 ) * Add explicit default state into the constants.yml * Enable/disable only peer-groups, available in the config * Retrieve updates from frr before using configuration Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>	2020-11-19 10:42:42 -08:00
Lawrence Lee	2aa827f5b7	[buffers_config.j2]: Use correct cable lengths for backend devices (#5905 ) * Remove 'backend' from device type strings so that backend devices ('BackEndToRRouter' and 'BackEndLeafRouter') are given the same cable lengths as regular device types. Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2020-11-14 08:41:14 -08:00
Lawrence Lee	cb32b362f5	Make backend device checking more robust (#5730 ) Treat devices that are ToRRouters (ToRRouters and BackEndToRRouters) the same when rendering templates Except for BackEndToRRouters belonging to a storage cluster, since these devices have extra sub-interfaces created Treat devices that are LeafRouters (LeafRouters and BackEndLeafRouters) the same when rendering templates Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2020-11-14 08:39:08 -08:00
madhanmellanox	a79c3c219d	[201911][caclmgrd] Accommodating case insensitive rule props for Control plane ACLs (#5918 ) To make Control plane ACLs handle case insensitive ACL rules. Currently, it handles only upper case ACL rules. Co-authored-by: Madhan Babu <madhan@arc-build-server.mtr.labs.mlnx>	2020-11-13 11:41:05 -08:00
judyjoseph	ce86621399	[multi-ASIC] BGP internal neighbor table support (#5520 ) * Initial commit for BGP internal neighbor table support. > Add new template named "internal" for the internal BGP sessions > Add a new table in database "BGP_INTERNAL_NEIGHBOR" > The internal BGP sessions will be stored in this new table "BGP_INTERNAL_NEIGHBOR" * Changes in template generation tests with the introduction of internal neighbor template files.	2020-11-10 12:52:58 -08:00
arlakshm	431f97d11d	Add the vtysh command with newly added "-n" option for multi asic to the read_only_cmds (#5845 ) In multi asic platforms the "show ip bgp summary" commands is not available for user with read only privileges, so to fix this the vtysh command with the new "-n" option, added for multi asic platforms, needs to be added to the READ_ONLY_COMMANDS list in the sudoers files. Added the command vtysh -n [0-9] -c show * to list of READ_ONLY_COMMANDS in the sudoers files in this commit. Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>	2020-11-10 12:30:32 -08:00
abdosi	c453381aec	[multi-asic] Fixed the docker mount point check for multi-asic (#5848 ) API getMount() API was not updated to handle multi-asic platforms Updated API getMount() to return abspath() for Docker Mount Point and use that one for mount point comparison Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2020-11-09 13:03:20 -08:00
Stepan Blyshchak	dc68576bab	[hostcfgd] If feature state entry not in the cache, add a default state (#5777 ) Our use case is to register new features in runtime. The previous change which introduced the cache broke this capability and caused hostcfgd crash. Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>	2020-11-09 12:34:42 -08:00
Joe LeVeque	4dbc6da85b	[201911][core_cleanup.py] Fix core file path; Improve code reuse (#5782 ) - Fix bug: `CORE_FILE_DIR` previously was set to `os.path.basename(__file__)`, which would resolve to the script name. Fix this by hardcoding to `/var/core/` - Remove locally-define logging functions; use Logger class from sonic-py-common instead	2020-11-03 11:15:49 -08:00
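The path bug described in the core_cleanup.py commit above is easy to reproduce in isolation. The snippet below is only an illustration; the script path is a hypothetical install location, not one taken from the commit.

```python
import os

# Hypothetical install location of the script, used only for illustration.
script_path = "/usr/bin/core_cleanup.py"

# basename() returns the final path component, i.e. the script's own name,
# not a directory that could contain core files.
buggy_core_dir = os.path.basename(script_path)

# The fix described in the commit message: hardcode the directory.
CORE_FILE_DIR = "/var/core/"

print(buggy_core_dir)
print(CORE_FILE_DIR)
```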
lguohan	339d2aa6c8	[mgmt ip]: mvrf ip rule priority change to 32765 (#5754 ) Fix Azure/SONiC#551 When eth0 IP address is configured, an ip rule is getting added for eth0 IP address through the interfaces.j2 template. This eth0 ip rule creates an issue when VRF (data VRF or management VRF) is also created in the system. When any VRF (data VRF or management VRF) is created, a new rule is getting added automatically by kernel as "1000: from all lookup [l3mdev-table]". This l3mdev IP rule is never getting deleted even if VRF is deleted. Once if this l3mdev IP rule is added, if user configures IP address for the eth0 interface, interfaces.j2 adds an eth0 IP rule as "1000:from 100.104.47.74 lookup default ". Priority 1000 is automatically chosen by kernel and hence this rule gets higher priority than the already existing rule "1001:from all lookup local ". This results in an issue "ping from console to eth0 IP does not work once if VRF is created" as explained in Issue 551. More details and possible solutions are explained as comments in the Issue551. This PR is to resolve the issue by always fixing the low priority 32765 for the IP rule that is created for the eth0 IP address. Tested with various combinations of VRF creation, deletion and IP address configuration along with ping from console to eth0 IP address. Co-authored-by: Kannan KVS <kannan_kvs@dell.com>	2020-11-01 10:41:44 -08:00
Abhishek Dosi	65cb10714c	Revert "[mgmt ip]: mvrf ip rule priority change to 32765 (#5754 )" This reverts commit `28366cd0ce`.	2020-11-01 10:37:16 -08:00
Renuka Manavalan	ecd10b9d10	Load config after subscribe (#5740 ) - Why I did it The update_all_feature_states can run in the range of 20+ seconds to one minute. With the load of AAA & TACACS preceding it, any DB updates in AAA/TACACS during the long-running feature updates would get missed. To avoid this, switch the order. - How I did it Do a load after updating all feature states. - How to verify it Not an easy one. Have a script that restarts hostcfgd, sleeps 2s, and runs a redis-cli/config command to update the AAA/TACACS table. Run the script above and watch the file /etc/pam.d/common-auth-sonic for a minute. - When it repros: the updates will not be reflected in /etc/pam.d/common-auth-sonic	2020-11-01 10:27:10 -08:00
abdosi	0fad6bdc7f	[monit] Adding patch to enhance syslog error message generation for monit alert action when status is failed. (#5720 ) Why/How I did: Make sure first error syslog is triggered based on FAULT TOLERANCE condition. Added support of repeat clause with alert action. This is used as trigger for generation of periodic syslog error messages if error is persistent Updated the monit conf files with repeat every x cycles for the alert action	2020-11-01 10:27:10 -08:00
lguohan	28366cd0ce	[mgmt ip]: mvrf ip rule priority change to 32765 (#5754 ) Fix Azure/SONiC#551 When eth0 IP address is configured, an ip rule is getting added for eth0 IP address through the interfaces.j2 template. This eth0 ip rule creates an issue when VRF (data VRF or management VRF) is also created in the system. When any VRF (data VRF or management VRF) is created, a new rule is getting added automatically by kernel as "1000: from all lookup [l3mdev-table]". This l3mdev IP rule is never getting deleted even if VRF is deleted. Once if this l3mdev IP rule is added, if user configures IP address for the eth0 interface, interfaces.j2 adds an eth0 IP rule as "1000:from 100.104.47.74 lookup default ". Priority 1000 is automatically chosen by kernel and hence this rule gets higher priority than the already existing rule "1001:from all lookup local ". This results in an issue "ping from console to eth0 IP does not work once if VRF is created" as explained in Issue 551. More details and possible solutions are explained as comments in the Issue551. This PR is to resolve the issue by always fixing the low priority 32765 for the IP rule that is created for the eth0 IP address. Tested with various combinations of VRF creation, deletion and IP address configuration along with ping from console to eth0 IP address. Co-authored-by: Kannan KVS <kannan_kvs@dell.com>	2020-11-01 10:27:10 -08:00
pavel-shirshov	2eec3b3254	[bgpcfgd]: Dynamic BBR support (#5626 ) - Why I did it To introduce dynamic support of BBR functionality into bgpcfgd. BBR is adding `neighbor PEER_GROUP allowas-in 1' for all BGP peer-groups which points to T0 Now we can add and remove this configuration based on CONFIG_DB entry - How I did it I introduced a new CONFIG_DB entry: - table name: "BGP_BBR" - key value: "all". Currently only "all" is supported, which means that all peer-groups which points to T0s will be updated - data value: a dictionary: {"status": "status_value"}, where status_value could be either "enabled" or "disabled" Initially, when bgpcfgd starts, it reads initial BBR status values from the [constants.yml](https://github.com/Azure/sonic-buildimage/pull/5626/files#diff-e6f2fe13a6c276dc2f3b27a5bef79886f9c103194be4fcb28ce57375edf2c23cR34). Then you can control BBR status by changing "BGP_BBR" table in the CONFIG_DB (see examples below). bgpcfgd knows what peer-groups to change fron [constants.yml](https://github.com/Azure/sonic-buildimage/pull/5626/files#diff-e6f2fe13a6c276dc2f3b27a5bef79886f9c103194be4fcb28ce57375edf2c23cR39). The dictionary contains peer-group names as keys, and a list of address-families as values. So when bgpcfgd got a request to change the BBR state, it changes the state only for peer-groups listed in the constants.yml dictionary (and only for address families from the peer-group value). - How to verify it Initially, when we start SONiC FRR has BBR enabled for PEER_V4 and PEER_V6: ``` admin@str-s6100-acs-1:~$ vtysh -c 'show run' \| egrep 'PEER_V.? 
allowas' neighbor PEER_V4 allowas-in 1 neighbor PEER_V6 allowas-in 1 ``` Then we apply following configuration to the db: ``` admin@str-s6100-acs-1:~$ cat disable.json { "BGP_BBR": { "all": { "status": "disabled" } } } admin@str-s6100-acs-1:~$ sonic-cfggen -j disable.json -w ``` The log output are: ``` Oct 14 18:40:22.450322 str-s6100-acs-1 DEBUG bgp#bgpcfgd: Received message : '('all', 'SET', (('status', 'disabled'),))' Oct 14 18:40:22.450620 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-f', '/tmp/tmpmWTiuq']'. Oct 14 18:40:22.681084 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V4 soft in']'. Oct 14 18:40:22.904626 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V6 soft in']'. ``` Check FRR configuraiton and see that no allowas parameters are there: ``` admin@str-s6100-acs-1:~$ vtysh -c 'show run' \| egrep 'PEER_V.? allowas' admin@str-s6100-acs-1:~$ ``` Then we apply enabling configuration back: ``` admin@str-s6100-acs-1:~$ cat enable.json { "BGP_BBR": { "all": { "status": "enabled" } } } admin@str-s6100-acs-1:~$ sonic-cfggen -j enable.json -w ``` The log output: ``` Oct 14 18:40:41.074720 str-s6100-acs-1 DEBUG bgp#bgpcfgd: Received message : '('all', 'SET', (('status', 'enabled'),))' Oct 14 18:40:41.074720 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-f', '/tmp/tmpDD6SKv']'. Oct 14 18:40:41.587257 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V4 soft in']'. Oct 14 18:40:42.042967 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V6 soft in']'. ``` Check FRR configuraiton and see that the BBR configuration is back: ``` admin@str-s6100-acs-1:~$ vtysh -c 'show run' \| egrep 'PEER_V.? 
allowas' neighbor PEER_V4 allowas-in 1 neighbor PEER_V6 allowas-in 1 ``` * The test coverage * Below is the test coverage ``` ---------- coverage: platform linux2, python 2.7.12-final-0 ---------- Name Stmts Miss Cover ---------------------------------------------------- bgpcfgd/__init__.py 0 0 100% bgpcfgd/__main__.py 3 3 0% bgpcfgd/config.py 78 41 47% bgpcfgd/directory.py 63 34 46% bgpcfgd/log.py 15 3 80% bgpcfgd/main.py 51 51 0% bgpcfgd/manager.py 41 23 44% bgpcfgd/managers_allow_list.py 385 21 95% bgpcfgd/managers_bbr.py 76 0 100% bgpcfgd/managers_bgp.py 193 193 0% bgpcfgd/managers_db.py 9 9 0% bgpcfgd/managers_intf.py 33 33 0% bgpcfgd/managers_setsrc.py 45 45 0% bgpcfgd/runner.py 39 39 0% bgpcfgd/template.py 64 11 83% bgpcfgd/utils.py 32 24 25% bgpcfgd/vars.py 1 0 100% ---------------------------------------------------- TOTAL 1128 530 53% ``` - Which release branch to backport (provide reason below if selected) - [ ] 201811 - [x] 201911 - [x] 202006	2020-10-30 08:58:27 -07:00
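The `disable.json` and `enable.json` files shown in the BBR commit above share one shape, so they can be generated rather than hand-written. This is a small sketch; the helper name `bbr_config` is hypothetical, while the table name, key and status values are the ones given in the commit message.

```python
import json


def bbr_config(status):
    """Build the CONFIG_DB fragment for the BGP_BBR table (hypothetical helper)."""
    if status not in ("enabled", "disabled"):
        raise ValueError("status must be 'enabled' or 'disabled'")
    # Per the commit message, "all" is currently the only supported key.
    return {"BGP_BBR": {"all": {"status": status}}}


# Print the document that would be saved as disable.json.
print(json.dumps(bbr_config("disabled"), indent=4))
```

The printed JSON can be written to a file and applied with `sonic-cfggen -j <file> -w`, as in the verification steps of the commit.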
pavel-shirshov	bee6c87f90	[bgpcfgd]: Change prefix-list generation for "Allow prefix" feature (#5639 ) - Why I did it I was asked to change "Allow list" prefix-list generation rule. Previously we generated the rules using following method: ``` For each {prefix}/{masklen} we would generate the prefix-rule permit {prefix}/{masklen} ge {masklen}+1 Example: Prefix 1.2.3.4/24 would have following prefix-list entry generated permit 1.2.3.4/24 ge 23 ``` But we discovered the old rule doesn't work for all cases we have. So we introduced the new rule: ``` For ipv4 entry, For mask < 32 , we will add ‘le 32’ to cover all prefix masks to be sent by T0 For mask =32 , we will not add any ‘le mask’ For ipv6 entry, we will add le 128 to cover all the prefix mask to be sent by T0 For mask < 128 , we will add ‘le 128’ to cover all prefix masks to be sent by T0 For mask = 128 , we will not add any ‘le mask’ ``` - How I did it I change prefix-list entry generation function. Also I introduced a test for the changed function. - How to verify it 1. Build an image and put it on your dut. 2. Create a file test_schema.conf with the test configuration ``` { "BGP_ALLOWED_PREFIXES": { "DEPLOYMENT_ID\|0\|1010:1010": { "prefixes_v4": [ "10.20.0.0/16", "10.50.1.0/29" ], "prefixes_v6": [ "fc01:10::/64", "fc02:20::/64" ] }, "DEPLOYMENT_ID\|0": { "prefixes_v4": [ "10.20.0.0/16", "10.50.1.0/29" ], "prefixes_v6": [ "fc01:10::/64", "fc02:20::/64" ] } } } ``` 3. Apply the configuration by command ``` sonic-cfggen -j test_schema.conf --write-to-db ``` 4. 
Check that your bgp configuration has following prefix-list entries: ``` admin@str-s6100-acs-1:~$ show runningconfiguration bgp \| grep PL_ALLOW ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V4 seq 10 deny 0.0.0.0/0 le 17 ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V4 seq 20 permit 127.0.0.1/32 ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V4 seq 30 permit 10.20.0.0/16 le 32 ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V4 seq 40 permit 10.50.1.0/29 le 32 ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V4 seq 10 deny 0.0.0.0/0 le 17 ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V4 seq 20 permit 127.0.0.1/32 ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V4 seq 30 permit 10.20.0.0/16 le 32 ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V4 seq 40 permit 10.50.1.0/29 le 32 ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V6 seq 10 deny ::/0 le 59 ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V6 seq 20 deny ::/0 ge 65 ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V6 seq 30 permit fc01:10::/64 le 128 ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V6 seq 40 permit fc02:20::/64 le 128 ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V6 seq 10 deny ::/0 le 59 ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V6 seq 20 deny ::/0 ge 65 ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V6 seq 30 permit fc01:10::/64 le 128 ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V6 seq 40 permit fc02:20::/64 le 128 ``` Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>	2020-10-30 08:56:52 -07:00
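The new prefix-list rule from the "Allow prefix" commit above (add `le 32` or `le 128` unless the mask is already the maximum) can be sketched as below. The function name `allow_list_entry` is hypothetical and this is not the bgpcfgd implementation; it only reproduces the `permit` part of the entries, not the sequence numbers or the `deny` lines.

```python
import ipaddress


def allow_list_entry(prefix):
    """Return the 'permit' clause for one allowed prefix (illustrative only)."""
    # strict=False accepts prefixes with host bits set, e.g. "1.2.3.4/24".
    net = ipaddress.ip_network(prefix, strict=False)
    # max_prefixlen is 32 for IPv4 and 128 for IPv6.
    if net.prefixlen < net.max_prefixlen:
        return "permit {} le {}".format(prefix, net.max_prefixlen)
    # A host route already has the longest mask, so no 'le' is added.
    return "permit {}".format(prefix)


for p in ("10.20.0.0/16", "127.0.0.1/32", "fc01:10::/64"):
    print(allow_list_entry(p))
```

The first and third results match the `permit` clauses in the `show runningconfiguration bgp` output quoted in the commit.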
bingwang-ms	7a015eacc1	Fix 'NoSuchProcess' exception in process_checker (#5716 ) The psutil library used in process_checker creates a cache for each process when calling process_iter. So there is some possibility that a process exists when calling process_iter but no longer exists when calling cmdline, which will raise a NoSuchProcess exception. This commit fixes the issue. Signed-off-by: bingwang <bingwang@microsoft.com>	2020-10-30 08:56:10 -07:00
judyjoseph	5a802533b5	Fix to remove the import of APIClient (#5724 )	2020-10-27 08:32:37 -07:00
judyjoseph	963bd7fdc4	[docker-teamd]: Add teamd as a dependent service to swss (#5628 ) - Why I did it On teamd docker restart, swss and syncd need to be restarted as there are dependent resources present. - How I did it Add teamd as a dependent service for swss. Updated the docker-wait script to handle service and dependent services separately. Handle the case of warm-restart for the dependent service - How to verify it Verified the following scenarios with the following testbed VM1 ----------------------------[DUT 6100] -----------------------VM2, ping traffic continuous between VMs 1. Stop teamd docker alone > swss, syncd dockers seen going away > The LAG reference count error messages seen for a while till swss docker stops. > Dockers back up. 2. Enable WR mode for teamd. Stop teamd docker alone > swss, syncd dockers not removed. > The LAG reference count error messages not seen > Repeated stop teamd docker test - same result, no effect on swss/syncd. 3. Stop swss docker. > swss, teamd, syncd go off - dockers come back correctly, interfaces up 4. Enable WR mode for swss. Stop swss docker > swss goes off not affecting syncd/teamd dockers. 5. Config reload > no reference counter error seen, dockers come back correctly, with interfaces up 6. Warm reboot, observations below > swss docker goes off first > teamd + syncd go off at the end of the WR process. > dockers come back up fine. > ping traffic between VMs was NOT HIT 7. Fast reboot, observations below > teamd goes off first (confirmed swss doesn't exit here) > swss goes off next > syncd goes away at the end of the FR process > dockers come back up fine. > there is a traffic HIT as per fast-reboot 8. Verified in multi-asic platform, the tests above other than WR/FB scenarios	2020-10-23 15:49:23 -07:00
yozhao101	d8ae2a0019	[hostcfgd] Enable/disable the container service only when the feature state was changed. (#5689 ) - Why I did it If we ran the CLI commands `sudo config feature autorestart snmp disabled/enabled` or `sudo config feature autorestart swss disabled/enabled`, then SNMP container will be stopped and started. This behavior was not expected since we updated the `auto_restart` field not update `state` field in `FEATURE` table. The reason behind this issue is that either `state` field or `auto_restart` field was updated, the function `update_feature_state(...)` will be invoked which then starts snmp.timer service. The snmp.timer service will first stop snmp.service and later start snmp.service. In order to solve this issue, the function `update_feature_state(...)` will be only invoked if `state` field in `FEATURE` table was updated. - How I did it When the demon `hostcfgd` was activated, all the values of `state` field in `FEATURE` table of each container will be cached. Each time the function `feature_state_handler(...)` is invoked, it will determine whether the `state` field of a container was changed or not. If it was changed, function `update_feature_state(...)` will be invoked and the cached value will also be updated. Otherwise, nothing will be done. - How to verify it We can run the CLI commands `sudo config feature autorestart snmp disabled/enabled` or `sudo config feature autorestart swss disabled/enabled` to check whether SNMP container is stopped and started. We also can run the CLI commands `sudo config feature state snmp disabled/enabled` or `sudo config feature state swss disabled/enabled` to check whether the container is stopped and restarted. Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2020-10-23 15:45:04 -07:00
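The caching behaviour described in the hostcfgd commit above can be sketched as follows. The class name `FeatureStateCache` and the callback wiring are hypothetical; only the idea (cache each feature's `state`, and call the update function only when that field actually changes) comes from the commit message.

```python
class FeatureStateCache:
    """Remember each feature's `state` and react only when it changes."""

    def __init__(self, initial_states, on_state_change):
        # initial_states: {feature_name: state}, read once at daemon start.
        self._cached = dict(initial_states)
        # Stand-in for update_feature_state(...); hypothetical signature.
        self._on_state_change = on_state_change

    def handle(self, feature, fields):
        """Process one FEATURE table notification; return True if acted on."""
        state = fields.get("state")
        if state is None or self._cached.get(feature) == state:
            # Only auto_restart (or nothing relevant) changed: do nothing.
            return False
        self._cached[feature] = state
        self._on_state_change(feature, state)
        return True


calls = []
cache = FeatureStateCache({"snmp": "enabled"}, lambda f, s: calls.append((f, s)))
cache.handle("snmp", {"state": "enabled", "auto_restart": "disabled"})  # ignored
cache.handle("snmp", {"state": "disabled", "auto_restart": "disabled"})  # acted on
print(calls)
```

In this sketch a feature that is missing from the cache is treated as changed, so its first notification triggers the callback.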
Joe LeVeque	4dde7d00cf	[caclmgrd] Prevent unnecessary iptables updates (#5312 ) When a large number of changes occur to the ACL table of Config DB, caclmgrd will get flooded with notifications, and previously, it would regenerate and apply the iptables rules for each change, which is unnecessary, as the iptables rules should only get applied once after the last change notification is received. If the ACL table contains a large number of control plane ACL rules, this could cause a large delay in caclmgrd getting the rules applied. This patch causes caclmgrd to delay updating the iptables rules until it has not received a change notification for at least 0.5 seconds.	2020-10-21 12:15:04 -07:00
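The delay described in the caclmgrd commit above is a debounce: wait until no change notification has arrived for a quiet period, then apply the rules once. A minimal sketch using a standard-library queue follows; the function name `drain_until_quiet` and the use of `queue.Queue` as the notification source are assumptions for illustration, not how caclmgrd receives Config DB events.

```python
import queue


def drain_until_quiet(notifications, quiet_period=0.5):
    """Consume a burst of notifications and return how many were seen.

    Blocks until the first notification arrives, then keeps consuming
    until none has arrived for `quiet_period` seconds.
    """
    notifications.get()  # wait for the start of a burst
    count = 1
    while True:
        try:
            notifications.get(timeout=quiet_period)
            count += 1
        except queue.Empty:
            # The table has been quiet long enough: the burst is over.
            return count
```

A caller would regenerate and apply the iptables rules once after each call returns, regardless of how many notifications were in the burst.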
abdosi	c9e0b06009	Optimize ACL Table/Rule notification handling (#5621 ) * Optimize ACL Table/Rule notification handling to loop pop() until empty to consume all the data in a batch. This way we prevent multiple calls to iptables updates. Signed-off-by: Abhishek Dosi <abdosi@microsoft.com> * Address review comments Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2020-10-14 08:08:23 -07:00
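The "loop pop() until empty" idea from the commit above can be sketched with a stand-in subscriber. `FakeSubscriber` is hypothetical, and the convention that `pop()` returns an empty key once nothing is pending is an assumption made for this example rather than a documented guarantee of the real subscriber class.

```python
from collections import deque


class FakeSubscriber:
    """Hypothetical stand-in for a table subscriber with a pop() method."""

    def __init__(self, events):
        self._events = deque(events)

    def pop(self):
        # Assumed convention: an empty key signals that nothing is pending.
        if self._events:
            return self._events.popleft()
        return ("", "", ())


def drain(subscriber):
    """Pop every pending (key, op, fields) entry and return them as one batch."""
    batch = []
    while True:
        key, op, fvs = subscriber.pop()
        if not key:
            break
        batch.append((key, op, fvs))
    return batch


sub = FakeSubscriber([("RULE_1", "SET", ()), ("RULE_2", "SET", ()), ("RULE_1", "DEL", ())])
batch = drain(sub)
print(len(batch))
```

Handling the batch as a whole means the iptables update runs once per batch instead of once per entry.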
abdosi	ccebd006b5	Optimized caclmgrd Notification handling. (#5560 ) Previously any event happening on the ACL Rule Table (e.g. DATAACL rules programmed) caused the control plane default action to be triggered. Now the Control Plane action will be triggered only for a) an ACL Rule belonging to a Control ACL Table Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2020-10-08 11:57:04 -07:00
pavel-shirshov	437ad95646	[bgp] Add 'allow list' manager feature (#5513 ) implements a new feature: "BGP Allow list." This feature allows us to control which IP prefixes are going to be advertised via ebgp from the routes received from EBGP neighbors.	2020-10-06 11:15:19 -07:00
Ying Xie	bea968bb2b	[rc.local] separate configuration migration and grub installation logic (#5528 ) To address issue #5525 Explicitly control the grub installation requirement when it is needed. We have scenario where configuration migration happened but grub installation is not required. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2020-10-04 19:41:50 +00:00
Abhishek Dosi	04725bc030	Revert "[bgp] Add 'allow list' manager feature (#5309 )" This reverts commit `b5d33b39de`.	2020-09-29 15:39:04 +00:00
Tamer Ahmed	2cc98b4bac	[platform] Add Support For Environment Variable File (#5010 ) * [platform] Add Support For Environment Variable This PR adds the ability to read an environment file from /etc/sonic. The file contains immutable SONiC config attributes such as platform, hwsku, version, device_type. The aim is to minimize calls being made into sonic-cfggen during boot time. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-09-28 21:14:39 +00:00
pavel-shirshov	b5d33b39de	[bgp] Add 'allow list' manager feature (#5309 ) implements a new feature: "BGP Allow list." This feature allows us to control which IP prefixes are going to be advertised via ebgp from the routes received from EBGP neighbors.	2020-09-28 16:20:27 +00:00
bingwang-ms	0fabc906d1	Fix exception when attempting to write a datetime to db (#5467 ) redis-py 3.0 used in master branch only accepts user data as bytes, strings or numbers (ints, longs and floats). Attempting to specify a key or a value as any other type will raise a DataError exception. This PR addresses the issue by converting datetime to str	2020-09-28 16:18:24 +00:00
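The conversion described in the commit above amounts to stringifying datetime values before they reach the database client. A minimal sketch, with a hypothetical helper name `to_redis_value` and no actual Redis connection:

```python
import datetime


def to_redis_value(value):
    """Convert datetime/date values to str; pass everything else through."""
    if isinstance(value, (datetime.datetime, datetime.date)):
        return str(value)
    return value


stamp = datetime.datetime(2020, 9, 28, 16, 18, 24)
print(to_redis_value(stamp))
```

Other unsupported types would still need their own handling; this sketch covers only the datetime case the commit mentions.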
judyjoseph	cff716f7a5	[Multi-Asic] Forward SNMP requests received on front panel interface to SNMP agent in host. (#5420 ) * [Multi-Asic] Forward SNMP requests destined to loopback IP, and coming in through the front panel interface present in the network namespace, to SNMP agent running in the linux host. * Updates based on comments * Further updates in docker_image_ctl.j2 and caclmgrd * Change the variable for net config file. * Updated the comments in the code. * No need to clean up the exising NAT rules if present, which could be created by some other process. * Delete our rule first and add it back, to take care of caclmgrd restart. Another benefit is that we delete only our rules, rather than earlier approach of "iptables -F" which cleans up all rules. * Keeping the original logic to clean the NAT entries, to revist when NAT feature added in namespace. * Missing updates to log_info call.	2020-09-28 16:14:07 +00:00
abdosi	615086ee19	Fix the build issue when port2cable length is defined in buffer_default_*.j2 (#5437 ), because of which the internal cable length never gets defined, causing a failure in test case test_multinpu_cfggen.py Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net> Co-authored-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>	2020-09-28 16:04:19 +00:00
yozhao101	7580c846ad	[201911][Monit] Unmonitor processes in disabled containers (#5462 ) We want to let Monit to unmonitor the processes in containers which are disabled in `FEATURE` table such that Monit will not generate false alerting messages into the syslog. - Backport of https://github.com/Azure/sonic-buildimage/pull/5153 to the 201911 branch Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2020-09-25 00:30:41 -07:00
abdosi	73bd647e44	Enhanced Feature Table state enable/disable for multi-asic platforms. (#5358 ) * Enhanced Feature Table state enable/disable for multi-asic platforms. In multi-asic, some features can have a service per asic, so we need to get the list of all services. Also updated the logic to return if any one of the systemctl commands returns failure, and made sure the syslog of a feature getting enabled/disabled only comes when all commands are successful. Moved the service list get api from sonic-util to sonic-py-common Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net> * Make sure to return None for both service lists in case of error. Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net> * Return empty list as fail condition Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net> * Address Review Comments. Made init_cfg.json.j2 aware of whether a Feature service is global scope or per asic scope Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net> * Fix merge conflict * Address Review Comment. Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net> Co-authored-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>	2020-09-22 11:38:19 -07:00
abdosi	54f47b1bb8	Enabling ipv6 support on docker container network. (#5418 ) This is needed for ipv6 communication between container and host in multi-asic platforms. The address is assigned from the private address space fd::/80, with the prefix length selected as 80 so that the last 48 bits can be the container MAC address, which prevents NDP neighbor cache invalidation issues in the Docker layer. Ref: https://docs.docker.com/config/daemon/ipv6/ Ref: https://medium.com/@skleeschulte/how-to-enable-ipv6-for-docker-containers-on-ubuntu-18-04-c68394a219a2 Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net> Co-authored-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>	2020-09-22 11:38:19 -07:00
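The arithmetic behind the /80 choice in the commit above: 128 - 80 leaves 48 host bits, exactly the size of a MAC address. The sketch below shows how an address could be composed under that scheme; the helper name `container_ipv6` and the example MAC are hypothetical, and this is an illustration of the bit layout rather than Docker's own code.

```python
import ipaddress


def container_ipv6(subnet, mac):
    """Combine an /80 subnet with a 48-bit MAC placed in the low bits."""
    net = ipaddress.IPv6Network(subnet)
    if net.prefixlen != 80:
        raise ValueError("expected an /80 subnet so that 48 host bits remain")
    # Parse "aa:bb:cc:dd:ee:ff" into a 48-bit integer.
    mac_int = int(mac.replace(":", ""), 16)
    return ipaddress.IPv6Address(int(net.network_address) | mac_int)


# Example MAC chosen only for illustration.
addr = container_ipv6("fd::/80", "02:42:ac:11:00:02")
print(addr)
```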
Renuka Manavalan	7d6e5083ce	[monit] Periodically monitor route consistency (#5085 ) * Add route_check to monit. * Switched to units of cycles per comments * Added comments per Joe's comments. * Added more comments per Royal's comments.	2020-09-19 15:47:53 -07:00
Blueve	64e04f8542	[conf] append nos-config-part for s6100 (#5234 ) * [conf] append nos-config-part for s6100 * modify rc.local Signed-off-by: Guohan Lu <lguohan@gmail.com> * Update rc.local Co-authored-by: Blueve <jika@microsoft.com> Co-authored-by: Guohan Lu <lguohan@gmail.com> Co-authored-by: Ying Xie <yxieca@users.noreply.github.com>	2020-09-19 14:14:32 -07:00
noaOrMlnx	d4f6e080cb	Change update_feature_state call to pass False as default if feature has no 'has_timer' field (#5260 ) * Pass False as default if feature has no timer field * Update hostcfgd to fit the new changes merged New changes can be found in PR:5248	2020-09-19 14:07:53 -07:00
abdosi	e43521ab64	[Multi-Asic] Fix for multi-asic where we should allow docker local (#5364 ) communication on docker eth0 ip . Without this TCP Connection to Redis does not happen in namespace. Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net> Co-authored-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>	2020-09-19 14:04:56 -07:00
Joe LeVeque	05e5807b3f	[process-reboot-cause] Use Logger class from sonic-py-common package (#5384 ) Eliminate duplicate logging code by importing Logger class from sonic-py-common package.	2020-09-19 13:59:59 -07:00


637 Commits