sonic-buildimage

Author	SHA1	Message	Date
jmmikkel	43342b33b8	[chassis] Add templates and code to support VoQ chassis iBGP peers (#5622 ) This commit has following changes: * Add templates and code to support VoQ chassis iBGP peers * Add support to convert a new VoQChassisInternal element in the BGPSession element of the minigraph to a new BGP_VOQ_CHASSIS_NEIGHBOR table in CONFIG_DB. * Add a new set of "voq_chassis" templates to docker-fpm-frr * Add a new BGP peer manager to bgpcfgd to add neighbors from the BGP_VOQ_CHASSIS_NEIGHBOR table using the voq_chassis templates. * Add a test case for minigraph.py, making sure the VoQChassisInternal element creates a BGP_VOQ_CHASSIS_NEIGHBOR entry, but not if its value is "false". * Add a set of test cases for the new voq_chassis templates in sonic-bgpcfgd tests. Note that the templates expect the new "bgp bestpath peer-type multipath-relax" bgpd configuration to be available. Signed-off-by: Joanne Mikkelson <jmmikkel@arista.com>	2021-04-16 11:11:32 -07:00
yozhao101	2737c9681f	[container_checker] Exclude the 'always_disabled' container from expected running container list (#7217 ) Signed-off-by: Yong Zhao yozhao@microsoft.com Why I did it Since we introduced a new value always_disabled for the state field in FEATURE table, the expected running container list should exclude the always_disabled containers. This bug was found by nightly test and posted here: issue. This PR fixes #7210. How I did it I added a logic condition to decide whether the value of state field of a container was always_disabled or not. How to verify it I verified this on the device str-dx010-acs-1. Which release branch to backport (provide reason below if selected) 201811 201911 202006 [ x] 202012	2021-04-02 08:05:46 -07:00
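The exclusion described in the commit above can be sketched as follows. This is a minimal illustration, not the actual container_checker code: the function name `expected_running_containers` and the sample table are hypothetical, and it assumes the FEATURE table maps each container name to a dict with a `state` field. Treating `disabled` the same way as `always_disabled` is also an assumption; the commit message only mentions `always_disabled`.

```python
def expected_running_containers(feature_table):
    """Return the set of container names that are expected to be running.

    feature_table: dict mapping container name -> dict with a 'state' field
    (assumed layout, modeled on the FEATURE table in CONFIG_DB).
    """
    # States for which a container should NOT be expected to run.
    # 'always_disabled' comes from the commit message; 'disabled' is assumed.
    not_running_states = {"disabled", "always_disabled"}
    return {
        name
        for name, attrs in feature_table.items()
        if attrs.get("state") not in not_running_states
    }


# Hypothetical sample data for illustration only.
sample_feature_table = {
    "swss": {"state": "enabled"},
    "snmp": {"state": "disabled"},
    "telemetry": {"state": "always_disabled"},
}
print(sorted(expected_running_containers(sample_feature_table)))
```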
Stepan Blyshchak	e179ec2fae	[services] introduce sonic.target (#5705 ) - Why I did it Group all SONiC services together and able to manage them together. Will be used in config reload command as much simpler and generic way to restart services. - How I did it Add services to sonic.target - How to verify it Together with Azure/sonic-utilities#1199 config reload -y Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>	2021-02-25 14:26:24 +02:00
arlakshm	f77157f09d	[baseimage] add ipintutil in sudoer file (#6845 ) show ip interfaces was recently enhanced to support multi ASIC platforms in this PR: https://github.com/Azure/sonic-utilities/pull/1396 . The ipintutil script has to run as sudo user to get the ip interfaces from each namespace. Add this script to the sudoers file so that the show ip interface command is available to users with read-only permissions Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>	2021-02-22 23:34:28 -08:00
Sujin Kang	d5238ae8dd	[pcie.yaml] Move pcie configuration file path to platform directory (#6475 ) - Why I did it The pcie configuration file location is under plugin directory not under platform directory. #6437 - How I did it Move all pcie.yaml configuration file from plugin to platform directory. Remove unnecessary timer to start pcie-check.service Move pcie-check.service to sonic-host-services - How to verify it Verify on the device	2021-02-21 08:27:37 -08:00
SuvarnaMeenakshi	5a49a0f499	[multi-asic][vs]: Update topology script to retrieve hwsku from minigraph (#6219 ) Update topology script to retrieve hwsku from minigraph if hwsku information is not available in config_db. Fix clean up of interfaces in msft_multi_asic_vs hwsku topology script. - Why I did it When bringing up multi-asic VS switch, topology service is started during boot up. Topology service starts a shell script which runs the topology script present in /usr/share/sonic/device// directory. To invoke hwsku specific script, the topology script tries to retrieve hwsku information from config_db. During initial boot up config_db might not be populated. In order to start topology service before config_db is updated, update topology script to get hwsku information from minigraph.xml if it is available. This will be helpful to bring up multi-asic VS testbed by loading minigraph and starting topology service. - How I did it Update topology.sh script to retrieve hwsku information from minigraph.xml. Fix clean up function on msft_multi_asic_vs toplogy script. - How to verify it single-asic VS - no change; topology service is only enabled for multi-asic VS. multi-asic VS - Bring up multi-asic VS image, copy minigraph to vs image, start topology service. Topology service should be successful. to test clean up function fix, start topology service - make sure interfaces are created and moved to the right namespaces. stop topology service - make sure namespace do not have any interface and all front end interfaces are present in default namespace.	2021-02-18 22:02:29 -08:00
Joe LeVeque	820d350301	[pcie-check] Update underlying pcieutil command and add to sudoers file (#6682 ) - Why I did it As of Azure/sonic-utilities#1297, subcommands of pcieutil have changed to remove the redundant pcie- prefix. This PR adapts calling applications (pcie-check) to the new syntax. Resolves #6676 - How I did it Remove pcie- prefix from pcieutil subcommands in calling applications Also add pcieutil * to sudoers file, as pcieutil requires elevated permissions	2021-02-04 12:14:08 -08:00
Samuel Angebault	0c4d4ace76	[kdump] Fix OOM events in crashkernel (#6447 ) A few issues were discovered with crashkernel on Arista platforms. 1) platforms using `docker_inram=on` would end up OOM in kdump environment. This happens because the same initramfs is used by SONiC and the crashkernel. With `docker_inram=on` the `dockerfs.tar.gz` is extracted in a `tmpfs` created for the occasion. Since `dockerfs.tar.gz` weighs more than 1.5G, it doesn't fit into the kdump environment and ends up OOM. This OOM event can in turn trigger a panic. 2) Arista platforms with `secureboot` enabled would fail to load the crashkernel because the kernel parameter would be discarded on boot. This happens because the `boot0` in secureboot mode is strict about kernel parameter injection. 3) The secureboot path allowlist would remove kernel crash reports. 4) The kdump service would fail on Arista products since `/boot/` is empty in `secureboot` - How I did it 1) To prevent an OOM event in the crashkernel the fix is to avoid the codepaths in `union-mount` that create tmpfs and populate them. Some more codepaths specific to Arista devices are also skipped to make the kdump process faster. This relies on detecting that the initramfs is starting in a kdump environment and skipping some initialization. The `/usr/sbin/kdump-config` tool appends a few kernel cmdline arguments when loading the crashkernel. The most unique one is `systemd.unit=kdump-tools.service` which is used in a few initramfs hooks to set `in_kdump`. 2) To allow `kdump` to work in `secureboot` environment the cmdline generation in boot0 was slightly modified. The codepath to load kernel parameters changed by SONiC now also runs when booting in secure mode. It was altered to prevent an append-only behavior which would grow the `kernel-cmdline` at every reboot. This ever-growing behavior would cause `kexec` to fail to load the kernel because the cmdline was too long. 3) To get the kernel crash under /var/crash this path has to be added to `allowlist_paths` 4) The `/host/image-XXX/boot` folder is now populated in `secureboot` mode but not used. - How to verify it Regular boot: - enable kdump - enable docker_inram=on via kernel-params - reboot - generate a crash `echo c > /proc/sysrq-trigger` - before: witness OOM events on the console - after: crash kernel works and crash available under /var/crash Secure boot: - enable kdump - reboot - generate a crash `echo c > /proc/sysrq-trigger` - before: witness no kdump - after: crash kernel works and crash available under /var/crash Co-authored-by: Boyang Yu <byu@arista.com>	2021-02-02 01:55:09 -08:00
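The kdump detection described in the commit above (looking for `systemd.unit=kdump-tools.service` on the kernel command line) can be illustrated with a short sketch. The real check lives in initramfs hooks and is not shown in the commit message; the Python function below, its name, and the sample command lines are assumptions made purely for illustration.

```python
# Marker that, per the commit message, kdump-config appends when loading the crashkernel.
KDUMP_MARKER = "systemd.unit=kdump-tools.service"


def in_kdump(cmdline: str) -> bool:
    """Return True if the kernel command line indicates a kdump environment.

    The command line is split on whitespace so that the marker must appear
    as a complete argument rather than as a substring of another argument.
    """
    return KDUMP_MARKER in cmdline.split()


# Hypothetical command lines for illustration only.
normal_boot = "root=/dev/sda3 rw console=ttyS0 docker_inram=on"
crash_boot = normal_boot + " systemd.unit=kdump-tools.service nr_cpus=1"

print(in_kdump(normal_boot), in_kdump(crash_boot))
```

On a live system the input would typically come from reading `/proc/cmdline`; the sketch takes a string so it can be exercised without one.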
arlakshm	b5225407ef	[baseimage]: add docker ps to the sudoer file (#6604 ) fixes Azure/sonic-utilities#1389 With the recent changes in the sudoers file, the show commands fail for read-only users. The problem is that 'docker ps' fails in the function [get_routing_stack()](`8a1109ed30/show/main.py (L54)`), so all the CLI commands fail. Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>	2021-01-29 08:16:32 -08:00
arlakshm	ff8cc49b18	[multi asic] add ip netns identify command to sudoer (#6591 ) Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com> - Why I did it The command sudo ip netns identify <pid> is used in function get_current_namespace to check whether the CLI command is running in host context or within a namespace. This function is used for every CLI command, so the command sudo ip netns identify <pid> needs to be added to the sudoers file to allow users with RO access to run show CLI commands. This problem does not occur on single asic platforms. - How I did it Add ip netns identify [0-9]* to sudoers file.	2021-01-28 23:12:01 -08:00
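To show what a glob such as `ip netns identify [0-9]*` accepts, the sketch below uses Python's `fnmatch` as an approximation of shell-style wildcard matching. It is only an illustration: sudo's own matcher has additional rules that are not modeled here, and the helper name `is_allowed` is hypothetical.

```python
from fnmatch import fnmatchcase

# Pattern taken from the commit message above.
SUDOERS_ARGS_PATTERN = "ip netns identify [0-9]*"


def is_allowed(command_line: str) -> bool:
    """Approximate a sudoers-style glob check with shell-style matching."""
    return fnmatchcase(command_line, SUDOERS_ARGS_PATTERN)


for candidate in (
    "ip netns identify 1234",  # a PID: matches
    "ip netns identify",       # no argument: no match, a digit is required
    "ip netns identify self",  # non-numeric argument: no match
    "ip netns identify 1abc",  # matches: '*' accepts anything after the first digit
):
    print(candidate, "->", is_allowed(candidate))
```

Note that `[0-9]*` means "one digit followed by anything", not "digits only", which is why the last candidate is accepted by this approximation.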
abdosi	cfa8fbbf1a	[baseimage]: Updates for Ebtables and support for multi-asic (#6542 ) Following changes were done for ebtables: - Support for Multi-asic platforms. Ebtable filters are installed in namespace for multi-asic and not host. On Single asic installed on host. - For Multi-asic platforms we don't want to install on host otherwise Namespace-to-Namespace communication does not happens since ARP Request are not forwarded. - Updated to use text file to restore ebtables rules then the binary format. Rules are restore as part of Database docker init instead of rc.local - Removed the ebtable service files for buster as not needed as filters are restored/installed as part of database docker init. All the binaries are pre-installed with ebtables* binary are same as ebatbles-legacy-* Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2021-01-27 08:36:10 -08:00
arlakshm	0e12ca81c7	[Multi Asic] support of swss.rec and sairedis.rec for multi asic (#6310 ) Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan arlakshm@microsoft.com - Why I did it This PR has the changes to support having different swss.rec and sairedis.rec for each asic. The logrotate script is updated as well - How I did it Update the orchagent.sh script to use the logfile name options in these PRs(Azure/sonic-swss#1546 and Azure/sonic-sairedis#747) In multi asic platforms the record files will be different for each asic, with the format swss.asic{x}.rec and sairedis.asic{x}.rec Update the logrotate script for multiasic platform .	2021-01-22 09:42:19 -08:00
Qi Luo	25e4d773b9	[baseimage]: Cleanup sudoers file (#6518 )	2021-01-21 08:28:32 -08:00
Ying Xie	054f5b7a53	[warm boot finalizer] only wait for enabled components to reconcile (#6454 ) * [warm boot finalizer] only wait for enabled components to reconcile Define the component with its associated service. Only wait for components that have associated service enabled to reconcile during warm reboot. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2021-01-15 07:48:11 -08:00
yozhao101	04cd1d61e8	[Monit] Monitoring the running status of containers. (#6251 ) - Why I did it This PR aims to monitor the running status of each container. Currently the auto-restart feature was enabled. If a critical process exited unexpected, the container will be restarted. If the container was restarted 3 times during 20 minutes, then it will not run anymore unless we cleared the flag using the command `sudo systemctl reset-failed <container_name>` manually. - How I did it We will employ Monit to monitor a script. This script will generate the expected running container list and compare it with the current running containers. If there are containers which were expected to run but were not running, then an alerting message will be written into syslog. - How to verify it I tested this feature on a lab device `str-a7050-acs-3` which has single ASIC and `str2-n3164-acs-3` which has a Multi-ASIC. First I manually stopped a container by running the command `sudo systemctl stop <container_name>`, then I checked whether there was an alerting message in the syslog. Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2021-01-07 19:52:22 -08:00
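The comparison described in the commit above (expected running containers versus containers that are actually running) can be sketched as a set difference. This is an illustration under stated assumptions, not the script that Monit runs: the function names and the alert text are hypothetical, and both lists are passed in rather than read from CONFIG_DB or Docker.

```python
def missing_containers(expected, running):
    """Return the sorted names of containers that should be running but are not."""
    return sorted(set(expected) - set(running))


def alert_message(expected, running):
    """Build a one-line alert for syslog, or return None if nothing is missing.

    The message wording is hypothetical.
    """
    missing = missing_containers(expected, running)
    if not missing:
        return None
    return "Expected containers not running: " + ", ".join(missing)


# Hypothetical container names for illustration only.
print(alert_message(["swss", "syncd", "bgp"], ["swss", "bgp"]))
print(alert_message(["swss"], ["swss", "extra"]))
```

Containers that are running but not expected are deliberately ignored here, since the commit message only describes alerting on containers that were expected to run but did not.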
Renuka Manavalan	dbc6718408	Take a copy of existing TACACS credentials and restore it during upgrade (#6285 ) In scenario where upgrade gets config from minigraph, it could miss tacacs credentials as they are not in minigraph. Hence restore explicitly upon load-minigraph, if present. - Why I did it Upon boot, when config migration is required, the switch could load config from minigraph. The config-load from minigraph would wipe off TACACS key and disable login via TACACS, which would disable all remote user access. This change, would re-configure the TACACS if there is a saved copy available. - How I did it When config is loaded from minigraph, look for a TACACS credentials back up (tacacs.json) under /etc/sonic/old_config. If present, load the credentials into running config, before config-save is called. - How to verify it Remove /etc/sonic/config_db.json and do an image update. Upon reboot, w/o this change, you would not be able ssh in as remote user. You may login as admin and check out, "show tacacs" & "show aaa" to verify that tacacs-key is missing and login is not enabled for tacacs. With this change applied, remove /etc/sonic/config_db.json, but save tacacs & aaa credentials as tacacs.json in /etc/sonic/. Upon reboot, you should see remote user access possible.	2021-01-07 16:45:38 -08:00
Akhilesh Samineni	62e7c452d0	After first bootup, the FEATURE table is not present in CONFIG_DB (#5911 ) Fix the issue where, after first bootup (onie-install), the FEATURE table is not present in CONFIG_DB. The fix is to call config reload.	2021-01-05 09:22:16 -08:00
Prabhu Sreenivasan	df2a4ded98	[ntp]: Source interface support for NTP (#6033 ) Added source interface support for NTP. Also made NTP start on Mgmt-VRF by default when configured. - How I did it 1) Updated hostcfg to listen to global config NTP and NTP_SERVER tables and restart ntp whenever the configuration changes. NTP table includes source interface configuration. 2) Updated the ntp script to start on Mgmt-VRF by default when configured. Signed-off-by: Prabhu Sreenivasan <prabhu.sreenivasan@broadcom>	2020-12-21 05:34:13 -08:00
abdosi	0755f29fe7	Telemetry Certificate Copy Across Image Upgrade. (#6252 ) To copy telemetry certificate during image upgrade from previous image to new image	2020-12-19 08:24:03 -08:00
arheneus@marvell.com	e88c7d11ca	[ntp][apparmor] Allow apparmor read permission for ntpd under rw mount path of rootfs (#6040 ) Certain platform specific packages sonic-platform-xyz, installs files onto rootfs, which would be placed on read-write mount path on /host/image-name/rw/... when ntpd starts it tries to do read access on /usr/bin /usr/sbin/ /usr/local/bin , which inturn links further to the read-write mount path also. Where ntpd would get below Apparmor Warning message LOG:- audit: type=1400 audit(1606226503.240:21): apparmor="DENIED" operation="open" profile="/usr/sbin/ntpd" name="/image-HEAD-dirty-20201111.173951/rw/usr/local/bin/" pid=3733 comm="ntpd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0 audit: type=1400 audit(1606226503.240:22): apparmor="DENIED" operation="open" profile="/usr/sbin/ntpd" name="/image-HEAD-dirty-20201111.173951/rw/usr/sbin/" pid=3733 comm="ntpd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0 audit: type=1400 audit(1606226503.240:23): apparmor="DENIED" operation="open" profile="/usr/sbin/ntpd" name="/image-HEAD-dirty-20201111.173951/rw/usr/bin/" pid=3733 comm="ntpd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0 Fix: Add rw/.. mount path similar to root path access provided for ntpd in /etc/apparmor.d/usr.sbin.ntpd Signed-off-by: Antony Rheneus <arheneus@marvell.com>	2020-12-18 04:57:35 -08:00
shlomibitton	a6aaffd2ad	[kdump] Add more kernel panic conditions for vmcore dump (#6095 ) Create new file to "sysctl.d" with desired panic conditions. It will trigger a vmcore dump using kdump-tools on these situations. Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>	2020-12-15 08:54:13 -08:00
rajendra-dendukuri	b60448a006	kdump: Add default kdump command line arguments (#6180 ) The default /etc/default/kdump-tools file provided by the kdump-tools package doesn't set a value for KDUMP_CMDLINE_APPEND. The default kdump command line arguments need to be set in order to extend them to use additional arguments required for SONiC platforms. Signed-off-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>	2020-12-15 08:52:23 -08:00
judyjoseph	6d9ecbcfd8	Move frr logs from syslog to /var/log/frr/*.log (#5988 ) - Why I did it Move frr logs from syslog from the directory /var/log/quagga/.log to /var/log/frr/log - How I did it Updated the rsyslog config files. - How to verify it Verified the logs come into the file zebra.log and bgpd.log in the DIR /var/log/frr/log	2020-12-10 08:44:34 -08:00
rajendra-dendukuri	31ce20ac38	[kdump]: Kdump usability and reliability improvements (#6113 ) - Allow platform specific reboot script to be called after crash kernel has finished copying the kernel vmcore - Disable pcie advanced features when running crash kernel. This improves reliability of the crash kernel to successfully create a vmcore and also reboot - Allow crash kernel to reboot if a panic is seen while it is generating a vmcore - Fix crash kernel to use the SONiC specific /usr/local/bin/reboot script instead of the Linux reboot command /sbin/reboot - Use sonic_platform as the kernel command line parameter to pass platform identifier string Signed-off-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>	2020-12-10 01:32:37 -08:00
abdosi	59c1e3a78a	[multi-asic] Enhancing monit process checker for multi-asic. (#6100 ) Added Support of process checker for work on multi-asic platforms.	2020-12-04 10:39:43 -08:00
Prabhu Sreenivasan	2895b79482	[ntp]: NTP service ordering (#6115 ) Make sure ntp-config service is executed before ntpd Updated ntp-config service files to force dependency with ntp service. Also resolved circular dependency with --no-block flag. (needed as ntp-config service internally invokes systemd to restart ntp which in turn waits for ntp-config to complete) Signed-off-by: Prabhu Sreenivasan <prabhu.sreenivasan@broadcom.com>	2020-12-04 08:49:20 -08:00
Joe LeVeque	905a5127bb	[Python] Align files in root dir, dockers/ and files/ with PEP8 standards (#6109 ) - Why I did it Align style with slightly modified PEP8 standards (extend maximum line length to 120 chars). This will also help in the transition to Python 3, where it is more strict about whitespace, plus it helps unify style among the SONiC codebase. Will tackle other directories in separate PRs. - How I did it Using `autopep8 --in-place --max-line-length 120` and some manual tweaks.	2020-12-03 15:57:50 -08:00
Blueve	6a6e583b06	[bash.bashrc] Add reverse SSH script to bash.bashrc (#5438 ) * [bash.bashrc] Add reverse SSH script to bash.bashrc * Fix command issue and add empty line before EOF * Add checks for SSH_TARGET_CONSOLE_LINE Signed-off-by: Jing Kan jika@microsoft.com	2020-11-24 14:11:53 +08:00
Sudharsan Dhamal Gopalarathnam	98a434e8c1	Copp Manager Changes (#4861 ) *Introduce CoPP Manager infrastructure Copp service to generate initial copp config template file Co-authored-by: dgsudharsan <sudharsan_gopalarat@dell.com>	2020-11-23 09:31:42 -08:00
Sujin Kang	5b31996f7b	[reboot-history] Add reboot history to state db (#5933 ) - Why I did it Add reboot history to State db so that can be used telemetry service - How I did it Split the process-reboot-cause service to determine-reboot-cause and process-reboot-cause determine-reboot-cause to determine the reboot cause process-reboot-cause to parse the reboot cause files and put the reboot history to state db Moved to sonic-host-service* packages - How to verify it Performed unit test and tested on DUT	2020-11-20 20:08:18 -08:00
Joe LeVeque	23247514f9	Fix a number of LGTM alerts (#5952 ) Fix 259 alerts reported by the LGTM tool: - 245 for Unused import - 7 for Testing equality to None - 5 for Duplicate key in dict literal - 1 for Module is imported more than once - 1 for Unused local variable	2020-11-20 10:58:48 -08:00
JiangboHe	461e43649b	fix error: interface counters is mismatch after warm-reboot (#5346 ) - Why I did it There is an issue with counters after warm-reboot: if I clear counters with the command "sonic-clear counters" and then execute 'warm-reboot', the counters shown by "show interface counters" after SONiC restarts are still the old counters from before "sonic-clear". These are not the right counters, because the counters file in '/tmp' is lost during the warm-reboot process. - How I did it I fixed it by saving the '/tmp/portstat-0' folders in '/host/' before executing 'warm-reboot' (in pull request Azure/sonic-utilities#1099 ), and restoring the counters folders to '/tmp/' after the warm-reboot process has finished. - How to verify it Clear counters by command 'sonic-clear' sonic-clear counters sonic-clear dropcounters sonic-clear pfccounters sonic-clear queuecounters sonic-clear rifcounters Execute 'warm-reboot' Use command ‘show interface counters’ to see if the counters are right.	2020-11-20 10:37:45 -08:00
pavel-shirshov	a92732fe5d	[bgpcfgd]: Fixes for BBR (#5956 ) * Add explicit default state into the constants.yml * Enable/disable only peer-groups, available in the config * Retrieve updates from frr before using configuration Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>	2020-11-19 00:07:58 -08:00
Prince Sunny	1eaaf64ed2	Set preference for forced mgmt routes (#5844 ) When forced mgmt routes are present, the issue fixed as part of #5754 is not complete. Added a preference(priority) field to forced mgmt route ip rules	2020-11-10 14:20:13 -08:00
arlakshm	2b41f6bd5c	Add the vtysh command with newly added "-n" option for multi asic to the read_only_cmds (#5845 ) In multi asic platforms the "show ip bgp summary" commands is not available for user with read only privileges, so to fix this the vtysh command with the new "-n" option, added for multi asic platforms, needs to be added to the READ_ONLY_COMMANDS list in the sudoers files. Added the command vtysh -n [0-9] -c show * to list of READ_ONLY_COMMANDS in the sudoers files in this commit. Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>	2020-11-10 12:18:49 -08:00
Joe LeVeque	04d0e8ab00	[hostcfgd] Convert to Python 3; Add to sonic-host-services package (#5713 ) To consolidate host services and install via packages instead of file-by-file, also as part of migrating all of SONiC to Python 3, as Python 2 is no longer supported.	2020-11-07 12:48:19 -08:00
Joe LeVeque	9e7e092610	[Monit process_checker] Convert to Python 3 (#5836 ) Convert process_checker script to Python 3	2020-11-07 12:46:23 -08:00
Stepan Blyshchak	9bc693ce6e	[hostcfgd] If feature state entry not in the cache, add a default state (#5777 ) Our use case is to register new features in runtime. The previous change which introduced the cache broke this capability and caused hostcfgd crash. Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>	2020-11-06 10:24:31 -08:00
Joe LeVeque	13ff7b38d5	[docker-wait-any] Convert to Python 3, install dependency in host OS (#5784 ) - Convert docker-wait-any script to Python 3 - Install Python 3 Docker Engine API in host OS	2020-11-05 11:23:00 -08:00
Joe LeVeque	d8045987a6	[core_uploader.py] Convert to Python 3; Use logger from sonic-py-common for uniform logging (#5790 ) - Convert core_uploader.py script to Python 3 - Use logger from sonic-py-common for uniform logging - Reorganize imports alphabetically per PEP8 standard - Two blank lines precede functions per PEP8 standard - Remove unnecessary global variable declarations	2020-11-05 11:19:26 -08:00
Lawrence Lee	10ab46f7a0	Revert "[docker-base]: Rate limit priority INFO and lower in syslog" (#5763 ) * This was a temporary fix for orchagent spamming log messages and causing rate limiting, leading to critical messages being dropped for the syslog. No longer needed since Azure/sonic-sairedis#680 was merged.	2020-11-02 08:49:40 -08:00
lguohan	c8a00eda95	[mgmt ip]: mvrf ip rule priority change to 32765 (#5754 ) Fix Azure/SONiC#551 When eth0 IP address is configured, an ip rule is getting added for eth0 IP address through the interfaces.j2 template. This eth0 ip rule creates an issue when VRF (data VRF or management VRF) is also created in the system. When any VRF (data VRF or management VRF) is created, a new rule is getting added automatically by kernel as "1000: from all lookup [l3mdev-table]". This l3mdev IP rule is never getting deleted even if VRF is deleted. Once if this l3mdev IP rule is added, if user configures IP address for the eth0 interface, interfaces.j2 adds an eth0 IP rule as "1000:from 100.104.47.74 lookup default ". Priority 1000 is automatically chosen by kernel and hence this rule gets higher priority than the already existing rule "1001:from all lookup local ". This results in an issue "ping from console to eth0 IP does not work once if VRF is created" as explained in Issue 551. More details and possible solutions are explained as comments in the Issue551. This PR is to resolve the issue by always fixing the low priority 32765 for the IP rule that is created for the eth0 IP address. Tested with various combinations of VRF creation, deletion and IP address configuration along with ping from console to eth0 IP address. Co-authored-by: Kannan KVS <kannan_kvs@dell.com>	2020-10-31 20:45:59 -07:00
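The effect of the priority change in the commit above can be shown with a simplified model of ip rule evaluation, in which rules are tried in ascending priority order and the first matching rule selects the table. This sketch is an assumption-laden illustration: the `Rule` type and `first_match` helper are hypothetical, the kernel's fall-through to the next rule when a table lookup fails is not modeled, and the l3mdev rule is left out on the assumption that it only applies to traffic associated with a VRF device.

```python
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    priority: int
    source: Optional[str]  # None stands for "from all"
    table: str


def first_match(rules: List[Rule], src: str) -> Optional[Rule]:
    """Return the lowest-priority-number rule whose source selector matches src."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.source is None or rule.source == src:
            return rule
    return None


ETH0_IP = "100.104.47.74"  # address quoted in the commit message

# Before the fix: the eth0 rule gets priority 1000 and shadows the local table.
before = [Rule(1000, ETH0_IP, "default"), Rule(1001, None, "local")]
# After the fix: the eth0 rule is pinned to 32765, so the local table wins.
after = [Rule(32765, ETH0_IP, "default"), Rule(1001, None, "local")]

print(first_match(before, ETH0_IP).table, first_match(after, ETH0_IP).table)
```

In this model, traffic sourced from the eth0 address is sent to the `default` table before the fix and reaches the `local` table after it, which is consistent with the ping failure the commit message describes.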
abdosi	dddf96933c	[monit] Adding patch to enhance syslog error message generation for monit alert action when status is failed. (#5720 ) Why/How I did: Make sure first error syslog is triggered based on FAULT TOLERANCE condition. Added support of repeat clause with alert action. This is used as trigger for generation of periodic syslog error messages if error is persistent Updated the monit conf files with repeat every x cycles for the alert action	2020-10-31 17:29:49 -07:00
Renuka Manavalan	8d8aadb615	Load config after subscribe (#5740 ) - Why I did it The update_all_feature_states can run in the range of 20+ seconds to one minute. With load of AAA & Tacacs preceding it, any DB updates in AAA/TACACS during the long running feature updates would get missed. To avoid, switch the order. - How I did it Do a load after after updating all feature states. - How to verify it Not a easy one Have a script that restart hostcfgd sleep 2s run redis-cli/config command to update AAA/TACACS table Run the script above and watch the file /etc/pam.d/common-auth-sonic for a minute. - When it repro: The updates will not reflect in /etc/pam.d/common-auth-sonic	2020-10-31 16:38:32 -07:00
Joe LeVeque	e111204206	[caclmgrd] Convert to Python 3; Add to sonic-host-services package (#5739 ) To consolidate host services and install via packages instead of file-by-file, also as part of migrating all of SONiC to Python 3, as Python 2 is no longer supported, convert caclmgrd to Python 3 and add to sonic-host-services package	2020-10-29 16:29:12 -07:00
judyjoseph	6088bd59de	[multi-ASIC] BGP internal neighbor table support (#5520 ) * Initial commit for BGP internal neighbor table support. > Add new template named "internal" for the internal BGP sessions > Add a new table in database "BGP_INTERNAL_NEIGHBOR" > The internal BGP sessions will be stored in this new table "BGP_INTERNAL_NEIGHBOR" * Changes in template generation tests with the introduction of internal neighbor template files.	2020-10-28 16:41:27 -07:00
bingwang-ms	36c52cca2b	Fix 'NoSuchProcess' exception in process_checker (#5716 ) The psutil library used in process_checker create a cache for each process when calling process_iter. So, there is some possibility that one process exists when calling process_iter, but not exists when calling cmdline, which will raise a NoSuchProcess exception. This commit fix the issue. Signed-off-by: bingwang <bingwang@microsoft.com>	2020-10-27 09:25:35 +08:00
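The race described in the commit above (a process exists when it is listed but is gone by the time its command line is read) is commonly handled by catching the exception per process and moving on. The sketch below shows that pattern with stand-in classes so that it runs without psutil installed; `NoSuchProcess` and `FakeProcess` are hypothetical substitutes for `psutil.NoSuchProcess` and the objects yielded by `psutil.process_iter()`, and the commit's actual change may differ.

```python
class NoSuchProcess(Exception):
    """Stand-in for psutil.NoSuchProcess."""


class FakeProcess:
    """Stand-in for a psutil process object that may vanish before it is queried."""

    def __init__(self, pid, cmdline, alive=True):
        self.pid = pid
        self._cmdline = cmdline
        self._alive = alive

    def cmdline(self):
        # Mimic the race: the process was listed but has exited since.
        if not self._alive:
            raise NoSuchProcess(self.pid)
        return self._cmdline


def running_cmdlines(processes):
    """Collect command lines, skipping processes that exited after being listed."""
    result = []
    for proc in processes:
        try:
            result.append(" ".join(proc.cmdline()))
        except NoSuchProcess:
            # The process disappeared between listing and querying; ignore it.
            continue
    return result


# Hypothetical process list for illustration only.
snapshot = [
    FakeProcess(101, ["/usr/bin/orchagent", "-d"]),
    FakeProcess(102, ["/usr/bin/short-lived"], alive=False),
]
print(running_cmdlines(snapshot))
```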
Joe LeVeque	3a4435eb53	Add sonic-host-services and sonic-host-services-data packages (#5694 ) - Why I did it Install all host services and their data files in package format rather than file-by-file - How I did it - Create sonic-host-services Python wheel package, currently including procdockerstatsd - Also add the framework for unit tests by adding one simple procdockerstatsd test case - Create sonic-host-services-data Debian package which is responsible for installing the related systemd unit files to control the services in the Python wheel. This package will also be responsible for installing any Jinja2 templates and other data files needed by the host services.	2020-10-23 09:52:29 -07:00
judyjoseph	ace7f24cba	[docker-teamd]: Add teamd as a depedent service to swss (#5628 ) - Why I did it On teamd docker restart, the swss and syncd needs to be restarted as there are dependent resources present. - How I did it Add the teamd as a dependent service for swss Updated the docker-wait script to handle service and dependent services separately. Handle the case of warm-restart for the dependent service - How to verify it Verified the following scenario's with the following testbed VM1 ----------------------------[DUT 6100] -----------------------VM2, ping traffic continuous between VMs 1. Stop teamd docker alone > swss, syncd dockers seen going away > The LAG reference count error messages seen for a while till swss docker stops. > Dockers back up. 2. Enable WR mode for teamd. Stop teamd docker alone > swss, syncd dockers not removed. > The LAG reference count error messages not seen > Repeated stop teamd docker test - same result, no effect on swss/syncd. 3. Stop swss docker. > swss, teamd, syncd goes off - dockers comes back correctly, interfaces up 4. Enable WR mode for swss . Stop swss docker > swss goes off not affecting syncd/teamd dockers. 5. Config reload > no reference counter error seen, dockers comes back correctly, with interfaces up 6. Warm reboot, observations below > swss docker goes off first > teamd + syncd goes off to the end of WR process. > dockers comes back up fine. > ping traffic between VM's was NOT HIT 7. Fast reboot, observations below > teamd goes off first ( confirmed swss don't exit here ) > swss goes off next > syncd goes away at the end of the FR process > dockers comes back up fine. > there is a traffic HIT as per fast-reboot 8. Verified in multi-asic platform, the tests above other than WR/FB scenarios	2020-10-23 00:41:16 -07:00
yozhao101	af97e23686	[hostcfgd] Enable/disable the container service only when the feature state was changed. (#5689 ) - Why I did it If we ran the CLI commands `sudo config feature autorestart snmp disabled/enabled` or `sudo config feature autorestart swss disabled/enabled`, then SNMP container will be stopped and started. This behavior was not expected since we updated the `auto_restart` field not update `state` field in `FEATURE` table. The reason behind this issue is that either `state` field or `auto_restart` field was updated, the function `update_feature_state(...)` will be invoked which then starts snmp.timer service. The snmp.timer service will first stop snmp.service and later start snmp.service. In order to solve this issue, the function `update_feature_state(...)` will be only invoked if `state` field in `FEATURE` table was updated. - How I did it When the demon `hostcfgd` was activated, all the values of `state` field in `FEATURE` table of each container will be cached. Each time the function `feature_state_handler(...)` is invoked, it will determine whether the `state` field of a container was changed or not. If it was changed, function `update_feature_state(...)` will be invoked and the cached value will also be updated. Otherwise, nothing will be done. - How to verify it We can run the CLI commands `sudo config feature autorestart snmp disabled/enabled` or `sudo config feature autorestart swss disabled/enabled` to check whether SNMP container is stopped and started. We also can run the CLI commands `sudo config feature state snmp disabled/enabled` or `sudo config feature state swss disabled/enabled` to check whether the container is stopped and restarted. Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2020-10-22 20:01:07 -07:00
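The caching approach described in the commit above can be sketched as follows: cache each feature's `state` at startup and invoke the update function only when a later event carries a different `state`. This is a minimal illustration, not the hostcfgd code; the class name `FeatureStateCache`, the `handle` method, and the callback signature are hypothetical, and the table layout (name mapped to a dict of fields) is an assumption.

```python
class FeatureStateCache:
    """Invoke a callback only when a feature's 'state' field actually changes."""

    def __init__(self, feature_table, on_state_change):
        # Cache the initial 'state' of every feature, as described in the commit.
        self._cached = {name: attrs.get("state") for name, attrs in feature_table.items()}
        self._on_state_change = on_state_change

    def handle(self, name, attrs):
        """Process a FEATURE table event; return True if the callback was invoked."""
        new_state = attrs.get("state")
        if self._cached.get(name) == new_state:
            # Only other fields (for example auto_restart) changed: do nothing.
            return False
        self._cached[name] = new_state
        self._on_state_change(name, new_state)
        return True


calls = []
cache = FeatureStateCache(
    {"snmp": {"state": "enabled", "auto_restart": "enabled"}},
    lambda name, state: calls.append((name, state)),
)
# Changing only auto_restart must not trigger a service restart.
cache.handle("snmp", {"state": "enabled", "auto_restart": "disabled"})
# Changing state must trigger it exactly once.
cache.handle("snmp", {"state": "disabled", "auto_restart": "disabled"})
print(calls)
```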


355 Commits