sonic-buildimage

Author	SHA1	Message	Date
SuvarnaMeenakshi	5a49a0f499	[multi-asic][vs]: Update topology script to retrieve hwsku from minigraph (#6219 ) Update topology script to retrieve hwsku from minigraph if hwsku information is not available in config_db. Fix clean up of interfaces in msft_multi_asic_vs hwsku topology script. - Why I did it When bringing up multi-asic VS switch, topology service is started during boot up. Topology service starts a shell script which runs the topology script present in /usr/share/sonic/device// directory. To invoke hwsku specific script, the topology script tries to retrieve hwsku information from config_db. During initial boot up config_db might not be populated. In order to start topology service before config_db is updated, update topology script to get hwsku information from minigraph.xml if it is available. This will be helpful to bring up multi-asic VS testbed by loading minigraph and starting topology service. - How I did it Update topology.sh script to retrieve hwsku information from minigraph.xml. Fix clean up function on msft_multi_asic_vs toplogy script. - How to verify it single-asic VS - no change; topology service is only enabled for multi-asic VS. multi-asic VS - Bring up multi-asic VS image, copy minigraph to vs image, start topology service. Topology service should be successful. to test clean up function fix, start topology service - make sure interfaces are created and moved to the right namespaces. stop topology service - make sure namespace do not have any interface and all front end interfaces are present in default namespace.	2021-02-18 22:02:29 -08:00
Joe LeVeque	820d350301	[pcie-check] Update underlying pcieutil command and add to sudoers file (#6682 ) - Why I did it As of Azure/sonic-utilities#1297, subcommands of pcieutil have changed to remove the redundant pcie- prefix. This PR adapts calling applications (pcie-check) to the new syntax. Resolves #6676 - How I did it Remove pcie- prefix from pcieutil subcommands in calling applications Also add pcieutil * to sudoers file, as pcieutil requires elevated permissions	2021-02-04 12:14:08 -08:00
Samuel Angebault	0c4d4ace76	[kdump] Fix OOM events in crashkernel (#6447 ) A few issues where discovered with crashkernel on Arista platforms. 1) platforms using `docker_inram=on` would end up OOM in kdump environment. This happens because the same initramfs is used by SONiC and the crashkernel. With `docker_inram=on` the `dockerfs.tar.gz` is extracted in a `tmpfs` created for the occasion. Since `dockerfs.tar.gz` weights more than 1.5G, it doesn't fit into the kdump environment and ends up OOM. This OOM event can in turn trigger a panic. 2) Arista platforms with `secureboot` enabled would fail to load the crashkernel because the kernel parameter would be discarded on boot. This happens because the `boot0` in secureboot mode is strict about kernel parameter injection. 3) The secureboot path allowlist would remove kernel crash reports. 4) The kdump service would fail on Arista products since `/boot/` is empty in `secureboot` - How I did it 1) To prevent an OOM event in the crashkernel the fix is to avoid the codepaths in `union-mount` that create tmpfs and populate them. Some more codepath specific to Arista devices are also skipped to make the kdump process faster. This relies on detecting that the initramfs is starting in a kdump environment and skipping some initialization. The `/usr/sbin/kdump-config` tool appends a few kernel cmdline arguments when loading the crashkernel. The most unique one is `systemd.unit=kdump-tools.service` which is used in a few initramfs hooks to set `in_kdump`. 2) To allow `kdump` to work in `secureboot` environment the cmdline generation in boot0 was slightly modified. The codepath to load kernel parameters changed by SONiC is now running for booting in secure mode. It was altered to prevent an append only behavior which would grow the `kernel-cmdline` at every reboot. This ever growing behavior would lead `kexec` to fail to load the kernel due to a too long cmdline. 3) To get the kernel crash under /var/crash this path has to be added to `allowlist_paths` 4) The `/host/image-XXX/boot` folder is now populated in `secureboot` mode but not used. - How to verify it Regular boot: - enable kdump - enable docker_inram=on via kernel-params - reboot - generate a crash `echo c > /proc/sysrq-trigger` - before: witness OOM events on the console - after: crash kernel works and crash available under /var/crash Secure boot: - enable kdump - reboot - generate a crash `echo c > /proc/sysrq-trigger` - before: witness no kdump - after: crash kernel works and crash available under /var/crash Co-authored-by: Boyang Yu <byu@arista.com>	2021-02-02 01:55:09 -08:00
arlakshm	b5225407ef	[baseimage]: add docker ps to the sudoer file (#6604 ) fixes Azure/sonic-utilities#1389 With the recent changes in sudoer files. The show commands fails for the read-only users. The problem here is the 'docker ps' is failing in the function [get_routing_stack()](`8a1109ed30/show/main.py (L54)`) therefore all the CLI commands are failing. Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>	2021-01-29 08:16:32 -08:00
arlakshm	ff8cc49b18	[multi asic] add ip netns identify command to sudoer (#6591 ) Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com> - Why I did it The command sudo ip netns identify <pid> is used in function get_current_namespace to check in the cli command is running in host context or within a namespace. This function is used for every CLI command and command sudo ip netns identify <pid> needs to be added in sudoer files to allow users with RO access to run show cli commands This problem is not there on single asic platforms. - How I did it Add ip netns identify [0-9]* to sudoers file.	2021-01-28 23:12:01 -08:00
abdosi	cfa8fbbf1a	[baseimage]: Updates for Ebtables and support for multi-asic (#6542 ) Following changes were done for ebtables: - Support for Multi-asic platforms. Ebtable filters are installed in namespace for multi-asic and not host. On Single asic installed on host. - For Multi-asic platforms we don't want to install on host otherwise Namespace-to-Namespace communication does not happens since ARP Request are not forwarded. - Updated to use text file to restore ebtables rules then the binary format. Rules are restore as part of Database docker init instead of rc.local - Removed the ebtable service files for buster as not needed as filters are restored/installed as part of database docker init. All the binaries are pre-installed with ebtables* binary are same as ebatbles-legacy-* Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2021-01-27 08:36:10 -08:00
arlakshm	0e12ca81c7	[Multi Asic] support of swss.rec and sairedis.rec for multi asic (#6310 ) Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan arlakshm@microsoft.com - Why I did it This PR has the changes to support having different swss.rec and sairedis.rec for each asic. The logrotate script is updated as well - How I did it Update the orchagent.sh script to use the logfile name options in these PRs(Azure/sonic-swss#1546 and Azure/sonic-sairedis#747) In multi asic platforms the record files will be different for each asic, with the format swss.asic{x}.rec and sairedis.asic{x}.rec Update the logrotate script for multiasic platform .	2021-01-22 09:42:19 -08:00
Qi Luo	25e4d773b9	[baseimage]: Cleanup sudoers file (#6518 )	2021-01-21 08:28:32 -08:00
Ying Xie	054f5b7a53	[warm boot finalizer] only wait for enabled components to reconcile (#6454 ) * [warm boot finalizer] only wait for enabled components to reconcile Define the component with its associated service. Only wait for components that have associated service enabled to reconcile during warm reboot. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2021-01-15 07:48:11 -08:00
yozhao101	04cd1d61e8	[Monit] Monitoring the running status of containers. (#6251 ) - Why I did it This PR aims to monitor the running status of each container. Currently the auto-restart feature was enabled. If a critical process exited unexpected, the container will be restarted. If the container was restarted 3 times during 20 minutes, then it will not run anymore unless we cleared the flag using the command `sudo systemctl reset-failed <container_name>` manually. - How I did it We will employ Monit to monitor a script. This script will generate the expected running container list and compare it with the current running containers. If there are containers which were expected to run but were not running, then an alerting message will be written into syslog. - How to verify it I tested this feature on a lab device `str-a7050-acs-3` which has single ASIC and `str2-n3164-acs-3` which has a Multi-ASIC. First I manually stopped a container by running the command `sudo systemctl stop <container_name>`, then I checked whether there was an alerting message in the syslog. Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2021-01-07 19:52:22 -08:00
Renuka Manavalan	dbc6718408	Take a copy of existing TACACS credentials and restore it during upgrade (#6285 ) In scenario where upgrade gets config from minigraph, it could miss tacacs credentials as they are not in minigraph. Hence restore explicitly upon load-minigraph, if present. - Why I did it Upon boot, when config migration is required, the switch could load config from minigraph. The config-load from minigraph would wipe off TACACS key and disable login via TACACS, which would disable all remote user access. This change, would re-configure the TACACS if there is a saved copy available. - How I did it When config is loaded from minigraph, look for a TACACS credentials back up (tacacs.json) under /etc/sonic/old_config. If present, load the credentials into running config, before config-save is called. - How to verify it Remove /etc/sonic/config_db.json and do an image update. Upon reboot, w/o this change, you would not be able ssh in as remote user. You may login as admin and check out, "show tacacs" & "show aaa" to verify that tacacs-key is missing and login is not enabled for tacacs. With this change applied, remove /etc/sonic/config_db.json, but save tacacs & aaa credentials as tacacs.json in /etc/sonic/. Upon reboot, you should see remote user access possible.	2021-01-07 16:45:38 -08:00
Akhilesh Samineni	62e7c452d0	After first bootup, the FEATURE table is not present in CONFIG_DB (#5911 ) Fix the After first bootup(onie-install), the FEATURE table is not present in CONFIG_DB. Fix is done by calling config reload.	2021-01-05 09:22:16 -08:00
Prabhu Sreenivasan	df2a4ded98	[ntp]: Source interface support for NTP (#6033 ) Added source interface support for NTP. Also made NTP start on Mgmt-VRF by default when configured. - How I did it 1) Updated hostcfg to listen to global config NTP and NTP_SERVER tables and restart ntp when ever the configuration changes. NTP table includes source interface configuration. 2) The ntp script updated to by default start on Mgmt-VFT when configured. Signed-off-by: Prabhu Sreenivasan <prabhu.sreenivasan@broadcom>	2020-12-21 05:34:13 -08:00
abdosi	0755f29fe7	Telemetry Certificate Copy Across Image Upgrade. (#6252 ) To copy telemetry certificate during image upgrade from previous image to new image	2020-12-19 08:24:03 -08:00
arheneus@marvell.com	e88c7d11ca	[ntp][apparmor] Allow apparmor read permission for ntpd under rw mount path of rootfs (#6040 ) Certain platform specific packages sonic-platform-xyz, installs files onto rootfs, which would be placed on read-write mount path on /host/image-name/rw/... when ntpd starts it tries to do read access on /usr/bin /usr/sbin/ /usr/local/bin , which inturn links further to the read-write mount path also. Where ntpd would get below Apparmor Warning message LOG:- audit: type=1400 audit(1606226503.240:21): apparmor="DENIED" operation="open" profile="/usr/sbin/ntpd" name="/image-HEAD-dirty-20201111.173951/rw/usr/local/bin/" pid=3733 comm="ntpd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0 audit: type=1400 audit(1606226503.240:22): apparmor="DENIED" operation="open" profile="/usr/sbin/ntpd" name="/image-HEAD-dirty-20201111.173951/rw/usr/sbin/" pid=3733 comm="ntpd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0 audit: type=1400 audit(1606226503.240:23): apparmor="DENIED" operation="open" profile="/usr/sbin/ntpd" name="/image-HEAD-dirty-20201111.173951/rw/usr/bin/" pid=3733 comm="ntpd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0 Fix: Add rw/.. mount path similar to root path access provided for ntpd in /etc/apparmor.d/usr.sbin.ntpd Signed-off-by: Antony Rheneus <arheneus@marvell.com>	2020-12-18 04:57:35 -08:00
shlomibitton	a6aaffd2ad	[kdump] Add more kernel panic conditions for vmcore dump (#6095 ) Create new file to "sysctl.d" with desired panic conditions. It will trigger a vmcore dump using kdump-tools on these situations. Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>	2020-12-15 08:54:13 -08:00
rajendra-dendukuri	b60448a006	kdump: Add default kdump command line arguments (#6180 ) The default /etc/default/kdump-tools file provided by the kdump-tools package doesn't set a value for KDUMP_CMDLINE_APPEND. The default kdump command line arguments need to be set in order to extend them to use additional arguments required for SONiC platforms. Signed-off-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>	2020-12-15 08:52:23 -08:00
judyjoseph	6d9ecbcfd8	Move frr logs from syslog to /var/log/frr/*.log (#5988 ) - Why I did it Move frr logs from syslog from the directory /var/log/quagga/.log to /var/log/frr/log - How I did it Updated the rsyslog config files. - How to verify it Verified the logs come into the file zebra.log and bgpd.log in the DIR /var/log/frr/log	2020-12-10 08:44:34 -08:00
rajendra-dendukuri	31ce20ac38	[kdump]: Kdump usability and reliability improvements (#6113 ) - Allow platform specific reboot script to be called after crash kernel has finished copying the kernel vmcore - Disable pcie advanced features when running crash kernel. This improves reliability of the crash kernel to successfully create a vmcore and also reboot - Allow crash kernel to reboot if a panic is seen while it is generating a vmcore - Fix crash kernel to use the SONiC specific /usr/local/bin/reboot script instead of the Linux reboot command /sbin/reboot - Use sonic_platform as the kernel command line parameter to pass platform identifier string Signed-off-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>	2020-12-10 01:32:37 -08:00
abdosi	59c1e3a78a	[multi-asic] Enhancing monit process checker for multi-asic. (#6100 ) Added Support of process checker for work on multi-asic platforms.	2020-12-04 10:39:43 -08:00
Prabhu Sreenivasan	2895b79482	[ntp]: NTP service ordering (#6115 ) Make sure ntp-config service is executed before ntpd Updated ntp-config service files to force dependency with ntp service. Also resolved circular dependency with --no-block flag. (needed as ntp-config service internally invokes systemd to restart ntp which in turn waits for ntp-config to complete) Signed-off-by: Prabhu Sreenivasan <prabhu.sreenivasan@broadcom.com>	2020-12-04 08:49:20 -08:00
Joe LeVeque	905a5127bb	[Python] Align files in root dir, dockers/ and files/ with PEP8 standards (#6109 ) - Why I did it Align style with slightly modified PEP8 standards (extend maximum line length to 120 chars). This will also help in the transition to Python 3, where it is more strict about whitespace, plus it helps unify style among the SONiC codebase. Will tackle other directories in separate PRs. - How I did it Using `autopep8 --in-place --max-line-length 120` and some manual tweaks.	2020-12-03 15:57:50 -08:00
Blueve	6a6e583b06	[bash.bashrc] Add reverse SSH script to bash.bashrc (#5438 ) * [bash.bashrc] Add reverse SSH script to bash.bashrc * Fix command issue and add emptt line before EOF * Add checks for SSH_TARGET_CONSOLE_LINE Signed-off-by: Jing Kan jika@microsoft.com	2020-11-24 14:11:53 +08:00
Sudharsan Dhamal Gopalarathnam	98a434e8c1	Copp Manager Changes (#4861 ) *Introduce CoPP Manager infrastructure Copp service to generate initial copp config template file Co-authored-by: dgsudharsan <sudharsan_gopalarat@dell.com>	2020-11-23 09:31:42 -08:00
Sujin Kang	5b31996f7b	[reboot-history] Add reboot history to state db (#5933 ) - Why I did it Add reboot history to State db so that can be used telemetry service - How I did it Split the process-reboot-cause service to determine-reboot-cause and process-reboot-cause determine-reboot-cause to determine the reboot cause process-reboot-cause to parse the reboot cause files and put the reboot history to state db Moved to sonic-host-service* packages - How to verify it Performed unit test and tested on DUT	2020-11-20 20:08:18 -08:00
Joe LeVeque	23247514f9	Fix a number of LGTM alerts (#5952 ) Fix 259 alerts reported by the LGTM tool: - 245 for Unused import - 7 for Testing equality to None - 5 for Duplicate key in dict literal - 1 for Module is imported more than once - 1 for Unused local variable	2020-11-20 10:58:48 -08:00
JiangboHe	461e43649b	fix error: interface counters is mismatch after warm-reboot (#5346 ) - Why I did it There is a issue for counters after warm-reboot: If I clear counters by command "sonic-clear counters", then execute 'warm-reboot' and whenSONiC is restart, the counters showed with command "show interface counters" is still old counters before "sonic-clear". It is not the right counters because the counters file in '/tmp' is lost in warm-reboot process. - How I did it I fixed it by saving '/tmp/portstat-0' folders in '/host/' before executing 'warm-reboot' (in pull request Azure/sonic-utilities#1099 ), and restore the counters folders back to '/tmp/' after warm-reboot process is finished. - How to verify it Clear counters by command 'sonic-clear' sonic-clear counters sonic-clear dropcounters sonic-clear pfccounters sonic-clear queuecounters sonic-clear rifcounters Execute 'warm-reboot' Use command ‘show interface counters’ to see if the counters is right.	2020-11-20 10:37:45 -08:00
pavel-shirshov	a92732fe5d	[bgpcfgd]: Fixes for BBR (#5956 ) * Add explicit default state into the constants.yml * Enable/disable only peer-groups, available in the config * Retrieve updates from frr before using configuration Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>	2020-11-19 00:07:58 -08:00
Prince Sunny	1eaaf64ed2	Set preference for forced mgmt routes (#5844 ) When forced mgmt routes are present, the issue fixed as part of #5754 is not complete. Added a preference(priority) field to forced mgmt route ip rules	2020-11-10 14:20:13 -08:00
arlakshm	2b41f6bd5c	Add the vtysh command with newly added "-n" option for multi asic to the read_only_cmds (#5845 ) In multi asic platforms the "show ip bgp summary" commands is not available for user with read only privileges, so to fix this the vtysh command with the new "-n" option, added for multi asic platforms, needs to be added to the READ_ONLY_COMMANDS list in the sudoers files. Added the command vtysh -n [0-9] -c show * to list of READ_ONLY_COMMANDS in the sudoers files in this commit. Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>	2020-11-10 12:18:49 -08:00
Joe LeVeque	04d0e8ab00	[hostcfgd] Convert to Python 3; Add to sonic-host-services package (#5713 ) To consolidate host services and install via packages instead of file-by-file, also as part of migrating all of SONiC to Python 3, as Python 2 is no longer supported.	2020-11-07 12:48:19 -08:00
Joe LeVeque	9e7e092610	[Monit process_checker] Convert to Python 3 (#5836 ) Convert process_checker script to Python 3	2020-11-07 12:46:23 -08:00
Stepan Blyshchak	9bc693ce6e	[hostcfgd] If feature state entry not in the cache, add a default state (#5777 ) Our use case is to register new features in runtime. The previous change which introduced the cache broke this capability and caused hostcfgd crash. Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>	2020-11-06 10:24:31 -08:00
Joe LeVeque	13ff7b38d5	[docker-wait-any] Convert to Python 3, install dependency in host OS (#5784 ) - Convert docker-wait-any script to Python 3 - Install Python 3 Docker Engine API in host OS	2020-11-05 11:23:00 -08:00
Joe LeVeque	d8045987a6	[core_uploader.py] Convert to Python 3; Use logger from sonic-py-common for uniform logging (#5790 ) - Convert core_uploader.py script to Python 3 - Use logger from sonic-py-common for uniform logging - Reorganize imports alphabetically per PEP8 standard - Two blank lines precede functions per PEP8 standard - Remove unnecessary global variable declarations	2020-11-05 11:19:26 -08:00
Lawrence Lee	10ab46f7a0	Revert "[docker-base]: Rate limit priority INFO and lower in syslog" (#5763 ) * This was a temporary fix for orchagent spamming log messages and causing rate limiting, leading to critical messages being dropped for the syslog. No longer needed since Azure/sonic-sairedis#680 was merged.	2020-11-02 08:49:40 -08:00
lguohan	c8a00eda95	[mgmt ip]: mvrf ip rule priority change to 32765 (#5754 ) Fix Azure/SONiC#551 When eth0 IP address is configured, an ip rule is getting added for eth0 IP address through the interfaces.j2 template. This eth0 ip rule creates an issue when VRF (data VRF or management VRF) is also created in the system. When any VRF (data VRF or management VRF) is created, a new rule is getting added automatically by kernel as "1000: from all lookup [l3mdev-table]". This l3mdev IP rule is never getting deleted even if VRF is deleted. Once if this l3mdev IP rule is added, if user configures IP address for the eth0 interface, interfaces.j2 adds an eth0 IP rule as "1000:from 100.104.47.74 lookup default ". Priority 1000 is automatically chosen by kernel and hence this rule gets higher priority than the already existing rule "1001:from all lookup local ". This results in an issue "ping from console to eth0 IP does not work once if VRF is created" as explained in Issue 551. More details and possible solutions are explained as comments in the Issue551. This PR is to resolve the issue by always fixing the low priority 32765 for the IP rule that is created for the eth0 IP address. Tested with various combinations of VRF creation, deletion and IP address configuration along with ping from console to eth0 IP address. Co-authored-by: Kannan KVS <kannan_kvs@dell.com>	2020-10-31 20:45:59 -07:00
abdosi	dddf96933c	[monit] Adding patch to enhance syslog error message generation for monit alert action when status is failed. (#5720 ) Why/How I did: Make sure first error syslog is triggered based on FAULT TOLERANCE condition. Added support of repeat clause with alert action. This is used as trigger for generation of periodic syslog error messages if error is persistent Updated the monit conf files with repeat every x cycles for the alert action	2020-10-31 17:29:49 -07:00
Renuka Manavalan	8d8aadb615	Load config after subscribe (#5740 ) - Why I did it The update_all_feature_states can run in the range of 20+ seconds to one minute. With load of AAA & Tacacs preceding it, any DB updates in AAA/TACACS during the long running feature updates would get missed. To avoid, switch the order. - How I did it Do a load after after updating all feature states. - How to verify it Not a easy one Have a script that restart hostcfgd sleep 2s run redis-cli/config command to update AAA/TACACS table Run the script above and watch the file /etc/pam.d/common-auth-sonic for a minute. - When it repro: The updates will not reflect in /etc/pam.d/common-auth-sonic	2020-10-31 16:38:32 -07:00
Joe LeVeque	e111204206	[caclmgrd] Convert to Python 3; Add to sonic-host-services package (#5739 ) To consolidate host services and install via packages instead of file-by-file, also as part of migrating all of SONiC to Python 3, as Python 2 is no longer supported, convert caclmgrd to Python 3 and add to sonic-host-services package	2020-10-29 16:29:12 -07:00
judyjoseph	6088bd59de	[multi-ASIC] BGP internal neighbor table support (#5520 ) * Initial commit for BGP internal neighbor table support. > Add new template named "internal" for the internal BGP sessions > Add a new table in database "BGP_INTERNAL_NEIGHBOR" > The internal BGP sessions will be stored in this new table "BGP_INTERNAL_NEIGHBOR" * Changes in template generation tests with the introduction of internal neighbor template files.	2020-10-28 16:41:27 -07:00
bingwang-ms	36c52cca2b	Fix 'NoSuchProcess' exception in process_checker (#5716 ) The psutil library used in process_checker create a cache for each process when calling process_iter. So, there is some possibility that one process exists when calling process_iter, but not exists when calling cmdline, which will raise a NoSuchProcess exception. This commit fix the issue. Signed-off-by: bingwang <bingwang@microsoft.com>	2020-10-27 09:25:35 +08:00
Joe LeVeque	3a4435eb53	Add sonic-host-services and sonic-host-services-data packages (#5694 ) - Why I did it Install all host services and their data files in package format rather than file-by-file - How I did it - Create sonic-host-services Python wheel package, currently including procdockerstatsd - Also add the framework for unit tests by adding one simple procdockerstatsd test case - Create sonic-host-services-data Debian package which is responsible for installing the related systemd unit files to control the services in the Python wheel. This package will also be responsible for installing any Jinja2 templates and other data files needed by the host services.	2020-10-23 09:52:29 -07:00
judyjoseph	ace7f24cba	[docker-teamd]: Add teamd as a depedent service to swss (#5628 ) - Why I did it On teamd docker restart, the swss and syncd needs to be restarted as there are dependent resources present. - How I did it Add the teamd as a dependent service for swss Updated the docker-wait script to handle service and dependent services separately. Handle the case of warm-restart for the dependent service - How to verify it Verified the following scenario's with the following testbed VM1 ----------------------------[DUT 6100] -----------------------VM2, ping traffic continuous between VMs 1. Stop teamd docker alone > swss, syncd dockers seen going away > The LAG reference count error messages seen for a while till swss docker stops. > Dockers back up. 2. Enable WR mode for teamd. Stop teamd docker alone > swss, syncd dockers not removed. > The LAG reference count error messages not seen > Repeated stop teamd docker test - same result, no effect on swss/syncd. 3. Stop swss docker. > swss, teamd, syncd goes off - dockers comes back correctly, interfaces up 4. Enable WR mode for swss . Stop swss docker > swss goes off not affecting syncd/teamd dockers. 5. Config reload > no reference counter error seen, dockers comes back correctly, with interfaces up 6. Warm reboot, observations below > swss docker goes off first > teamd + syncd goes off to the end of WR process. > dockers comes back up fine. > ping traffic between VM's was NOT HIT 7. Fast reboot, observations below > teamd goes off first ( confirmed swss don't exit here ) > swss goes off next > syncd goes away at the end of the FR process > dockers comes back up fine. > there is a traffic HIT as per fast-reboot 8. Verified in multi-asic platform, the tests above other than WR/FB scenarios	2020-10-23 00:41:16 -07:00
yozhao101	af97e23686	[hostcfgd] Enable/disable the container service only when the feature state was changed. (#5689 ) - Why I did it If we ran the CLI commands `sudo config feature autorestart snmp disabled/enabled` or `sudo config feature autorestart swss disabled/enabled`, then SNMP container will be stopped and started. This behavior was not expected since we updated the `auto_restart` field not update `state` field in `FEATURE` table. The reason behind this issue is that either `state` field or `auto_restart` field was updated, the function `update_feature_state(...)` will be invoked which then starts snmp.timer service. The snmp.timer service will first stop snmp.service and later start snmp.service. In order to solve this issue, the function `update_feature_state(...)` will be only invoked if `state` field in `FEATURE` table was updated. - How I did it When the demon `hostcfgd` was activated, all the values of `state` field in `FEATURE` table of each container will be cached. Each time the function `feature_state_handler(...)` is invoked, it will determine whether the `state` field of a container was changed or not. If it was changed, function `update_feature_state(...)` will be invoked and the cached value will also be updated. Otherwise, nothing will be done. - How to verify it We can run the CLI commands `sudo config feature autorestart snmp disabled/enabled` or `sudo config feature autorestart swss disabled/enabled` to check whether SNMP container is stopped and started. We also can run the CLI commands `sudo config feature state snmp disabled/enabled` or `sudo config feature state swss disabled/enabled` to check whether the container is stopped and restarted. Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2020-10-22 20:01:07 -07:00
pavel-shirshov	c94f93f046	[bgpcfgd]: Dynamic BBR support (#5626 ) - Why I did it To introduce dynamic support of BBR functionality into bgpcfgd. BBR is adding `neighbor PEER_GROUP allowas-in 1' for all BGP peer-groups which points to T0 Now we can add and remove this configuration based on CONFIG_DB entry - How I did it I introduced a new CONFIG_DB entry: - table name: "BGP_BBR" - key value: "all". Currently only "all" is supported, which means that all peer-groups which points to T0s will be updated - data value: a dictionary: {"status": "status_value"}, where status_value could be either "enabled" or "disabled" Initially, when bgpcfgd starts, it reads initial BBR status values from the [constants.yml](https://github.com/Azure/sonic-buildimage/pull/5626/files#diff-e6f2fe13a6c276dc2f3b27a5bef79886f9c103194be4fcb28ce57375edf2c23cR34). Then you can control BBR status by changing "BGP_BBR" table in the CONFIG_DB (see examples below). bgpcfgd knows what peer-groups to change fron [constants.yml](https://github.com/Azure/sonic-buildimage/pull/5626/files#diff-e6f2fe13a6c276dc2f3b27a5bef79886f9c103194be4fcb28ce57375edf2c23cR39). The dictionary contains peer-group names as keys, and a list of address-families as values. So when bgpcfgd got a request to change the BBR state, it changes the state only for peer-groups listed in the constants.yml dictionary (and only for address families from the peer-group value). - How to verify it Initially, when we start SONiC FRR has BBR enabled for PEER_V4 and PEER_V6: ``` admin@str-s6100-acs-1:~$ vtysh -c 'show run' \| egrep 'PEER_V.? allowas' neighbor PEER_V4 allowas-in 1 neighbor PEER_V6 allowas-in 1 ``` Then we apply following configuration to the db: ``` admin@str-s6100-acs-1:~$ cat disable.json { "BGP_BBR": { "all": { "status": "disabled" } } } admin@str-s6100-acs-1:~$ sonic-cfggen -j disable.json -w ``` The log output are: ``` Oct 14 18:40:22.450322 str-s6100-acs-1 DEBUG bgp#bgpcfgd: Received message : '('all', 'SET', (('status', 'disabled'),))' Oct 14 18:40:22.450620 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-f', '/tmp/tmpmWTiuq']'. Oct 14 18:40:22.681084 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V4 soft in']'. Oct 14 18:40:22.904626 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V6 soft in']'. ``` Check FRR configuraiton and see that no allowas parameters are there: ``` admin@str-s6100-acs-1:~$ vtysh -c 'show run' \| egrep 'PEER_V.? allowas' admin@str-s6100-acs-1:~$ ``` Then we apply enabling configuration back: ``` admin@str-s6100-acs-1:~$ cat enable.json { "BGP_BBR": { "all": { "status": "enabled" } } } admin@str-s6100-acs-1:~$ sonic-cfggen -j enable.json -w ``` The log output: ``` Oct 14 18:40:41.074720 str-s6100-acs-1 DEBUG bgp#bgpcfgd: Received message : '('all', 'SET', (('status', 'enabled'),))' Oct 14 18:40:41.074720 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-f', '/tmp/tmpDD6SKv']'. Oct 14 18:40:41.587257 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V4 soft in']'. Oct 14 18:40:42.042967 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V6 soft in']'. ``` Check FRR configuraiton and see that the BBR configuration is back: ``` admin@str-s6100-acs-1:~$ vtysh -c 'show run' \| egrep 'PEER_V.? allowas' neighbor PEER_V4 allowas-in 1 neighbor PEER_V6 allowas-in 1 ``` * The test coverage * Below is the test coverage ``` ---------- coverage: platform linux2, python 2.7.12-final-0 ---------- Name Stmts Miss Cover ---------------------------------------------------- bgpcfgd/__init__.py 0 0 100% bgpcfgd/__main__.py 3 3 0% bgpcfgd/config.py 78 41 47% bgpcfgd/directory.py 63 34 46% bgpcfgd/log.py 15 3 80% bgpcfgd/main.py 51 51 0% bgpcfgd/manager.py 41 23 44% bgpcfgd/managers_allow_list.py 385 21 95% bgpcfgd/managers_bbr.py 76 0 100% bgpcfgd/managers_bgp.py 193 193 0% bgpcfgd/managers_db.py 9 9 0% bgpcfgd/managers_intf.py 33 33 0% bgpcfgd/managers_setsrc.py 45 45 0% bgpcfgd/runner.py 39 39 0% bgpcfgd/template.py 64 11 83% bgpcfgd/utils.py 32 24 25% bgpcfgd/vars.py 1 0 100% ---------------------------------------------------- TOTAL 1128 530 53% ``` - Which release branch to backport (provide reason below if selected) - [ ] 201811 - [x] 201911 - [x] 202006	2020-10-22 11:04:21 -07:00
Lawrence Lee	207587d97c	[docker-base]: Rate limit priority INFO and lower in syslog (#5666 ) There is currently a bug where messages from swss with priority lower than the current log level are still being counted against the syslog rate limiting threshhold. This leads to rate-limiting in syslog when the rate-limiting conditions have not been met, which causes several sonic-mgmt tests to fail since they are dependent on LogAnalyzer. It also omits potentially useful information from the syslog. Only rate-limiting messages of level INFO and lower allows these tests to pass successfully. Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2020-10-20 11:52:46 -07:00
pavel-shirshov	d19d1dd569	[bgpcfgd]: Change prefix-list generation for "Allow prefix" feature (#5639 ) - Why I did it I was asked to change "Allow list" prefix-list generation rule. Previously we generated the rules using following method: ``` For each {prefix}/{masklen} we would generate the prefix-rule permit {prefix}/{masklen} ge {masklen}+1 Example: Prefix 1.2.3.4/24 would have following prefix-list entry generated permit 1.2.3.4/24 ge 23 ``` But we discovered the old rule doesn't work for all cases we have. So we introduced the new rule: ``` For ipv4 entry, For mask < 32 , we will add ‘le 32’ to cover all prefix masks to be sent by T0 For mask =32 , we will not add any ‘le mask’ For ipv6 entry, we will add le 128 to cover all the prefix mask to be sent by T0 For mask < 128 , we will add ‘le 128’ to cover all prefix masks to be sent by T0 For mask = 128 , we will not add any ‘le mask’ ``` - How I did it I change prefix-list entry generation function. Also I introduced a test for the changed function. - How to verify it 1. Build an image and put it on your dut. 2. Create a file test_schema.conf with the test configuration ``` { "BGP_ALLOWED_PREFIXES": { "DEPLOYMENT_ID\|0\|1010:1010": { "prefixes_v4": [ "10.20.0.0/16", "10.50.1.0/29" ], "prefixes_v6": [ "fc01:10::/64", "fc02:20::/64" ] }, "DEPLOYMENT_ID\|0": { "prefixes_v4": [ "10.20.0.0/16", "10.50.1.0/29" ], "prefixes_v6": [ "fc01:10::/64", "fc02:20::/64" ] } } } ``` 3. Apply the configuration by command ``` sonic-cfggen -j test_schema.conf --write-to-db ``` 4. Check that your bgp configuration has following prefix-list entries: ``` admin@str-s6100-acs-1:~$ show runningconfiguration bgp \| grep PL_ALLOW ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V4 seq 10 deny 0.0.0.0/0 le 17 ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V4 seq 20 permit 127.0.0.1/32 ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V4 seq 30 permit 10.20.0.0/16 le 32 ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V4 seq 40 permit 10.50.1.0/29 le 32 ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V4 seq 10 deny 0.0.0.0/0 le 17 ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V4 seq 20 permit 127.0.0.1/32 ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V4 seq 30 permit 10.20.0.0/16 le 32 ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V4 seq 40 permit 10.50.1.0/29 le 32 ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V6 seq 10 deny ::/0 le 59 ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V6 seq 20 deny ::/0 ge 65 ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V6 seq 30 permit fc01:10::/64 le 128 ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V6 seq 40 permit fc02:20::/64 le 128 ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V6 seq 10 deny ::/0 le 59 ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V6 seq 20 deny ::/0 ge 65 ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V6 seq 30 permit fc01:10::/64 le 128 ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V6 seq 40 permit fc02:20::/64 le 128 ``` Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>	2020-10-20 00:38:09 -07:00
Joe LeVeque	edf4971b16	[caclmgrd] Prevent unnecessary iptables updates (#5312 ) When a large number of changes occur to the ACL table of Config DB, caclmgrd will get flooded with notifications, and previously, it would regenerate and apply the iptables rules for each change, which is unnecessary, as the iptables rules should only get applied once after the last change notification is received. If the ACL table contains a large number of control plane ACL rules, this could cause a large delay in caclmgrd getting the rules applied. This patch causes caclmgrd to delay updating the iptables rules until it has not received a change notification for at least 0.5 seconds.	2020-10-19 11:11:30 -07:00
Joe LeVeque	678b66359d	[procdockerstatsd] Convert to Python 3 (#5657 ) Make procdockerstatsd Python 3-compliant and set interpreter to python3 in shebang. Also some other cleanup to improve code reuse.	2020-10-19 09:46:02 -07:00
Rajkumar-Marvell	5708e32ccf	Set sock rx Buf size to 3MB. (#5566 ) * Set sock rx Buf size to 3MB.	2020-10-15 14:40:59 -07:00
BrynXu	a2e3d2fcea	[ChassisDB]: bring up ChassisDB service (#5283 ) bring up chassisdb service on sonic switch according to the design in Distributed Forwarding in VoQ Arch HLD Signed-off-by: Honggang Xu <hxu@arista.com> - Why I did it To bring up new ChassisDB service in sonic as designed in ['Distributed forwarding in a VOQ architecture HLD' ](`90c1289eaf/doc/chassis/architecture.md`). - How I did it Implement the section 2.3.1 Global DB Organization of the VOQ architecture HLD. - How to verify it ChassisDB service won't start without chassisdb.conf file on the existing platforms. ChassisDB service is accessible with global.conf file in the distributed arichitecture. Signed-off-by: Honggang Xu <hxu@arista.com>	2020-10-14 15:15:24 -07:00
abdosi	9094e2176f	Optimze ACL Table/Rule notification handling (#5621 ) * Optimze ACL Table/Rule notifcation handling to loop pop() until empty to consume all the data in a batch This wau we prevent multiple call to iptable updates Signed-off-by: Abhishek Dosi <abdosi@microsoft.com> * Address review comments Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2020-10-14 08:05:33 -07:00
Junchao-Mellanox	1c97a03b81	[system-health] Add support for monitoring system health (#4835 ) * system health first commit * system health daemon first commit * Finish healthd * Changes due to lower layer logic change * Get ASIC temperature from TEMPERATURE_INFO table * Add system health make rule and service files * fix bugs found during manual test * Change make file to install system-health library to host * Set system LED to blink on bootup time * Caught exceptions in system health checker to make it more robust * fix issue that fan/psu presence will always be true * fix issue for external checker * move system-health service to right after rc-local service * Set system-health service start after database service * Get system up time via /proc/uptime * Provide more information in stat for CLI to use * fix typo * Set default category to External for external checker * If external checker reported OK, save it to stat too * Trim string for external checker output * fix issue: PSU voltage check always return OK * Add unit test cases for system health library * Fix LGTM warnings * fix demo comments: 1. get boot up timeout from monit configuration file; 2. set system led in library instead of daemon * Remove boot_timeout configuration because it will get from monit config file * Fix argument miss * fix unit test failure * fix issue: summary status is not correct * Fix format issues found in code review * rename th to threshold to make it clearer * Fix review comment: 1. add a .dep file for system health; 2. deprecated daemon_base and uses sonic-py-common instead * Fix unit test failure * Fix LGTM alert * Fix LGTM alert * Fix review comments * Fix review comment * 1. Add relevant comments for system health; 2. rename external_checker to user_define_checker * Ignore check for unknown service type * Fix unit test issue * Rename user define checker to user defined checker * Rename user_define_checkers to user_defined_checkers for configuration file * Renmae file user_define_checker.py -> user_defined_checker.py * Fix typo * Adjust import order for config.py Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com> * Adjust import order for src/system-health/health_checker/hardware_checker.py Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com> * Adjust import order for src/system-health/scripts/healthd Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com> * Adjust import orders in src/system-health/tests/test_system_health.py * Fix typo * Add new line after import * If system health configuration file not exist, healthd should exit * Fix indent and enable pytest coverage * Fix typo * Fix typo * Remove global logger and use log functions inherited from super class * Change info level logger to notice level Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com>	2020-10-12 11:12:49 +03:00
abdosi	01fceb6f79	Optimized caclmgrd Notification handling. Previously (#5560 ) any event happening on ACL Rule Table (eg DATAACL rules programmed) caused control plane default action to be triggered. Now Control Plance ACTION will be trigger only a) ACL Rule beloging to Control ACL Table Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2020-10-08 11:31:09 -07:00
Ying Xie	ec0153008a	[rc.local] separate configuration migration and grub installation logic (#5528 ) To address issue #5525 Explicitly control the grub installation requirement when it is needed. We have scenario where configuration migration happened but grub installation is not required. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2020-10-03 23:00:39 -07:00
pavel-shirshov	ffae82f8be	[bgp] Add 'allow list' manager feature (#5513 ) implements a new feature: "BGP Allow list." This feature allows us to control which IP prefixes are going to be advertised via ebgp from the routes received from EBGP neighbors.	2020-10-02 10:06:04 -07:00
Guohan Lu	e412338743	Revert "[bgp] Add 'allow list' manager feature (#5309 )" This reverts commit `6eed0820c8`.	2020-09-28 22:00:29 -07:00
pavel-shirshov	6eed0820c8	[bgp] Add 'allow list' manager feature (#5309 ) implements a new feature: "BGP Allow list." This feature allows us to control which IP prefixes are going to be advertised via ebgp from the routes received from EBGP neighbors.	2020-09-27 10:47:43 -07:00
judyjoseph	4006ce711f	[Multi-Asic] Forward SNMP requests received on front panel interface to SNMP agent in host. (#5420 ) * [Multi-Asic] Forward SNMP requests destined to loopback IP, and coming in through the front panel interface present in the network namespace, to SNMP agent running in the linux host. * Updates based on comments * Further updates in docker_image_ctl.j2 and caclmgrd * Change the variable for net config file. * Updated the comments in the code. * No need to clean up the exising NAT rules if present, which could be created by some other process. * Delete our rule first and add it back, to take care of caclmgrd restart. Another benefit is that we delete only our rules, rather than earlier approach of "iptables -F" which cleans up all rules. * Keeping the original logic to clean the NAT entries, to revist when NAT feature added in namespace. * Missing updates to log_info call.	2020-09-26 12:14:30 -07:00
bingwang-ms	584e2223dc	Fix exception when attempting to write a datetime to db (#5467 ) redis-py 3.0 used in master branch only accepts user data as bytes, strings or numbers (ints, longs and floats). Attempting to specify a key or a value as any other type will raise a DataError exception. This PR address the issue bt converting datetime to str	2020-09-25 20:19:18 +08:00
yozhao101	13cec4c486	[Monit] Unmonitor the processes in containers which are disabled. (#5153 ) We want to let Monit to unmonitor the processes in containers which are disabled in `FEATURE` table such that Monit will not generate false alerting messages into the syslog. Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2020-09-25 00:28:28 -07:00
Venkatesan Mahalingam	418e437d79	[caclmgrd] Add support to allow/deny any IP/IPv6 protocol packets coming to CPU based on source IP (#4591 ) Add support to allow/deny packets coming to CPU based on source IP, regardless of destination port	2020-09-23 09:55:09 -07:00
abdosi	75e4258508	Enhanced Feature Table state enable/disable for multi-asic platforms. (#5358 ) * Enhanced Feature Table state enable/disbale for multi-asic platforms. In Multi-asic for some features we can service per asic so we need to get list of all services. Also updated logic to return if any one of systemctl command return failure and make sure syslog of feature getting enable/disable only come when all commads are sucessful. Moved the service list get api from sonic-util to sonic-py-common Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net> * Make sure to retun None for both service list in case of error. Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net> * Return empty list as fail condition Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net> * Address Review Comments. Made init_cfg.json.j2 knowledegable of Feature service is global scope or per asic scope Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net> * Fix merge conflict * Address Review Comment. Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net> Co-authored-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>	2020-09-22 08:34:02 -07:00
Volodymyr Boiko	97aee026de	[logrotate] create separate logrotate.d config for update-alternatives (#5382 ) To fix the following error when running `logrotate /etc/logrotate.conf` : ``` error: dpkg:10 duplicate log entry for /var/log/alternatives.log error: found error in file dpkg, skipping ``` update-alternatives is provided with dedicated logrotate config in newer dpkg package versions (probably starting from buster) Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>	2020-09-22 01:23:42 -07:00
Joe LeVeque	3987cbd80a	[sonic-utilities] Build and install as a Python wheel package (#5409 ) We are moving toward building all Python packages for SONiC as wheel packages rather than Debian packages. This will also allow us to more easily transition to Python 3. Python files are now packaged in "sonic-utilities" Pyhton wheel. Data files are now packaged in "sonic-utilities-data" Debian package. - How I did it - Build and install sonic-utilities as a Python package - Remove explicit installation of wheel dependencies, as these will now get installed implicitly by pip when installing sonic-utilities as a wheel - Build and install new sonic-utilities-data package to install data files required by sonic-utilities applications - Update all references to sonic-utilities scripts/entrypoints to either reference the new /usr/local/bin/ location or remove absolute path entirely where applicable Submodule updates: * src/sonic-utilities aa27dd9...2244d7b (5): > Support building sonic-utilities as a Python wheel package instead of a Debian package (#1122) > [consutil] Display remote device name in show command (#1120) > [vrf] fix check state_db error when vrf moving (#1119) > [consutil] Fix issue where the ConfigDBConnector's reference is missing (#1117) > Update to make config load/reload backward compatible. (#1115) * src/sonic-ztp dd025bc...911d622 (1): > Update paths to reflect new sonic-utilities install location, /usr/local/bin/ (#19)	2020-09-20 20:16:42 -07:00
abdosi	d12e9cbbc6	[Multi-Asic] Fix for multi-asic where we should allow docker local (#5364 ) communication on docker eth0 ip . Without this TCP Connection to Redis does not happen in namespace. Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net> Co-authored-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>	2020-09-16 11:32:35 -07:00
Joe LeVeque	c7186a2d39	[process-reboot-cause] Use Logger class from sonic-py-common package (#5384 ) Eliminate duplicate logging code by importing Logger class from sonic-py-common package.	2020-09-16 10:35:19 -07:00
Samuel Angebault	9bf4b0a93e	[baseimage]: Change the loopback mask from /8 to /16 (#5353 ) As per the VOQ HLDs, internal networking between the linecards and supervisor is required within a chassis. Allocating 127.X/16 subnets for private communication within a chassis is a good candidate. It doesn't require any external IP allocation as well as ensure that the traffic will not leave the chassis. References: https://github.com/Azure/SONiC/pull/622 https://github.com/Azure/SONiC/pull/639 - How I did it Changed the `interfaces.j2` file to add `127.0.0.1/16` as the `lo` ip address. Then once the interface is up, the post-up command removes the `127.0.0.1/8` ip address. The order in which the netmask change is made matters for `127.0.0.1` to be reachable at all times. - How to verify it ``` root@sonic:~# ip address show dev lo 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/16 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever ``` Co-authored-by: Baptiste Covolato <baptiste@arista.com>	2020-09-15 15:29:48 -07:00
Petro Bratash	558ec53aa6	Fix bug with pcie-check.service (#5368 ) * Change STATE_DB key (PCIE_STATUS\|PCIE_DEVICES -> PCIE_DEVICES) Signed-off-by: Petro Bratash <petrox.bratash@intel.com> * [pcie-check.service] Add dependency on database.service Signed-off-by: Petro Bratash <petrox.bratash@intel.com>	2020-09-15 15:21:31 -07:00
Joe LeVeque	1ac146dd97	[caclmgrd] Inherit DaemonBase class from sonic-py-common package (#5373 ) Eliminate duplicate logging code by inheriting from DaemonBase class in sonic-py-common package.	2020-09-15 13:34:41 -07:00
Joe LeVeque	3a901eeae0	[procdockerstatsd] Inherit DaemonBase class from sonic-py-common package (#5372 ) Eliminate duplicate logging code by inheriting from DaemonBase class in sonic-py-common package.	2020-09-14 16:36:37 -07:00
noaOrMlnx	353003f6ee	Change update_feature_state call to pass False as default if feature has no 'has_timer' field (#5260 ) * Pass False as default if feature has no timer field * Update hostcfgd to fit the new changes merged New changes can be found in PR:5248	2020-09-14 11:28:24 -07:00
Blueve	01fb32fa08	[conf] append nos-config-part for s6100 (#5234 ) * [conf] append nos-config-part for s6100 * modify rc.local Signed-off-by: Guohan Lu <lguohan@gmail.com> * Update rc.local Co-authored-by: Blueve <jika@microsoft.com> Co-authored-by: Guohan Lu <lguohan@gmail.com> Co-authored-by: Ying Xie <yxieca@users.noreply.github.com>	2020-09-08 12:29:02 -07:00
arheneus@marvell.com	f136fd0623	[ebtbles] Replace binary config file to text config file for ebtables (#5252 ) Issue: Binary ebtables config file is CPU arch dependent Fix: Load the text config during firsttime boot and Generate the binary persistent atomic file Signed-off-by: Antony Rheneus <arheneus@marvell.com>	2020-09-03 17:27:07 -07:00
abdosi	dd908c2ee2	[sonic-swsscommon] submodule update with commit's (#5300 ) [schema] Make schema header support C project (#373) Removed DB specific get api's from Selectable class (#378) With the change as part of #378 caclmgrd need to be updated to use new client side Get API to access namespace. Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2020-09-02 18:09:03 -07:00
Joe LeVeque	07b9d7f44d	[pcie-check] Make pcie-check.sh executable (#5256 ) The pcie-check.sh script was added in https://github.com/Azure/sonic-buildimage/pull/4771, but was not given executable permission. Therefore, we would see messages like: ``` Aug 26 22:54:05.536248 sonic ERR systemd[664]: pcie-check.service: Failed to execute command: Permission denied Aug 26 22:54:05.536386 sonic ERR systemd[664]: pcie-check.service: Failed at step EXEC spawning /usr/bin/pcie-check.sh: Permission denied Aug 26 22:54:05.536600 sonic WARNING systemd[1]: pcie-check.service: Failed with result 'exit-code'. ```	2020-08-29 10:29:42 -07:00
Tamer Ahmed	7d3ec60b1f	[hostcfgd] Fix Boolean String Evaluation (#5248 ) New attribute 'has_timer' introduced to init_cfg.json does not evaluate as Bool, rather it evaluates as string. This PR fixes this issue. Also, this PR fixes an issue when there is system config unit (snmp, telemetry) that has no installation config (WantedBy=, RequiredBy=, Also=, Alias=) settings in the [Install] section. In the latter case, the .service should not be enabled. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-08-27 06:50:03 -07:00
Tamer Ahmed	90cbb4d78c	[hostcfgd] Handle Both Service And Timer Units (#5228 ) Commit `e484ae9dd` introduced systemd .timer unit to hostcfgd. However, when stopping service that has timer, there is possibility that timer is not running and the service would not be stopped. This PR address this situation by handling both .timer and .service units. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-08-21 09:51:41 -07:00
abdosi	74d8b4a6be	[caclmgrd] Add support for multi-ASIC platforms (#5022 ) * Support for Control Plane ACL's for Multi-asic Platforms. Following changes were done: 1) Moved from using blocking listen() on Config DB to the select() model via python-swsscommon since we have to wait on event from multiple config db's 2) Since python-swsscommon is not available on host added libswsscommon and python-swsscommon and dependent packages in the base image (host enviroment) 3) Made iptables programmed in all namespace using ip netns exec Signed-off-by: Abhishek Dosi <abdosi@microsoft.com> * Address Review Comments Signed-off-by: Abhishek Dosi <abdosi@microsoft.com> * Fix Review Comments * Fix Comments * Added Change for Multi-asic to have iptables rules to accept internal docker tcp/udp traffic needed for syslog and redis-tcp connection. Signed-off-by: Abhishek Dosi <abdosi@microsoft.com> * Fix Review Comments * Added more comments on logic. * Fixed all warning/errors reported by http://pep8online.com/ other than line > 80 characters. * Fix Comment Signed-off-by: Abhishek Dosi <abdosi@microsoft.com> * Verified with swsscommon package. Fix issue for single asic platforms. * Moved to new python package * Address Review Comments. Signed-off-by: Abhishek Dosi <abdosi@microsoft.com> * Address Review Comments.	2020-08-20 15:11:42 -07:00
Tamer Ahmed	e484ae9dda	[services] Fix Delay Start of SNMP And Telemetry (#5211 ) SNMP and Telemetry services are not critical to switch startup. They also cause fast-reboot not to meet timing requirements. In order to delay start those service are associated with systemd timer units, however when hostcfgd initiate service start, it start the service and not the timer. This PR fixes this issue by starting the timer associated with systemd unit. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-08-19 19:27:59 -07:00
Tamer Ahmed	dfc0617283	[interfaces] Reduce Calls to SONiC Cfggen (#5174 ) Calls to sonic-cfggen is CPU expensive. This PR reduces calls to sonic-cfggen to one call during startup when running interfaces- config. singed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-08-17 15:46:52 -07:00
lguohan	082c26a27d	[build]: combine feature and container feature table (#5081 ) 1. remove container feature table 2. do not generate feature entry if the feature is not included in the image 3. rename ENABLE_* to INCLUDE_* for better clarity 4. rename feature status to feature state 5. [submodule]: update sonic-utilities * 9700e45 2020-08-03 \| [show/config]: combine feature and container feature cli (#1015) (HEAD, origin/master, origin/HEAD) [lguohan] * c9d3550 2020-08-03 \| [tests]: fix drops_group_test failure on second run (#1023) [lguohan] * dfaae69 2020-08-03 \| [lldpshow]: Fix input device is not a TTY error (#1016) [Arun Saravanan Balachandran] * 216688e 2020-08-02 \| [tests]: rename sonic-utilitie-tests to tests (#1022) [lguohan] Signed-off-by: Guohan Lu <lguohan@gmail.com>	2020-08-05 13:23:12 -07:00
Renuka Manavalan	312771dc3e	[monit] Periodically monitor route consistency (#5085 ) * Add route_check to mont. * Switched to units of cycles per comments * Added comments per Joe's comments. * Added more comments per Royal's comments.	2020-08-04 10:33:13 -07:00
Joe LeVeque	3b89e5d467	[Python] Migrate applications/scripts to import sonic-py-common package (#5043 ) As part of consolidating all common Python-based functionality into the new sonic-py-common package, this pull request: 1. Redirects all Python applications/scripts in sonic-buildimage repo which previously imported sonic_device_util or sonic_daemon_base to instead import sonic-py-common, which was added in https://github.com/Azure/sonic-buildimage/pull/5003 2. Replaces all calls to `sonic_device_util.get_platform_info()` to instead call `sonic_py_common.get_platform()` and removes any calls to `sonic_device_util.get_machine_info()` which are no longer necessary (i.e., those which were only used to pass the results to `sonic_device_util.get_platform_info()`. 3. Removes unused imports to the now-deprecated sonic-daemon-base package and sonic_device_util.py module This is the next step toward resolving https://github.com/Azure/sonic-buildimage/issues/4999 Also reverted my previous change in which device_info.get_platform() would first try obtaining the platform ID string from Config DB and fall back to gathering it from machine.conf upon failure because this function is called by sonic-cfggen before the data is in the DB, in which case, the db_connect() call will hang indefinitely, which was not the behavior I expected. As of now, the function will always reference machine.conf.	2020-08-03 11:43:12 -07:00
Tamer Ahmed	7872b4e196	[platform] Add Support For Environment Variable File (#5010 ) * [platform] Add Support For Environment Variable This PR adds the ability to read environment file from /etc/sonic. the file contains immutable SONiC config attributes such as platform, hwsku, version, device_type. The aim is to minimize calls being made into sonic-cfggen during boot time. singed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-07-31 17:59:09 -07:00
Joe LeVeque	b2344f6f78	[caclmgrd] Always restart service upon process termination (#5065 )	2020-07-29 10:12:38 -07:00
rkdevi27	26050ffef8	[baseimage]: /host unmount timeout issue during reboot. (#5032 ) Fix for the host unmount issue through PR https://github.com/Azure/sonic-buildimage/pull/4558 and https://github.com/Azure/sonic-buildimage/pull/4865 creates the timeout of syslog.socket closure during reboot since the journald socket closure has been included in syslog.socket Removed the journal socket closure. The host unmount is fixed with just stopping the services which gets restarted only after /var/log unmount and not causing the unmount issues.	2020-07-25 01:27:58 -07:00
Joe LeVeque	1587889b7a	[caclmgrd] remove default DROP rule on FORWARD chain (#5034 )	2020-07-24 11:59:46 -07:00
Joe LeVeque	43b5832e0c	[sudoers] Add `sonic-installer list` to read-only commands (#4996 ) `sonic-installer list` is a read-only command. Specify it as such in the sudoers file. This will also ensure the new `show boot` command, which calls `sudo sonic-installer list` under the hood doesn't fail due to permissions.	2020-07-20 11:23:05 -07:00
Joe LeVeque	d6925499f1	[caclmgrd] Filter DHCP packets based on dest port only (#4995 )	2020-07-17 11:16:19 -07:00
madhanmellanox	ade634090d	[caclmgrd] Log error message if IPv4 ACL table contains IPv6 rule and vice-versa (#4498 ) * Defect 2082949: Handling Control Plane ACLs so that IPv4 rules and IPv6 rules are not added to the same ACL table * Previous code review comments of coming up with functions for is_ipv4_rule and is_ipv6_rule is addressed and also raising Exceptions instead of simply aborting when the conflict occurs is handled * Addressed code review comment to replace duplicate code with already existing functions * removed raising Exception when rule conflict in Control plane ACLs are found * added code to remove the rule_props if it is conflicting ACL table versioning rule * addressed review comment to add ignoring rule in the error statement Co-authored-by: Madhan Babu <madhan@arc-build-server.mtr.labs.mlnx>	2020-07-15 20:24:44 +03:00
rkdevi27	df740b3653	[baseimage]: /host unmount failed in VM during reboot (#4865 ) Added a check further to make the services to stop appropriately before unmount. Fix #4651	2020-07-14 15:34:19 -07:00
Sujin Kang	bf45e11d27	Add pcie-check service to check PCIe devices at boot (#4771 ) * PCIe Monitor service * Add rescan to pcie-mon.service when it fails to get all pcie devices * space * Clean up * review comments * update the pcie status in state db * update the failed pcie status once at the end * Update the pcie_status in STATE_DB and rename the service * Add log to exit the service if the configuration file doesn't exist. * fix the build failure * Redo the pcie rescan for pcie-check failed case. * review comments * review comments * review comments	2020-07-13 14:15:09 -07:00
Sujin Kang	b4452edb8a	Add disabling HW watchdog during boot for fast-reboot and warm-reboot (#4927 ) * Add disabling HW watchdog during boot for fast-reboot and warm-reboot case * typo	2020-07-12 18:08:52 +00:00
Joe LeVeque	2731571dc9	[caclmgrd] Improve code reuse (#4931 ) Improve code reuse in `generate_block_ip2me_traffic_iptables_commands()` function.	2020-07-12 18:08:52 +00:00
Venkatesan Mahalingam	7d003c3518	[TACACS+]: Add support to specify source address for TACACS+ (#4610 ) This pull request was cherry picked from "#1238" to resolve the conflicts. - Why I did it Add support to specify source address for TACACS+ - How I did it Add patches for libpam-tacplus and libnss-tacplus. The patches parse the new option 'src_ip' and store the converted addrinfo. Then the addrinfo is used for TACACS+ connection. Add a attribute 'src_ip' for table "TACPLUS\|global" in configDB Add some code to adapt to the attribute 'src_ip'. - How to verify it Config command for source address PR in sonic-utilities config tacacs src_ip <ip_address> - Description for the changelog Add patches to specify source address for the TACACS+ outgoing packets. - A picture of a cute animal (not mandatory but encouraged) UT logs: UT_tacacs_source_intf.txt	2020-07-12 18:08:51 +00:00
abdosi	fc6bcff52b	[sonic-buildimage] Changes to make network specific sysctl common for both host and docker namespace (#4838 ) * [sonic-buildimage] Changes to make network specific sysctl common for both host and docker namespace (in multi-npu). This change is triggered with issue found in multi-npu platforms where in docker namespace net.ipv6.conf.all.forwarding was 0 (should be 1) because of which RS/RA message were triggered and link-local router were learnt. Beside this there were some other sysctl.net.ipv6* params whose value in docker namespace is not same as host namespace. So to make we are always in sync in host and docker namespace created common file that list all sysctl.net.* params and used both by host and docker namespace. Any change will get applied to both namespace. Signed-off-by: Abhishek Dosi <abdosi@microsoft.com> * Address Review Comments and made sure to invoke augtool only one and do string concatenation of all set commands * Address Review Comments.	2020-07-12 18:08:51 +00:00
arlakshm	a8b99f77f3	syslog changes Multi ASIC platforms (#4738 ) Add changes for syslog support for containers running in namespaces on multi ASIC platforms. On Multi ASIC platforms Rsyslog service is only running on the host. There is no rsyslog service running in each namespace. On multi ASIC platforms the rsyslog service on the host will be listening on the docker0 ip address instead of loopback address. The rsyslog.conf on the containers is modified to have omfwd target ip to be docker0 ipaddress instead of loopback ip Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>	2020-07-12 18:08:51 +00:00
Joe LeVeque	4d2d95e8e6	[hostcfgd] Synchronize all feature statuses once upon start (#4714 ) - Ensure all features (services) are in the configured state when hostcfgd starts - Better functionalization of code - Also replace calls to deprecated `has_key()` method in `tacacs_server_handler()` and `tacacs_global_handler()` with `in` keyword. This PR depends on https://github.com/Azure/sonic-utilities/pull/944, otherwise `config load_minigraph` will fail when trying to restart disabled services.	2020-06-20 12:09:29 -07:00
padmanarayana	95e3cda5da	[DELL]: FTOS to SONiC fast conversion fixes (#4807 ) While migrating to SONiC 20181130, identified a couple of issues: 1. union-mount needs /host/machine.conf parameters for vendor specific checks : however, in case of migration, the /host/machine.conf is extracted from ONIE only in https://github.com/Azure/sonic-buildimage/blob/master/files/image_config/platform/rc.local#L127. 2. Since grub.cfg is updated to have net.ifnames=0 biosdevname=0, 70-persistent-net.rules changes are no longer required.	2020-06-19 11:02:08 -07:00
Joe LeVeque	6960477cc2	[caclmgrd] Don't limit connection tracking to TCP (#4796 ) Don't limit iptables connection tracking to TCP protocol; allow connection tracking for all protocols. This allows services like NTP, which is UDP-based, to receive replies from an NTP server even if the port is blocked, as long as it is in reply to a request sent from the device itself.	2020-06-18 00:18:20 -07:00
xumia	76a395cdbf	[secure boot] Support rw files allowlist (#4585 ) * Support rw files allowlist for Sonic Secure Boot * Improve the performance * fix bug * Move the config description into a md file * Change to use a simple way to remove the blank line * Support chmod a-x in rw folder * Change function name * Change some unnecessary words	2020-06-13 00:10:13 -07:00
Ying Xie	ae7bf3db52	[ntp] disable ntp long jump (#4748 ) Found another syncd timing issue related to clock going backwards. To be safe disable the ntp long jump. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2020-06-11 13:01:21 -07:00
Joe LeVeque	7b8037770d	[caclmgrd] Get first VLAN host IP address via next() (#4685 ) I found that with IPv4Network types, calling list(ip_ntwrk.hosts()) is reliable. However, when doing the same with an IPv6Network, I found that the conversion to a list can hang indefinitely. This appears to me to be a bug in the ipaddress.IPv6Network implementation. However, I could not find any other reports on the web. This patch changes the behavior to call next() on the ip_ntwrk.hosts() generator instead, which returns the IP address of the first host.	2020-06-02 02:11:21 -07:00
Joe LeVeque	eff8a89523	[hostcfgd] Get service enable/disable feature working (#4676 ) Fix hostcfgd so that changes to the "FEATURE" table in ConfigDB are properly handled. Three changes here: 1. Fix indenting such that the handling of each key actually occurs in the for key in status_data.keys(): loop 2. Add calls to sudo systemctl mask and sudo systemctl unmask as appropriate to ensure changes persist across reboots 3. Substitute returns with continues so that even if one service fails, we still try to handle the others Note that the masking is persistent, even if the configuration is not saved. We may want to consider only calling systemctl enable/disable in hostcfgd when the DB table changes, and only call systemctl mask/unmask upon calling config save.	2020-06-02 02:07:22 -07:00
taocy	ea2dd9541d	change image apt source list from stretch to buster for arm	2020-05-25 13:15:19 +00:00
Joe LeVeque	bce42a7595	[caclmgrd] Allow more ICMP types (#4625 )	2020-05-20 17:45:07 -07:00
abdosi	a44fc07e78	Changes to support config-setup service for multi-npu (#4609 ) * Changes to support config-setup service for multi-npu platforms. For Multi-npu we are not supporting as of now config initializtion and ZTP. It will support creating config db from minigraph or using config db from previous file system Signed-off-by: Abhishek Dosi <abdosi@microsoft.com> * Address Review Comments. * Address Review comments * Address Review Comments of using pyhton based config load_minigraph/ config save/config reload from shell scripts so that we don't duplicate code. Also while running from shell we will skip stop/start services done by those commands. * Updated to use python command so no code duplication.	2020-05-20 16:32:33 -07:00
rkdevi27	32f58b5864	Fix "/host unmount failure" during reboot (#4558 )	2020-05-20 11:18:11 -07:00
Ying Xie	cdfb1ced44	[ntp] enable/disable NTP long jump according to reboot type (#4577 ) * [ntp] enable/disable NTP long jump according to reboot type - Enable NTP long jump after cold reboot. - Disable NTP long jump after warrm/fast reboot. Signed-off-by: Ying Xie <ying.xie@microsoft.com> * fix typo * further refactoring * use sonic-db-cli instead	2020-05-20 10:57:21 -07:00
Joe LeVeque	5150e7b655	[caclmgrd] Ignore keys in interface-related tables if no IP prefix is present (#4581 ) Since the introduction of VRF, interface-related tables in ConfigDB will have multiple entries, one of which only contains the interface name and no IP prefix. Thus, when iterating over the keys in the tables, we need to ignore the entries which do not contain IP prefixes.	2020-05-12 18:16:55 -07:00
abdosi	5fe2216ea3	Fix for issue where image is compile with flag ENABLE_DHCP_GRAPH_SERVICE (#4573 ) and then we load image and reboot even if there was existing config_db.json we will look for DHCP Service. we should disbale update_graph in such cases. This behaviour is silimar to what we have in 201811 image.	2020-05-12 14:49:56 -07:00
Joe LeVeque	5e8e0d76fc	[caclmgrd] Add some default ACCEPT rules and lastly drop all incoming packets (#4412 ) Modified caclmgrd behavior to enhance control plane security as follows: Upon starting or receiving notification of ACL table/rule changes in Config DB: 1. Add iptables/ip6tables commands to allow all incoming packets from established TCP sessions or new TCP sessions which are related to established TCP sessions 2. Add iptables/ip6tables commands to allow bidirectional ICMPv4 ping and traceroute 3. Add iptables/ip6tables commands to allow bidirectional ICMPv6 ping and traceroute 4. Add iptables/ip6tables commands to allow all incoming Neighbor Discovery Protocol (NDP) NS/NA/RS/RA messages 5. Add iptables/ip6tables commands to allow all incoming IPv4 DHCP packets 6. Add iptables/ip6tables commands to allow all incoming IPv6 DHCP packets 7. Add iptables/ip6tables commands to allow all incoming BGP traffic 8. Add iptables/ip6tables commands for all ACL rules for recognized services (currently SSH, SNMP, NTP) 9. For all services which we did not find configured ACL rules, add iptables/ip6tables commands to allow all incoming packets for those services (allows the device to accept SSH connections before the device is configured) 10. Add iptables rules to drop all packets destined for loopback interface IP addresses 11. Add iptables rules to drop all packets destined for management interface IP addresses 12. Add iptables rules to drop all packets destined for point-to-point interface IP addresses 13. Add iptables rules to drop all packets destined for our VLAN interface gateway IP addresses 14. Add iptables/ip6tables commands to allow all incoming packets with TTL of 0 or 1 (This allows the device to respond to tools like tcptraceroute) 15. If we found control plane ACLs in the configuration and applied them, we lastly add iptables/ip6tables commands to drop all other incoming packets	2020-05-11 12:36:47 -07:00
Joe LeVeque	dfdd94d8ad	[process-reboot-cause] If software reboot cause is unknown add note if first boot into new image (#4538 )	2020-05-06 22:48:33 -07:00
wangshengjun	bed4a799df	[ebtables]add the filter rule for ARP packets with vlan tag: (#3945 ) 1. ebtables -t filter -A FORWARD -p 802_1Q --vlan-encap 0806 -j DROP The ARP packet with vlan tag can't match the default rule. Signed-off-by: wangshengjun <wangshengjun@asterfusion.com>	2020-05-06 20:03:09 -07:00
Dong Zhang	340cf826a6	[MultiDB] use sonic-db-cli PING and fix wrong multiDB API in NAT (#4541 )	2020-05-06 15:41:28 -07:00
rkdevi27	4511216789	Ssd mitigation changes (#4214 ) * ssd_mitigation_changes * ssd_mitigation_changes * ssd_mitigation_changes * ssd_mitigation_changes	2020-04-30 22:58:09 -07:00
Olivier Singla	799f22d4c7	[baseimage]: Run fsck filesystem check support prior mounting filesystem (#4431 ) * Run fsck filesystem check support prior mounting filesystem If the filesystem become non clean ("dirty"), SONiC does not run fsck to repair and mark it as clean again. This patch adds the functionality to run fsck on each boot, prior to the filesystem being mounted. This allows the filesystem to be repaired if needed. Note that if the filesystem is maked as clean, fsck does nothing and simply return so this is perfectly fine to call fsck every time prior to mount the filesystem. How to verify this patch (using bash): Using an image without this patch: Make the filesystem "dirty" (not clean) [we are making the assumption that filesystem is stored in /dev/sda3 - Please adjust depending of the platform] [do this only on a test platform!] dd if=/dev/sda3 of=superblock bs=1 count=2048 printf "$(printf '\\x%02X' 2)" \| dd of="superblock" bs=1 seek=1082 count=1 conv=notrunc &> /dev/null dd of=/dev/sda3 if=superblock bs=1 count=2048 Verify that filesystem is not clean tune2fs -l /dev/sda3 \| grep "Filesystem state:" reboot and verify that the filesystem is still not clean Redo the same test with an image with this patch, and verify that at next reboot the filesystem is repaired and becomes clean. fsck log is stored on syslog, using the string FSCK as markup.	2020-04-30 00:33:20 -07:00
pavel-shirshov	057ced0391	[bgpcfgd]: Split one bgp mega-template to chunks. (#4143 ) The one big bgp configuration template was splitted into chunks. Currently we have three types of bgp neighbor peers: general bgp peers. They are represented by CONFIG_DB::BGP_NEIGHBOR table entries dynamic bgp peers. They are represented by CONFIG_DB::BGP_PEER_RANGE table entries monitors bgp peers. They are represented by CONFIG_DB::BGP_MONITORS table entries This PR introduces three templates for each peer type: bgp policies: represent policieas that will be applied to the bgp peer-group (ip prefix-lists, route-maps, etc) bgp peer-group: represent bgp peer group which has common configuration for the bgp peer type and uses bgp routing policy from the previous item bgp peer-group instance: represent bgp configuration, which will be used to instatiate a bgp peer-group for the bgp peer-type. Usually this one is simple, consist of the referral to the bgp peer-group, bgp peer description and bgp peer ip address. This PR redefined constant.yml file. Now this file has a setting for to use or don't use bgp_neighbor metadata. This file has more parameters for now, which are not used. They will be used in the next iteration of bgpcfgd. Currently all tests have been disabled. I'm going to create next PR with the tests right after this PR is merged. I'm going to introduce better bgpcfgd in a short time. It will include support of dynamic changes for the templates. FIX:: #4231	2020-04-23 09:42:22 -07:00
bsun-sudo	012c832ce5	[ntp] add ntp support in buster with mgmt vrf (#55 ) - create a file in files/image_config/ntp/ntp-systemd-wrapper to add mgmt vrf related start cmd for ntp service. So that the default /usr/lib/ntp/ntp-systemd-wrapper can be overriden during build time. - modify build_debian.sh to cp files/image_config/ntp/ntp-systemd-wrapper to /usr/lib/ntp/ntp-systemd-wrapper during build time. Co-authored-by: Bing Sun <Bing_Sun@dell.com>	2020-04-17 04:51:51 +00:00
bsun-sudo	2a237c57e6	[mgmt-vrf]: mgmt vrf related change for Buster (#53 ) Co-authored-by: Bing Sun <Bing_Sun@dell.com>	2020-04-17 04:51:51 +00:00
Stephen Sun	aec51c8aaf	[docker-wait-any] Use APIClient instead of Client according to API update due to the upgrade from docker-py (1.6.0) to docker (4.1.0)	2020-04-17 04:51:51 +00:00
Guohan Lu	e479a56db3	[baseimage]: setup ebtables.service in buster image ebtables is not enabled by default in buster Signed-off-by: Guohan Lu <lguohan@gmail.com>	2020-04-17 04:51:51 +00:00
Guohan Lu	fb12b0a621	[baseimage]: various fixes due to buster changes do mount/umount in the chroot environment install cron explicitly install rasdaemon as a replacement for mcelog switch python package docker-py to docker	2020-04-17 04:51:51 +00:00
Renuka Manavalan	f128153706	[baseimage]: Install Kubernetes packages if enabled in image (#4374 ) * Install kubernetes worker node packages, if enabled. * Minor updates * Added some comments * Updates per review comments. Built a private image to test to work fine. * Remove the removed file. * Update per comments Make a fix, as kubeadm no demands a higher version of kubelet & kubectl. As kubeadm auto install kubectl & kubelet, removing explicit install is an easier/robust fix. * Changes per review comments. * Updates per comments. 1) Dropped helper & pod scripts 2) Made install verbose * Drop creation of pods subdir, as this PR does not use them. * From comments to 'n' per review comments. * 1) kubeadm.conf is created as part of kubeadm package install. Hence dropped explicit copy.	2020-04-13 08:41:18 -07:00
rajendra-dendukuri	de377ebccd	Fix typo in config-setup service (#4388 )	2020-04-07 23:44:50 -07:00
SuvarnaMeenakshi	4b8067e913	Multi-ASIC implementation (#3888 ) Changes made to support multi-asic platform. Added multi-instance support for swss, syncd, database, bgp, teamd and lldp.	2020-03-31 10:06:19 -07:00
Garrick He	d095d0bdbc	[procdockerstatsd] Fix CMD field in dB (#4335 ) * Fix the CMD for the PROCESSSTATS entries so that there is a space between the command name and the arguments. Signed-off-by: Garrick He <garrick_he@dell.com>	2020-03-28 11:43:48 -07:00
lguohan	3c6f23e7b7	[tacacs]: fix /etc/nsswitch.conf for buster image (#4303 ) in buster image, default /etc/nsswitch.conf becomes ``` passwd: files ``` when tacacs is enable, this files changes to ``` passwd: tacplus files ```	2020-03-22 09:44:48 -07:00
SuvarnaMeenakshi	cfe754f665	[ntp]: Add "tinker panic 0" in ntp.conf to avoid ntpd from panic (#4263 ) - What I did Add configuration to avoid ntpd from panic and exit if the drift between new time and current system time is large. - How I did it Added "tinker panic 0" in ntp.conf file. - How to verify it [this assumes that there is a valid NTP server IP in config_db/ntp.conf] Change the current system time to a bad time with a large drift from time in ntp server; drift should be greater than 1000s. Reboot the device. Before the fix: 3. upon reboot, ntp-config service comes up fine, ntp service goes to active(exited) state without any error message. This is because the offset between new time (from ntp server) and the current system time is very large, ntpd goes to panic mode and exits. The system continues to show the bad time. After the fix: 3. Upon reboot, ntp-config comes up fine, ntp services comes up from and stays in active (running) state. The system clock gets synced with the ntp server time.	2020-03-21 18:50:12 -07:00
arheneus@marvell.com	94162679bb	[sonic-cfggen] MGMT Interface configuration (#4280 ) update network and broadcast address in /etc/network/interfaces Before: root@sonic:/home/admin# ifconfig eth0 eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 10.28.32.73 netmask 255.255.254.0 broadcast 0.0.0.0 <<<<< After: root@sonic:~# ifconfig eth0 eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 10.28.32.73 netmask 255.255.254.0 broadcast 10.28.33.255 <<<<< Signed-off-by: Antony Rheneus <arheneus@marvell.com>	2020-03-21 14:25:19 -07:00
yozhao101	560fd50262	[Monit] Delay start of monitoring for 5 minutes (#4281 )	2020-03-19 14:14:47 -07:00
Stepan Blyshchak	1ef740361c	[docker_image_ctl.j2] Share UTS namespace with host OS (#4169 ) Instead of updating hostname manualy on Config DB hostname change, simply share containers UTS namespace with host OS. Ideally, instead of setting `--uts=host` for every container in SONiC, this setting can be set per container if feature requires. One behaviour change is introduced in this commit, when `--privileged` or `--cap-add=CAP_SYS_ADMIN` and `--uts=host` are combined, container has privilege to change host OS and every other container hostname. Such privilege should be fixed by limiting containers capabilities. Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>	2020-02-26 10:56:54 +02:00
Santhosh Kumar T	2626565afb	[DellEMC] S6100 Last Reboot Reason Thermal Support (#3767 )	2020-02-18 00:02:33 -08:00
Joe LeVeque	8126916b46	[interfaces-config.sh] Do not bring 'lo' interface down and up (#4150 )	2020-02-14 14:55:03 -08:00
Stephen Sun	af44856d5c	[process-reboot-cause]Clean up the process-reboot-cause as reqired in issue 3927 (#4128 )	2020-02-11 09:54:12 -08:00
pra-moh	ab1a945cb9	[procdockerstatsd] Fix incorrect case issue in service file (#4134 )	2020-02-10 11:08:42 -08:00
pra-moh	4338fbe12b	[procdockerstats]: Update file permission for procdockerstatsd (#4126 )	2020-02-07 07:46:29 -08:00
Kiran Kumar Kella	97165a0d69	Changes in sonic-buildimage to support the NAT feature (#3494 ) * Changes in sonic-buildimage for the NAT feature - Docker for NAT - installing the required tools iptables and conntrack for nat Signed-off-by: kiran.kella@broadcom.com * Add redis-tools dependencies in the docker nat compilation * Addressed review comments * add natsyncd to warm-boot finalizer list * addressed review comments * using swsscommon.DBConnector instead of swsssdk.SonicV2Connector * Enable NAT application in docker-sonic-vs	2020-01-29 17:40:43 -08:00
kannankvs	7cb63008d7	mvrf_avoid_snmp_yml_config: made changes to pass SNMP config from con… (#4057 ) * mvrf_avoid_snmp_yml_config: made changes to pass SNMP config from confiDB to snmpd.conf without using snmp.yml * added a missing if condition	2020-01-28 17:41:21 -08:00
SuvarnaMeenakshi	c9483796dc	[baseimage]: support building multi-asic component (#3856 ) - move single instance services into their own folder - generate Systemd templates for any multi-instance service files in slave.mk - detect single or multi-instance platform in systemd-sonic-generator based on asic.conf platform specific file. - update container hostname after creation instead of during creation (docker_image_ctl) - run Docker containers in a network namespace if specified - add a service to create a simulated multi-ASIC topology on the virtual switch platform Signed-off-by: Lawrence Lee <t-lale@microsoft.com> Signed-off-by: Suvarna Meenakshi <Suvarna.Meenaksh@microsoft.com>	2020-01-26 13:56:42 -08:00
pra-moh	e3475b81d7	[baseimage]: removing space from shebang in procdockerstatsd (#4051 )	2020-01-23 17:49:41 -08:00
Dong Zhang	7aa0baf709	[MultiDB] (except ./src and ./dockers dirs): replace redis-cli with sonic-db-cli and use new DBConnector (#4035 ) * [MultiDB] (except ./src and ./dockers dirs): replace redis-cli with sonic-db-cli and use new DBConnector * update comment for a potential bug * update comment * add TODO maker as review reqirement	2020-01-22 11:26:23 -08:00
Howard Persh	44fa5efe00	[startup] Fixes issue with /var/platform directory not created (#4000 )	2020-01-22 10:02:28 -08:00
Joe LeVeque	aca1a86856	[caclmgrd] Fix application of IPv6 service ACL rules (part 2) (#4036 )	2020-01-17 17:33:31 -08:00
kannankvs	d150721fa1	modified down rules to pre-down rules to ensure that default route is… (#3853 ) * modified down rules to pre-down rules to ensure that default route is deleted just before interface is made down	2020-01-16 19:36:49 -08:00
yozhao101	aa67921d06	[Monit] Change the monitoring period from 120 seconds to 60 seconds. (#3974 ) * [Monit] Change the monitoring period of monit from 120 seconds to 60 seconds and also at the same time double the interval for existing sonic monit config file in host. Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2020-01-10 13:01:24 -08:00
Sujin Kang	856b4b64eb	[reboot cause]: Delay process-reboot-cause service until network connection is stable (#4003 )	2020-01-10 09:47:13 -08:00
lguohan	483a5946a8	Revert "[MultiDB]except src and dockers : replace redis-cli with sonic-db-cli and use new DBConnector (#3928 )" (#4002 ) This reverts commit `0dae59ac30`.	2020-01-10 08:27:34 -08:00
Dong Zhang	0dae59ac30	[MultiDB]except src and dockers : replace redis-cli with sonic-db-cli and use new DBConnector (#3928 ) * [MultiDB]except src and dockers : replace redis-cli with sonic-db-cli and use new DBConnector * fix vs tests along with swss vs tests together	2020-01-02 14:46:25 -08:00
Joe LeVeque	24a0c46464	[monit] Build from source and patch to use MemAvailable value if available on system (#3875 )	2019-12-30 18:25:57 -08:00
Renuka Manavalan	78db0804d3	corefile uploader: Updates per review comments offline (#3915 ) * Updates per review comments 1) core_uploader service waits for syslog.service 2) core_uploader service enabled for restart on failure 3) Use mtime instead of file size + ample time to be robust. * Avoid reloading already uploaded file, by marking the names with a prefix. * Updated failing path. 1) If rc file is missing or required data missing, it periodically logs error in forever loop. 2) If upload fails, retry every hour with a error log, forever. * Fix few bugs * The binary update_json.py will come from sonic-utilities.	2019-12-30 13:01:03 -08:00
Prabhu Sreenivasan	87f70108cb	SONiC Management Framework Release 1.0 (#3488 ) * Added sonic-mgmt-framework as submodule / docker * fix build issues * update sonic-mgmt-framework submodule branch to master * Merged changes 70007e6d2ba3a4c0b371cd693ccc63e0a8906e77..00d4fcfed6a759e40d7b92120ea0ee1f08300fc6 00d4fcfed6a759e40d7b92120ea0ee1f08300fc6 Modified environemnt variables * Changes to build sonic-mgmt-framework docker * bumped up sonic-mgmt-framework commit-id * version bump for sonic-mgmt-framework commit-it * bumped up sonic-mgmt-framework commit-id * Add python packages to docker * Build fix for docker with python packages * added libyang as dependent package * Allow building images on NFS-mounted clones Prior to this change, `build_debian.sh` would generate a Debian filesystem in `./fsroot`. This needs root permissions, and one of the tests that is performed is whether the user can create a character special file in the filesystem (using mknod). On most NFS deployments, `root` is the least privileged user, and cannot run mknod. Also, attempting to run commands like rm or mv as root would fail due to permission errors, since the root user gets mapped to an unprivileged user like `nobody`. This commit changes the location of the Debian filesystem to `/fsroot`, which is a tmpfs mount within the slave Docker. The default squashfs, docker tarball and zip files are also created within /tmp, before being copied back to /sonic as the regular user. The side effect of this change is that the contents of `/fsroot` are no longer available once the slave container exits, however they are available within the squashfs image. Signed-off-by: Nirenjan Krishnan <Nirenjan.Krishnan@dell.com> * bumped up sonc-mgmt-framework commit to include PR #18 * REST Server startup script is enahnced to read the settings from ConfigDB. Below table provides mapping of db field to command line argument name. ============================================================ ConfigDB entry key Field name REST Server argument ============================================================ REST_SERVER\|default port -port REST_SERVER\|default client_auth -client_auth REST_SERVER\|default log_level -v DEVICE_METADATA\|x509 server_crt -cert DEVICE_METADATA\|x509 server_key -key DEVICE_METADATA\|x509 ca_crt -cacert ============================================================ * Replace src/telemetry as submodule to sonic-telemetry * Update telemetry commit HEAD * Update sonic-telemetry commit HEAD * libyang env path update * Add libyang dependency to telemetry * Add scripts to create JSON files for CLI backend Scripts to create /var/platform/syseeprom and /var/platform/system, which are back-end files for CLI, for system EEPROM and system information. Signed-off-by: Howard Persh <Howard_Persh@dell.com> * In startup script, create directory where CLI back-end files live Signed-off-by: Howard Persh <Howard_Persh@dell.com> * build dependency pkgs added to docker for build failure fix * Changes to fix build issue for mgmt framework * Fix exec path issue with telemetry * s5232[device] PSU detecttion and default led state support * Processing of first boot in rc.local should not have premature exit Signed-off-by: Howard Persh <Howard_Persh@dell.com> * docker mount options added for platform, system features * bumped up sonic-mgmt-framework commit id to pick 23rd July 2019 changes * Added mount options for telemetry docker to get access for system and platform info. * Update commit for sonic-utilities * [dell]: Corrected dport map and renamed config files for S5232F * Fix telemetry submodule commit * added support for sonic-cli console * [Dell S5232F, Z9264F] Harden FPGA driver kernel module For Dell S5232F and Z9264F platforms, be more strict when checking state in ISR of FPGA driver, to harden against spurious interrupts. Signed-off-by: Howard Persh <Howard_Persh@dell.com> * update mgmt-framework submodule to 27th Aug commit. * remove changes not related to mgmt-framework and sonic-telemetry * Revert "Replace src/telemetry as submodule to sonic-telemetry" This reverts commit `11c3192975`. * Revert "Replace src/telemetry as submodule to sonic-telemetry" This reverts commit `11c3192975`. * make submodule changes and remove a change not related to PR * more changes * Update .gitmodules * Update Dockerfile.j2 * Update .gitmodules * Update .gitmodules * Update .gitmodules reverting experimental change * Removed syspoll for release_1.0 Signed-off-by: Jeff Yin <29264773+jeff-yin@users.noreply.github.com> * Update docker-sonic-mgmt-framework.mk * Update sonic-mgmt-framework.mk * Update sonic-mgmt-framework.mk * Update docker-sonic-mgmt-framework.mk * Update docker-sonic-mgmt-framework.mk * Revert "Processing of first boot in rc.local should not have premature exit" This reverts commit `e99a91ffc2`. * Remove old telemetry directory * Update docker-sonic-mgmt-framework.mk * Resolving merge conflict with Azure * Reverting the wrong merge * Use CVL_SCHEMA_PATH instead of changing directory for telemetry startup * Add missing export * Add python mmh3 to slave dockerfile * Remove sonic-mgmt-framework build dep for telemetry, fix dialout startup issues * Provided flag to disable compiling mgmt-framework * Update sonic-utilites point latest commit id * Point sonic-utilities to Azure accepted SHA * Updating mgmt framework to right sha * Add sonic-telemetry submodule * Update the mgmt-framework commit id Co-authored-by: jghalam <joe.ghalam@gmail.com> Co-authored-by: Partha Dutta <51353699+dutta-partha@users.noreply.github.com> Co-authored-by: srideepDell <srideep_devireddy@dell.com> Co-authored-by: nirenjan <nirenjan@users.noreply.github.com> Co-authored-by: Sachin Holla <51310506+sachinholla@users.noreply.github.com> Co-authored-by: Eric Seifert <seiferteric@gmail.com> Co-authored-by: Howard Persh <hpersh@yahoo.com> Co-authored-by: Jeff Yin <29264773+jeff-yin@users.noreply.github.com> Co-authored-by: Arunsundar Kannan <31632515+arunsundark@users.noreply.github.com> Co-authored-by: rvasanthm <51932293+rvasanthm@users.noreply.github.com> Co-authored-by: Ashok Daparthi-Dell <Ashok_Daparthi@Dell.com> Co-authored-by: anand-kumar-subramanian <51383315+anand-kumar-subramanian@users.noreply.github.com>	2019-12-23 21:47:16 -08:00
Joe LeVeque	77d636256b	[caclmgrd] Fix application of IPv6 service ACL rules (#3917 )	2019-12-19 07:15:27 -08:00
Renuka Manavalan	3ab4b71656	Corefile uploader service (#3887 ) * Corefile uploader service 1) A service is added to watch /var/core and upload to Azure storage 2) The service is disabled on boot. One may enable explicitly. 3) The .rc file to be updated with acct credentials and http proxy to use. 4) If service is enabled with no credentials, it would sleep, with periodic log messages 5) For any update in .rc, the service has to be restarted to take effect. * Remove rw permission for .rc file for group & others. * Changes per review comments. Re-ordered .rc file per JSON.dump order. Added a script to enable partial update of .rc, which HWProxy would use to add acct key. * Azure storage upload requires python module futures, hence added it to install list. * Removed trailing spaces. * A mistake in name corrected. Copy the .rc updater script to /usr/bin.	2019-12-15 16:48:48 -08:00
Stephen Sun	80bb7fd15a	[process-reboot-cause]Address the issue: Incorrect reboot cause returned when warm reboot follows a hardware caused reboot (#3880 ) * [process-reboot-cause]Address the issue: Incorrect reboot cause returned when warm reboot follows a hardware caused reboot 1. check whether /proc/cmdline indicates warm/fast reboot. if yes the software reboot cause file will be treated as the reboot cause. finish 2. check whether platform api returns a reboot cause. if yes it is treated as the reboot cause. finish. 3. check whether /hosts/reboot-cause contains a cause. if yes it is treated as the cause otherwise return unknown. * [process-reboot-cause]Fix review comments * [process-reboot-cause]address comments 1. use "with" statement 2. update fast/warm reboot BOOT_ARG * [process-reboot-cause]address comments * refactor the code flow * Remove escape * Remove extra ':'	2019-12-14 09:41:48 -08:00
Ying Xie	eefa8455d7	[hostcfgd] avoid in place editing config file contents (#3904 ) In place editing (sed -i) seems having some issues with filesystem interaction. It could leave 0 size file or corrupted file behind. It would be safer to sed the file contents into a new file and switch new file with the old file. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2019-12-13 19:26:39 -08:00
rajendra-dendukuri	fec80293dd	ZTP infrastructure changes to support DHCP discovery provisioning data (#3298 ) * ZTP infrastructure changes to support DHCP discovery provisioning data - Dynamically generate DHCP client configuration based on current ZTP state - Added support to request and process hostname when using DHCPv6 - Do not process graphservice url dhcp option if ZTP is enabled, ZTP service will process it - Generate /e/n/i file with all active interfaces seeking address assignment via DHCP. Only interfaces that are created in Linux will be added to /e/n/i. Also DHCP is started only on linked up in-band interfaces. Signed-off-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>	2019-12-10 08:16:56 -08:00
rajendra-dendukuri	cda61290ac	[config-setup]: create a SONiC configuration management service (#3227 ) * Create a SONiC configuration management service * Perform config db migration after loading config_db.json to redis DB * Migrate config-setup post migration hooks on image upgrade config-setup post migration hooks help user to migrate configurations from old image to new image. If the installed hooks are user defined they will not be part of the newly installed image. So these hooks have to be migrated to new image and only then they can be executing when the new image is booting. The changes in this fix migrate config-setup post-migration hooks and ensure that any hooks with the same filename in newly installed image are not overwritten. It is expected that users install new hooks as per their requirement and not edit existing hooks. Any changes to existing hooks need to be done as part of new image and not post bootup.	2019-12-04 07:15:58 -08:00
pra-moh	bfa96bbce3	Add daemon which periodically pushes process and docker stats to State DB (#3525 )	2019-11-27 15:35:41 -08:00
pra-moh	d3a1555f30	[hostcfgd] Add support to enable/disable optional features (#3653 )	2019-11-26 14:11:12 -08:00
kannankvs	4007d9ba9c	[ntp]: modified ntp script to hide the error related to cfggen (#3745 ) This PR is to handle the issue 3527. When device boots up, NTP throws a traceback as explained in the issue 3527. - Traceback will be seen when MGMT_VRF_CONFIG does not exist in the database. Traceback is coming from the script “/etc/init.d/ntp”. - Traceback does not affect the NTP functionality with/without management VRF. When MGMT_VRF_CONFIG does not exist or when MGMT_VRF_CONFIG’s mgmtVrfEnabled is configured to “false”, “NTP” will be started in the “default VRF” context, which is working fine even with this traceback. - This traceback error will be hidden by redirecting the error to /dev/null without affecting functionality.	2019-11-14 00:06:54 -08:00
Joe LeVeque	c50c390eb4	[rsyslog] Add support for IPv6 remote addresses (#3754 )	2019-11-14 00:00:55 -08:00
Tyler Li	c07ae3b16f	Loopback ip addresses move to intfmgrd for supporting VRF	2019-11-10 02:27:33 -08:00
Joe LeVeque	85b0de3df1	[docker-syncd]: Restart SwSS, syncd and dependent services if a critical process in syncd container exits unexpectedly (#3534 ) Add the same mechanism I developed for the SwSS service in #2845 to the syncd service. However, in order to cause the SwSS service to also exit and restart in this situation, I developed a docker-wait-any program which the SwSS service uses to wait for either the swss or syncd containers to exit.	2019-11-09 10:26:39 -08:00
lguohan	6d46badbdc	[aboot]: preserve snmp.yml and acl.json for eos to sonic fast reboot (#3716 )	2019-11-06 20:18:31 -08:00
Neetha John	95466c3ab7	[pfcwd]: Do not start pfc watchdog on Management Tor (#3719 ) Signed-off-by: Neetha John <nejo@microsoft.com>	2019-11-06 18:51:02 -08:00
pavel-shirshov	d5af096f41	[TSA]: Add community to the loopback prefix, when isolated (#3708 ) * Rename asn/deployment_id_asn_map.yaml to constants/constants.yaml * Fix bgp templates * Add community for loopback when bgpd is isolated * Use correct community value	2019-11-06 16:07:28 -08:00
Ying Xie	5961e031e1	[hostname-config] improve hostname-config process (#3676 ) We noticed in tests/production that there is a low probability failure where /etc/hosts could have some garbage characters before the entry for local host name. The consequence is that all sudo command would be very slow. In extreme cases it would prevent some services from starting properly. I suspect that the /etc/hosts file might be opened by some process causing the issue. Editing contents with new file level and replace the whole file should be safer. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2019-10-29 08:30:27 -07:00
Danny Allen	63328814fc	[core_cleanup] Fix issue where core_cleanup job runs too frequently (#3659 ) Signed-off-by: Danny Allen <daall@microsoft.com>	2019-10-23 15:55:47 -07:00
pavel-shirshov	9b8f5c9c9a	[ntp]: Use loopback address when we don't have MGMT interface (#3566 ) Added configuration to use Loopback ip if a switch doesn't have MGMT_PORT.	2019-10-07 07:49:25 -07:00
Ying Xie	cd85e2148b	[updategraph] enhance update graph handling (#3549 ) - after reloading minigraph, write latest version string in the DB. - if old config_db.json file exists, use it and migrate to latest version. - only reload minigraph when config_db.json doesn't exist and minigraph exists. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2019-10-02 13:58:44 -07:00
Ying Xie	d5262a3621	[first boot] sync file system after moving/copying files (#3550 ) Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2019-10-02 13:58:34 -07:00
Long Ou	b6a09999de	[hostcfgd] hostcfgd will exit when set hostname in DEVICE_METADATA (#3394 ) Signed-off-by: ouxiaolong <ouxiaolong@asterfusion.com>	2019-09-24 17:36:02 -07:00
Harish Venkatraman	9d2d617264	[SNMP] management VRF SNMP support (#2608 ) * [SNMP] management VRF SNMP support This commit adds SNMP support for Management VRF using l3mdev. The patch included provides VRF support, there is no single "listendevice" configuration, rather multiple agentaddress config options can each have their own "interface" to bind to using "ip%interface". The snmpd.conf file is accordingly generated using the snmp.yml file and redis database info. Adding below the comments of SNMP patch 1376 -------------------------------------------- Since the Linux kernel added support for Virtual Routing and Forwarding (VRF) in version 4.3 (Note: these won't compile on non-linux platforms) https://www.kernel.org/doc/Documentation/networking/vrf.txt Linux users could not use snmpd in its current form to bind specific listening IP addresses to specific VRF devices. A simplified description of a VRF inteface is an interface that is a master (a container of sorts) that collects a set of physicalinterfaces to form a routing table. This set of two patches (one for V5-7-patches and one for V5-8-patches branches) is almost identical to patch single "listendevice" configuration. Rather, multiple agentAddress config options can each have their own "interface" to bind to using the <ip>%<interface> syntax.</interface></ip> ------------------------------------------- Signed-off-by: Harish Venkatraman <harish_venkatraman@dell.com>	2019-09-18 17:26:45 -07:00
Prince Sunny	8ca1eb289e	Install Iptables rules to set TCPMSS for 'lo' interface (#3452 ) * Install Iptables rules to set TCPMSS for lo interface * Moved implementation to hostcfgd to maintain at one place	2019-09-18 10:12:28 -07:00
sridhar-ravindran	3c0b56a709	[DELL] S6100 Support PowerCycle in Last Reboot Reason (#3403 ) * [DELL] S6100 Support PowerCycle in Last Reboot Reason * handle first time boot properly * S6000 Last Reboot Reason Fix	2019-09-17 16:51:46 -07:00
Harish Venkatraman	31d1a76197	[baseimage]: Management vrf ntp support (#3204 ) This commit adds NTP support for management VRF using L3mdev. Config vrf add mgmt will enable management VRF, enslave the eth0 device to the master device mgmt, stop ntp service in default, restart interfaces-configs and restart ntp service in mgmt-vrf context. Requirement and design are covered in mgmt vrf design document. Signed-off-by: Harish Venkatraman <harish_venkatraman@dell.com>	2019-09-16 10:21:06 -07:00
Danny Allen	97c675c6d5	[cron.d] Add cron job to periodically clean-up core files (#3449 ) * [cron.d] Create cron job to periodically clean-up core files * Create script to scan /var/core and clean-up older core files * Create cron job to run clean-up script Signed-off-by: Danny Allen <daall@microsoft.com> * Update interval for running cron job * Respond to feedback * Change syslog id	2019-09-13 10:50:31 -07:00
lguohan	95a72b4e39	[baseimage]: fix monit configuration (#3448 ) - monit config broke by one monit upgrade - abandon sed approach since it is suspestible to monit config changes - use unixsocket instead of httpd due to a bug in 5.20.0	2019-09-12 22:48:40 -07:00
Joe LeVeque	a27f12773b	[baseimage]: Log message containing SONiC version to syslog at boot (#3416 )	2019-09-09 14:18:23 -07:00
Ying Xie	d6b4223bdd	[control plane assistant] stop control plane assistant after warm reboot (#3337 ) Delay saving configuration so that the control assistant configurations won't be persisted. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2019-08-15 00:45:54 -07:00
Renuka Manavalan	fcdf62f5f6	Fix to ensure that tacacs servers are ordered (reverse) by priority in pam.d's config. (#3322 ) Present: Servers are listed in the same order as in redis-db Fix: Save the sort o/p, hence use sorted list to write into pam.d's conf. As well convert priority to integer for use by sort.	2019-08-09 11:46:46 -07:00
arheneus@marvell.com	50fe458592	[build]: SONiC buildimage ARM arch support (#2980 ) ARM Architecture support in SONIC make configure platform=[ASIC_VENDOR_ARCH] PLATFORM_ARCH=[ARM_ARCH] SONIC_ARCH: default amd64 armhf - arm32bit arm64 - arm64bit Signed-off-by: Antony Rheneus <arheneus@marvell.com>	2019-07-25 22:06:41 -07:00
Harish Venkatraman	3e69427ac0	[baseimage] management VRF support via l3mdev (#2585 ) This commit adds support for New feature management VRF using L3mdev. Added commands to enable/disable management VRF. Config vrf add mgmt will enable management VRF, enslave the eth0 device to the master device mgmt and restart interfaces-configs in mgmt-vrf context. management interface (eth0) can be configured using config interface eth0 ip add command and removed using config interface eth0 ip remove command. Requirement and design are covered in mgmt vrf design document. Currently show command displays linux command output; will update show command display in next PR after concluding what would be the output for the show commands. Added metric for default routes in dhcp and static, any changes for metric will be addressed subsequently after discussing. Signed-off-by: Harish Venkatraman <harish_venkatraman@dell.com>	2019-07-24 16:18:40 -07:00
Ying Xie	9d64ce761f	[warm reboot] save configuration after warm reboot (#3200 ) * [warm reboot] save configuration after warm reboot After warm reboot, save a copy of in memory database to config_db.json, upgrade procedure might have removed config_db.json to force new image to reload minigraph. However, reload minigraph is skipped during warm reboot. Missing config_db.json would cause device to fault in next non-upgrading cold/fast reboot. Signed-off-by: Ying Xie <ying.xie@microsoft.com> * Update finalize-warmboot.sh	2019-07-24 09:59:47 -07:00
zzhiyuan	e4c041b57f	[baseimage]: Fix process-reboot-cause possibly throwing OSError (#3159 ) In case of going from previous iteration of SONiC, and the last reboot was hardware, REBOOT_CAUSE_FILE may not be present and the service may throw an error.	2019-07-16 08:34:11 -07:00
Joe LeVeque	5e2ab9dd03	[process-reboot-cause] Handle case if platform does not yet have sonic_platform implementation (#3126 )	2019-07-05 17:53:49 -07:00
Joe LeVeque	e5a2beb13b	[reboot-cause]: Move reboot cause processing to its own service, 'process-reboot-cause' (#3102 )	2019-07-03 10:38:20 -07:00
Joe LeVeque	319d854e46	[baseimage]: Increase TMOUT for serial port connections to 15 minutes (#3032 ) Increase TMOUT value in order to close inactive serial console connections after 900 seconds (15 minutes) of inactivity	2019-06-19 00:16:01 -07:00
Prince Sunny	231d309b69	Generate interface table to have an entry designated to default VRF. (#2848 ) * Generate default VRF table for router interfaces * Updated jinja2 template to have prefix filter	2019-06-10 14:02:55 -07:00
Myron Sosyak	3ec95e17c8	[build_templates] [hostcfgd] Keep containers hostname up to date (#2924 ) * Add updateHostName function to docker_image_ctl.j2 * Add hostname specification on container creating step * Add listener for hostname changes in hostcfgd Signed-off-by: Myron Sosyak <msosyak@barefootnetworks.com>	2019-06-06 00:41:30 -07:00
Joe LeVeque	3ec3e20e5a	[logrotate] Enhance robustness (#2942 ) * [logrotate] Decrease frequency to every 10 minutes; kill any lingering logrotate processes * [logrotate] Delete all *.1.gz files as firstaction; Remove note about init-system-helpers < 1.47 workaround However, continue to send SIGHUP directly to rsyslogd process because 'service rsyslog rotate' still doesn't work properly with init-system-helpers version 1.48	2019-05-25 18:00:18 -07:00
Ying Xie	222706120d	[updategraph] set DB version after minigraph reload (#2917 ) Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2019-05-18 22:08:41 -07:00
Renuka Manavalan	a357693f52	[tacacs]: skip accessing tacacs servers for local non-tacacs users (#2843 ) * Switch the nss look up order as "compat" followed by "tacplus". This helps use the legacy passwd file for user info and go to tacacs only if not found. This means, we never contact tacacs for local users like "admin". This isolates local users from any issues with tacacs servers. W/o this fix, the sudo commands by local users could take <count of servers> * <tacacs timeout> seconds, if the tacacs servers are unreachable. * Skip tacacs server access for local non-tacacs users. Revert the order of 'compat tacplus' to original 'tacplus compat' as tacplus access is required for all tacacs users, who also get created locally.	2019-05-09 14:36:32 -07:00
Ying Xie	9efcf1759a	[ebtables] install ebtables in base image and install filter rules (#2805 ) - Add ebtables package, and install some filter rules: 1. ebtables -A FORWARD -d BGA -j DROP 2. ebtables -A FORWARD -p ARP -j DROP Basically, we let the ARP packets in the VLAN being forwarded by the ASIC, kernel gets a copy of these ARP packets and the forwarding from Kenerl gets dropped. So there is always only one copy of ARP/response in the VLAN. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2019-05-09 09:44:41 -07:00
Joe LeVeque	6eca27e564	[services] Restart SwSS service upon unexpected critical process exit (#2845 ) * [service] Restart SwSS Docker container if orchagent exits unexpectedly * Configure systemd to stop restarting swss if it attempts to restart more than 3 times in 20 minutes * Move supervisor-proc-exit-listener script * [docker-dhcp-relay] Enhance wait_for_intf.sh.j2 to utilize STATEDB * Ensure dependent services stop/start/restart with SwSS * Change 'StartLimitInterval' to 'StartLimitIntervalSec', as Stretch installs systemd 232 (>= v230) * Also update journald.conf options * Remove 'PartOf' option from unit files * Add '$(SUPERVISOR_PROC_EXIT_LISTENER_SCRIPT)' to new shared docker-orchagent makefile * Make supervisor-proc-exit-listener script read from 'critical_processes' file inside container * Update critical_processes file for swss container	2019-05-01 08:02:38 -07:00
Joe LeVeque	2736da97c7	[sudoers] Add /usr/bin/teamshow to READ_ONLY_CMDS (#2846 )	2019-05-01 08:01:44 -07:00
zhenggen-xu	75964ef243	[baseimage]: Add fstrim service and fstrim timer by default (#2804 ) This service (weekly) will let SSD firmware to do the garbage collection after file-system deleted files. It could avoid slowness or even READ-ONLY error due to SSD not being able to free the pages even though the file system thinks there was a lot of space left. Signed-off-by: Zhenggen Xu <zxu@linkedin.com>	2019-04-21 14:21:16 -07:00

... 2 3 4 5 6 ...

500 Commits