sonic-buildimage

Author	SHA1	Message	Date
Prince Sunny	9ad1971c78	Enable restapi, update sonic-restapi (#5169 ) * Enable restapi if included in image * [Submodule update] sonic-restapi	2020-08-14 09:22:09 -07:00
Joe LeVeque	309a098b21	[201911][Python] Migrate applications/scripts to import sonic-py-common package (#5132 ) As part of consolidating all common Python-based functionality into the new sonic-py-common package, this pull request: 1. Redirects all Python applications/scripts in sonic-buildimage repo which previously imported sonic_device_util or sonic_daemon_base to instead import sonic-py-common, which was added to the 201911 branch in https://github.com/Azure/sonic-buildimage/pull/5063 2. Replaces all calls to `sonic_device_util.get_platform_info()` to instead call `sonic_py_common.get_platform()` and removes any calls to `sonic_device_util.get_machine_info()` which are no longer necessary (i.e., those which were only used to pass the results to `sonic_device_util.get_platform_info()`. 3. Removes unused imports to the now-deprecated sonic-daemon-base package and sonic_device_util.py module This is a step toward resolving https://github.com/Azure/sonic-buildimage/issues/4999	2020-08-13 16:35:53 -07:00
Abhishek Dosi	b19dab4ba5	Commit `78c803851c`, which cherry-picked PR #5081, pulled the ICCPD feature into 201911, where it is not needed, so remove it again. Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2020-08-09 12:00:09 -07:00
lguohan	78c803851c	[build]: combine feature and container feature table (#5081 ) 1. remove container feature table 2. do not generate feature entry if the feature is not included in the image 3. rename ENABLE_* to INCLUDE_* for better clarity 4. rename feature status to feature state 5. [submodule]: update sonic-utilities * 9700e45 2020-08-03 | [show/config]: combine feature and container feature cli (#1015) (HEAD, origin/master, origin/HEAD) [lguohan] * c9d3550 2020-08-03 | [tests]: fix drops_group_test failure on second run (#1023) [lguohan] * dfaae69 2020-08-03 | [lldpshow]: Fix input device is not a TTY error (#1016) [Arun Saravanan Balachandran] * 216688e 2020-08-02 | [tests]: rename sonic-utilitie-tests to tests (#1022) [lguohan] Signed-off-by: Guohan Lu <lguohan@gmail.com>	2020-08-09 11:55:40 -07:00
Sujin Kang	ff6cb6c402	Add disabling HW watchdog during boot for fast-reboot and warm-reboot (#4927 ) * Add disabling HW watchdog during boot for fast-reboot and warm-reboot case * typo	2020-08-09 11:25:31 -07:00
yozhao101	517592afb8	[dockers] Default container autorestart feature to "enabled" for all except database (#4853 ) Set the default auto_restart state to "enabled" in init_cfg.json for all containers except database Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2020-08-09 10:50:39 -07:00
isabelmsft	c56ddf0dba	[Kubernetes Setup] Remove flannel, kube-proxy images (#5098 ) Removes installation of kube-proxy (117 MB) and flannel (53 MB) images from Kubernetes-enabled devices. These images are tested to be unnecessary for our use case, as we do not rely on ClusterIPs for Kubernetes Services or a CNI for pod networking.	2020-08-09 10:48:59 -07:00
rkdevi27	5ddfc13a75	[baseimage]: /host unmount timeout issue during reboot. (#5032 ) Fix for the host unmount issue through PR https://github.com/Azure/sonic-buildimage/pull/4558 and https://github.com/Azure/sonic-buildimage/pull/4865 creates the timeout of syslog.socket closure during reboot since the journald socket closure has been included in syslog.socket Removed the journal socket closure. The host unmount is fixed with just stopping the services which gets restarted only after /var/log unmount and not causing the unmount issues.	2020-08-09 10:38:33 -07:00
rkdevi27	652aa3b072	[baseimage]: /host unmount failed in VM during reboot (#4865 ) Added a further check to make the services stop appropriately before unmount. Fix #4651	2020-08-09 10:37:12 -07:00
rkdevi27	f1bbda19f0	Fix "/host unmount failure" during reboot (#4558 )	2020-08-09 10:34:02 -07:00
Joe LeVeque	6556c40040	[201911] Introduce sonic-py-common package (#5063 ) Consolidate common SONiC Python-language functionality into one shared package (sonic-py-common) and eliminate duplicate code. The package currently includes four modules: - daemon_base - device_info - logger - task_base NOTE: This is a combination of all changes from https://github.com/Azure/sonic-buildimage/pull/5003, https://github.com/Azure/sonic-buildimage/pull/5049 and some changes from https://github.com/Azure/sonic-buildimage/pull/5043 backported to align with the 201911 branch. As part of the 201911 port, I am not installing the Python 3 package in the base image or in the VS container, because we do not have pip3 installed, and we do not intend to migrate to Python 3 in 201911.	2020-08-03 11:50:06 -07:00
Nazarii Hnydyn	4e558bca25	[201911][Mellanox] Update MFT to 4.15.0-104 (#5077 ) * [Mellanox] Update MFT to 4.15.0-104. Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com> * [Mellanox] Remove build system W/A. Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com> * [Mellanox] Add MFT DKMS build support. Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>	2020-08-03 13:53:33 +03:00
abdosi	e3eddede1e	Changes to add template support for copp.json. (#5053 ) * Changes to add template support for copp.json. This is needed so that we can install different types of traps based on device role (ToR/Leaf/Mgmt/etc.). The initial use case is to install the DHCP/DHCPv6 trap only on ToR routers. Signed-off-by: Abhishek Dosi <abdosi@microsoft.com> * Fixed based on review comments. Signed-off-by: Abhishek Dosi <abdosi@microsoft.com> * Fixed based on review comment.	2020-07-31 17:24:45 -07:00
Joe LeVeque	c96c3cd311	[caclmgrd] Always restart service upon process termination (#5065 )	2020-07-31 17:23:48 -07:00
madhanmellanox	130aeb4cc1	[caclmgrd] Log error message if IPv4 ACL table contains IPv6 rule and vice-versa (#4498 ) * Defect 2082949: Handling Control Plane ACLs so that IPv4 rules and IPv6 rules are not added to the same ACL table * Previous code review comments of coming up with functions for is_ipv4_rule and is_ipv6_rule is addressed and also raising Exceptions instead of simply aborting when the conflict occurs is handled * Addressed code review comment to replace duplicate code with already existing functions * removed raising Exception when rule conflict in Control plane ACLs are found * added code to remove the rule_props if it is conflicting ACL table versioning rule * addressed review comment to add ignoring rule in the error statement Co-authored-by: Madhan Babu <madhan@arc-build-server.mtr.labs.mlnx>	2020-07-26 11:16:30 -07:00
Stepan Blyshchak	b100ec559e	[services] remove swss from WantedBy for nat service (#4991 ) Otherwise, it may cause issues for warm restarts, warm reboot. Warm restart of swss will start nat which is not expected for warm restart. Also it is observed that during warm-reboot script execution nat container gets started after it was killed. This causes removal of nat dump generated by nat previously: A check [ -f /host/warmboot/nat/nat_entries.dump ] || echo "NAT dump does not exists" was added right before kexec: ``` Fri Jul 17 10:47:16 UTC 2020 Prepare MLNX ASIC to fastfast-reboot: install new FW if required Fri Jul 17 10:47:18 UTC 2020 Pausing orchagent ... Fri Jul 17 10:47:18 UTC 2020 Stopping nat ... Fri Jul 17 10:47:18 UTC 2020 Stopped nat ... Fri Jul 17 10:47:18 UTC 2020 Stopping radv ... Fri Jul 17 10:47:19 UTC 2020 Stopping bgp ... Fri Jul 17 10:47:19 UTC 2020 Stopped bgp ... Fri Jul 17 10:47:21 UTC 2020 Initialize pre-shutdown ... Fri Jul 17 10:47:21 UTC 2020 Requesting pre-shutdown ... Fri Jul 17 10:47:22 UTC 2020 Waiting for pre-shutdown ... Fri Jul 17 10:47:24 UTC 2020 Pre-shutdown succeeded ... Fri Jul 17 10:47:24 UTC 2020 Backing up database ... Fri Jul 17 10:47:25 UTC 2020 Stopping teamd ... Fri Jul 17 10:47:25 UTC 2020 Stopped teamd ... Fri Jul 17 10:47:25 UTC 2020 Stopping syncd ... Fri Jul 17 10:47:35 UTC 2020 Stopped syncd ... Fri Jul 17 10:47:35 UTC 2020 Stopping all remaining containers ... Warning: Stopping telemetry.service, but it can still be activated by: telemetry.timer Fri Jul 17 10:47:37 UTC 2020 Stopped all remaining containers ... NAT dump does not exists Fri Jul 17 10:47:39 UTC 2020 Rebooting with /sbin/kexec -e to SONiC-OS-201911.140-08245093 ... ``` With this change, executed warm-reboot 10 times without hitting this issue, while without this change the issue is easily reproducible almost every warm-reboot run. Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>	2020-07-26 11:11:35 -07:00
Kebo Liu	0701da4145	[Mellanox] remove code which instructs hw-mgmt to skip mlsw_minimal probing in fast-boot flow (#5011 )	2020-07-26 11:09:26 -07:00
Joe LeVeque	4a2db8e216	[caclmgrd] remove default DROP rule on FORWARD chain (#5034 )	2020-07-26 11:07:42 -07:00
Joe LeVeque	3f3fcd3253	[caclmgrd] Filter DHCP packets based on dest port only (#4995 )	2020-07-21 10:13:17 +00:00
Joe LeVeque	52e45e823e	[201911][sudoers] Add `sonic_installer list` to read-only commands (#4997 ) `sonic_installer list` is a read-only command. Specify it as such in the sudoers file. This will also ensure the new `show boot` command, which calls `sudo sonic_installer list` under the hood doesn't fail due to permissions.	2020-07-17 20:13:42 -07:00
Joe LeVeque	0559b7d3b6	[caclmgrd] Improve code reuse (#4931 ) Improve code reuse in `generate_block_ip2me_traffic_iptables_commands()` function.	2020-07-11 09:48:10 -07:00
arlakshm	7c699df654	Add support for bcmsh and bcmcmd utilities in multi ASIC devices (#4926 ) Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com> This PR has changes to support accessing the bcmsh and bcmcmd utilities on multi ASIC devices. Changes done: - move the link of /var/run/sswsyncd from docker-syncd-brcm.mk to docker_image_ctl.j2 - update the bcmsh and bcmcmd scripts to take -n [ASIC_ID] as an argument on multi ASIC platforms	2020-07-11 09:47:24 -07:00
abdosi	4869fa7173	[sonic-buildimage] Changes to make network specific sysctl common for both host and docker namespace (#4838 ) * [sonic-buildimage] Changes to make network specific sysctl common for both host and docker namespace (in multi-npu). This change was triggered by an issue found on multi-npu platforms, where net.ipv6.conf.all.forwarding was 0 (should be 1) in the docker namespace, because of which RS/RA messages were triggered and link-local routers were learnt. Besides this, there were some other sysctl.net.ipv6* params whose values in the docker namespace were not the same as in the host namespace. To make sure the host and docker namespaces are always in sync, created a common file that lists all sysctl.net.* params and is used by both the host and docker namespaces. Any change will get applied to both namespaces. Signed-off-by: Abhishek Dosi <abdosi@microsoft.com> * Address review comments and make sure to invoke augtool only once, doing string concatenation of all set commands * Address Review Comments.	2020-07-05 15:32:30 -07:00
SuvarnaMeenakshi	fad2d47421	[systemd-generator]: Fix dependency update for multi-asic platform (#4820 ) * [systemd-generator]: Fix the code to make sure that dependencies of host services are generated correctly for multi-asic platforms. Add code to make sure that systemd timer files are also modified to add the correct service dependency for multi-asic platforms. Signed-off-by: SuvarnaMeenakshi <sumeenak@microsoft.com> * [systemd-generator]: Minor fix, remove debug code and remove unused variable.	2020-07-05 15:26:07 -07:00
abdosi	75d5e30f07	Changes to make default route programming correct in multi-npu platforms (#4774 ) * Changes to make default route programming correct in multi-asic platform where frr is not running in host namespace. Change is to set correct administrative distance. Also make NAMESPACE* environment variable available for all dockers so that it can be used when needed. Signed-off-by: Abhishek Dosi <abdosi@microsoft.com> * Fix review comments * Review comment to check to add default route only if default route exists and delete is successful.	2020-07-05 15:21:09 -07:00
arlakshm	b6b1f3fac8	syslog changes Multi ASIC platforms (#4738 ) Add changes for syslog support for containers running in namespaces on multi ASIC platforms. On Multi ASIC platforms Rsyslog service is only running on the host. There is no rsyslog service running in each namespace. On multi ASIC platforms the rsyslog service on the host will be listening on the docker0 ip address instead of loopback address. The rsyslog.conf on the containers is modified to have omfwd target ip to be docker0 ipaddress instead of loopback ip Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>	2020-07-05 15:19:22 -07:00
roman_savchuk	c357a56c70	[201911] Add executable permission back to supervisor-proc-exit-listener file (#4891 ) While testing reboot case for 201911 facing error: supervisor-proc-exit-listener FATAL command at '/usr/bin/supervisor-proc-exit-listener' is not executable Signed-off-by: Roman Savchuk <romanx.savchuk@intel.com>	2020-07-03 14:32:35 -07:00
Joe LeVeque	5df5015835	[build][systemd] Mask disabled services by default (#4721 ) When building the SONiC image, use systemd to mask all services which are set to "disabled" in init_cfg.json. This PR depends on https://github.com/Azure/sonic-utilities/pull/944, otherwise `config load_minigraph` will fail when trying to restart disabled services.	2020-06-28 07:28:56 -07:00
Joe LeVeque	0768bf7733	[hostcfgd] Synchronize all feature statuses once upon start (#4714 ) - Ensure all features (services) are in the configured state when hostcfgd starts - Better functionalization of code - Also replace calls to deprecated `has_key()` method in `tacacs_server_handler()` and `tacacs_global_handler()` with `in` keyword. This PR depends on https://github.com/Azure/sonic-utilities/pull/944, otherwise `config load_minigraph` will fail when trying to restart disabled services.	2020-06-28 07:28:33 -07:00
Kebo Liu	1bade2c67b	Add with_i2cdev for mst start to have I2C device loaded properly (#4790 )	2020-06-28 07:24:35 -07:00
yozhao101	c2364cf03e	[201911][dockers] Update critical_processes file syntax (#4854 ) Backport of https://github.com/Azure/sonic-buildimage/pull/4831 to the 201911 branch	2020-06-26 11:37:05 -07:00
padmanarayana	7564b060e4	[DELL]: FTOS to SONiC fast conversion fixes (#4807 ) While migrating to SONiC 20181130, identified a couple of issues: 1. union-mount needs /host/machine.conf parameters for vendor specific checks : however, in case of migration, the /host/machine.conf is extracted from ONIE only in https://github.com/Azure/sonic-buildimage/blob/master/files/image_config/platform/rc.local#L127. 2. Since grub.cfg is updated to have net.ifnames=0 biosdevname=0, 70-persistent-net.rules changes are no longer required.	2020-06-20 08:15:05 -07:00
Joe LeVeque	d8886ba473	[caclmgrd] Don't limit connection tracking to TCP (#4796 ) Don't limit iptables connection tracking to TCP protocol; allow connection tracking for all protocols. This allows services like NTP, which is UDP-based, to receive replies from an NTP server even if the port is blocked, as long as it is in reply to a request sent from the device itself.	2020-06-20 08:13:11 -07:00
abdosi	c2981b8cdf	[build] Ensure /usr/lib/systemd/system/ directory exists before referencing (#4788 ) * Fix the Build on 201911 (Stretch) where the directory /usr/lib/systemd/system/ does not exist so creating manually. Change should not harm Master (buster) where the directory is created by Linux * Fix as per review comments	2020-06-17 09:59:53 -07:00
yozhao101	4846fc0337	[docker-syncd] Add timeout to force stop syncd container (#4617 ) - Why I did it When I tested the auto-restart feature of the swss container by manually killing one of the critical processes in it, swss was stopped. The syncd container, as the peer container, should then also be stopped as expected. However, I found that sometimes the syncd container is stopped and sometimes it is not. The reason the syncd container is not stopped is that the process (/usr/local/bin/syncd.sh stop) executing the stop() function gets stuck between lines 164–167. Systemd will wait for 90 seconds and then kill this process. 164 # wait until syncd quit gracefully 165 while docker top syncd$DEV | grep -q /usr/bin/syncd; do 166 sleep 0.1 167 done The first thing I did was to profile how long this while loop spins if the syncd container is stopped normally after the swss container is stopped. The result is 5 or 6 seconds. If the syncd container is stopped normally, two messages are written into syslog: str-a7050-acs-3 NOTICE syncd#dsserve: child /usr/bin/syncd exited status: 134 str-a7050-acs-3 INFO syncd#supervisord: syncd [5] child /usr/bin/syncd exited status: 134 The second thing I did was to add a timer in the condition of the while loop to ensure this while loop will be forced to exit after 20 seconds: After that, the testing result is that the syncd container is stopped normally if swss is stopped first. One more thing I want to mention is that if the syncd container is stopped within 5 or 6 seconds, then the two log messages can still be seen in syslog. However, if the execution time of the while loop is longer than 20 seconds and it is forced to exit, although the syncd container is stopped, I did not see these two messages in syslog. Further, although I observed that the auto-restart feature of the swss container works correctly right now, I cannot be sure that the issue of the syncd container failing to stop will not occur in future. - How I did it I added a timer around the while loop in stop() function. This while loop will exit after spinning 20 seconds. Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2020-06-16 08:21:15 -07:00
Renuka Manavalan	f8a9a1b805	[k8s]: switching to Flannel from Calico. (#4768 ) Switching to Flannel from Calico which brings down the image size by around 500+MB.	2020-06-16 08:18:54 -07:00
Joe LeVeque	c625e0e3e6	[build] Enable telemetry service by default (#4760 ) - Why I did it To ensure telemetry service is enabled by default after installing a fresh SONiC image - How I did it Set telemetry feature status to "enabled" when generating init_cfg.json file	2020-06-16 08:17:47 -07:00
Ying Xie	aecebac86b	[ntp] disable ntp long jump (#4748 ) Found another syncd timing issue related to clock going backwards. To be safe disable the ntp long jump. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2020-06-16 08:15:00 -07:00
Joe LeVeque	ed0e6aed1c	[hostcfgd] Get service enable/disable feature working (#4676 ) Fix hostcfgd so that changes to the "FEATURE" table in ConfigDB are properly handled. Three changes here: 1. Fix indenting such that the handling of each key actually occurs in the for key in status_data.keys(): loop 2. Add calls to sudo systemctl mask and sudo systemctl unmask as appropriate to ensure changes persist across reboots 3. Substitute returns with continues so that even if one service fails, we still try to handle the others Note that the masking is persistent, even if the configuration is not saved. We may want to consider only calling systemctl enable/disable in hostcfgd when the DB table changes, and only call systemctl mask/unmask upon calling config save.	2020-06-16 08:13:32 -07:00
Joe LeVeque	42bc14f44c	[systemd] Relocate all SONiC unit files to /usr/lib/systemd/system (#4673 ) This will allow us to disable services and have it persist across reboots by using the `systemctl mask` operation	2020-06-16 08:12:47 -07:00
Olivier Singla	18bbbb3c02	[baseimage]: Run fsck filesystem check support prior mounting filesystem (#4431 ) * Run fsck filesystem check support prior mounting filesystem If the filesystem becomes non-clean ("dirty"), SONiC does not run fsck to repair and mark it as clean again. This patch adds the functionality to run fsck on each boot, prior to the filesystem being mounted. This allows the filesystem to be repaired if needed. Note that if the filesystem is marked as clean, fsck does nothing and simply returns, so it is perfectly fine to call fsck every time prior to mounting the filesystem. How to verify this patch (using bash): Using an image without this patch: Make the filesystem "dirty" (not clean) [we are making the assumption that filesystem is stored in /dev/sda3 - Please adjust depending on the platform] [do this only on a test platform!] dd if=/dev/sda3 of=superblock bs=1 count=2048 printf "$(printf '\\x%02X' 2)" | dd of="superblock" bs=1 seek=1082 count=1 conv=notrunc &> /dev/null dd of=/dev/sda3 if=superblock bs=1 count=2048 Verify that filesystem is not clean tune2fs -l /dev/sda3 | grep "Filesystem state:" reboot and verify that the filesystem is still not clean Redo the same test with an image with this patch, and verify that at next reboot the filesystem is repaired and becomes clean. The fsck log is stored in syslog, using the string FSCK as a marker.	2020-06-16 08:12:11 -07:00
Joe LeVeque	913d380f6b	[caclmgrd] Get first VLAN host IP address via next() (#4685 ) I found that with IPv4Network types, calling list(ip_ntwrk.hosts()) is reliable. However, when doing the same with an IPv6Network, I found that the conversion to a list can hang indefinitely. This appears to me to be a bug in the ipaddress.IPv6Network implementation. However, I could not find any other reports on the web. This patch changes the behavior to call next() on the ip_ntwrk.hosts() generator instead, which returns the IP address of the first host.	2020-06-03 15:38:11 -07:00
Joe LeVeque	f2c0ed8e21	[caclmgrd] Allow more ICMP types (#4625 )	2020-06-03 15:35:49 -07:00
Joe LeVeque	1e59be8941	[caclmgrd] Ignore keys in interface-related tables if no IP prefix is present (#4581 ) Since the introduction of VRF, interface-related tables in ConfigDB will have multiple entries, one of which only contains the interface name and no IP prefix. Thus, when iterating over the keys in the tables, we need to ignore the entries which do not contain IP prefixes.	2020-06-03 15:35:10 -07:00
Joe LeVeque	ac957a0c7a	[caclmgrd] Add some default ACCEPT rules and lastly drop all incoming packets (#4412 ) Modified caclmgrd behavior to enhance control plane security as follows: Upon starting or receiving notification of ACL table/rule changes in Config DB: 1. Add iptables/ip6tables commands to allow all incoming packets from established TCP sessions or new TCP sessions which are related to established TCP sessions 2. Add iptables/ip6tables commands to allow bidirectional ICMPv4 ping and traceroute 3. Add iptables/ip6tables commands to allow bidirectional ICMPv6 ping and traceroute 4. Add iptables/ip6tables commands to allow all incoming Neighbor Discovery Protocol (NDP) NS/NA/RS/RA messages 5. Add iptables/ip6tables commands to allow all incoming IPv4 DHCP packets 6. Add iptables/ip6tables commands to allow all incoming IPv6 DHCP packets 7. Add iptables/ip6tables commands to allow all incoming BGP traffic 8. Add iptables/ip6tables commands for all ACL rules for recognized services (currently SSH, SNMP, NTP) 9. For all services which we did not find configured ACL rules, add iptables/ip6tables commands to allow all incoming packets for those services (allows the device to accept SSH connections before the device is configured) 10. Add iptables rules to drop all packets destined for loopback interface IP addresses 11. Add iptables rules to drop all packets destined for management interface IP addresses 12. Add iptables rules to drop all packets destined for point-to-point interface IP addresses 13. Add iptables rules to drop all packets destined for our VLAN interface gateway IP addresses 14. Add iptables/ip6tables commands to allow all incoming packets with TTL of 0 or 1 (This allows the device to respond to tools like tcptraceroute) 15. If we found control plane ACLs in the configuration and applied them, we lastly add iptables/ip6tables commands to drop all other incoming packets	2020-06-03 09:41:52 -07:00
Ying Xie	14b3f0022b	[ntp] enable/disable NTP long jump according to reboot type (#4577 ) * [ntp] enable/disable NTP long jump according to reboot type - Enable NTP long jump after cold reboot. - Disable NTP long jump after warm/fast reboot. Signed-off-by: Ying Xie <ying.xie@microsoft.com> * fix typo * further refactoring * use sonic-db-cli instead	2020-05-20 22:44:14 -07:00
abdosi	bb60e2b670	Changes to support config-setup service for multi-npu (#4609 ) * Changes to support config-setup service for multi-npu platforms. For multi-npu we do not support config initialization and ZTP as of now. It will support creating config db from minigraph or using config db from the previous file system. Signed-off-by: Abhishek Dosi <abdosi@microsoft.com> * Address Review Comments. * Address Review comments * Address review comments: use the Python-based config load_minigraph / config save / config reload from shell scripts so that we don't duplicate code. Also, while running from shell we will skip the stop/start of services done by those commands. * Updated to use python command so no code duplication.	2020-05-20 22:44:14 -07:00
abdosi	508f6bfa02	Fix for issue where image is compiled with flag ENABLE_DHCP_GRAPH_SERVICE (#4573 ) and then we load the image and reboot: even if there was an existing config_db.json, we would look for the DHCP service. We should disable update_graph in such cases. This behaviour is similar to what we have in the 201811 image.	2020-05-20 07:53:23 -07:00
abdosi	9ea746e25f	Changes for LLDP docker to support multi-npu platforms (#4530 ) * Changes for LLDP for Multi NPU Platforms: a) Enable LLDP for Host namespace for Management Port b) Make sure Management IP is available in per asic namespace needed for LLDP Chassis configuration c) Make sure chassis mac-address is correct in per asic namespace d) Do not run lldp on eth0 of per asic namespace and avoid chassis configuration for same e) Use Linux hostname instead of the one from Device Metadata for lldp chassis configuration since in multi-npu platforms device metadata hostname will be different Signed-off-by: Abhishek Dosi <abdosi@microsoft.com> * Address Review Comment with following changes: a) Use Device Metadata hostname even in per namespace container. Updated minigraph parsing for same to have hostname as system hostname and add new key for asic name b) Minigraph changes to have MGMT_INTERFACE Key in per asic/namespace config also as needed for LLDP for setting chassis management IP. Signed-off-by: Abhishek Dosi <abdosi@microsoft.com> * Address Review Comments	2020-05-20 07:51:49 -07:00
lguohan	710d176162	[baseimage]: pin down package version for azure-storage, watchdog and futures (#4575 ) Signed-off-by: Guohan Lu <lguohan@gmail.com>	2020-05-12 06:19:05 +00:00
judyjoseph	c808640f4e	Multi DB with namespace support, Introducing the database_global.json… (#4477 ) * Multi DB with namespace support, Introducing the database_global.json file for supporting accessing DBs in other namespaces for service running in linux host * Updates based on comments * Adding the j2 templates for database_config and database_global files. * Updating to retrieve the redis DIRs to be mounted from database_global.json file. * Additional check to see if asic.conf file exists before sourcing it. * Updates based on PR comments discussion. * Review comments update * Updates to the argument "-n" for namespace used in both context of parsing minigraph and multi DB access. * Update with the attribute "persistence_for_warm_boot" that was added to database_config.json file earlier. * Removing the database_config.json file to avoid confusion in future. We use the database_config.json.j2 file to generate database_config.json files dynamically. * Update the comments for sudo usage in docker_image_ctrl.j2 * Update with the new logic in PING PONG tests using sonic-db-cli. With this we wait till the PONG response is received when redis server is up. * Similar changes in swss and syncd scripts for the PING tests with sonic-db-cli * Updated with a missing , in the database_config.json.j2 file, Do pip install of j2cli in docker-base-buster.	2020-05-09 21:33:07 -07:00
Santhosh Kumar T	1e3df476e5	[DellEMC] S6100 Last Reboot Reason Thermal Support (#3767 )	2020-05-09 18:37:31 -07:00
wangshengjun	18e51088a0	[ebtables]add the filter rule for ARP packets with vlan tag: (#3945 ) 1. ebtables -t filter -A FORWARD -p 802_1Q --vlan-encap 0806 -j DROP The ARP packet with vlan tag can't match the default rule. Signed-off-by: wangshengjun <wangshengjun@asterfusion.com>	2020-05-09 18:36:36 -07:00
Joe LeVeque	9bdd2ef014	[process-reboot-cause] If software reboot cause is unknown add note if first boot into new image (#4538 )	2020-05-09 18:17:31 -07:00
Dong Zhang	3faa4e936e	[MultiDB] use sonic-db-cli PING and fix wrong multiDB API in NAT (#4541 )	2020-05-09 18:16:48 -07:00
Akhilesh Samineni	3be7c5786b	[NAT] : Removed requires dependency on swss (#4551 ) Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>	2020-05-09 18:16:02 -07:00
Neetha John	596bec1b32	[qos]: Alpha and ECN settings change for Th (#4564 ) Dynamic threshold setting changed to 0 and WRED profile green min threshold set to 250000 for Tomahawk devices Changed the dynamic threshold settings in pg_profile_lookup.ini Added a macro for WRED profiles in qos.json.j2 for Tomahawk devices Necessary changes made in qos.config.j2 to use the macro if present Signed-off-by: Neetha John <nejo@microsoft.com>	2020-05-09 18:13:10 -07:00
arlakshm	542f722055	[docker]: Enabled ipv6 in dockers when using docker bridge network (#4426 ) Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>	2020-04-27 08:50:23 -07:00
pavel-shirshov	2f44bcd071	[bgpcfgd]: Split one bgp mega-template to chunks. (#4143 ) The one big bgp configuration template was split into chunks. Currently we have three types of bgp neighbor peers: general bgp peers, represented by CONFIG_DB::BGP_NEIGHBOR table entries; dynamic bgp peers, represented by CONFIG_DB::BGP_PEER_RANGE table entries; monitors bgp peers, represented by CONFIG_DB::BGP_MONITORS table entries. This PR introduces three templates for each peer type: bgp policies: represent policies that will be applied to the bgp peer-group (ip prefix-lists, route-maps, etc); bgp peer-group: represents a bgp peer group which has common configuration for the bgp peer type and uses the bgp routing policy from the previous item; bgp peer-group instance: represents bgp configuration which will be used to instantiate a bgp peer-group for the bgp peer-type. Usually this one is simple, consisting of the reference to the bgp peer-group, bgp peer description and bgp peer ip address. This PR redefined the constant.yml file. Now this file has a setting for whether or not to use bgp_neighbor metadata. This file has more parameters for now, which are not used. They will be used in the next iteration of bgpcfgd. Currently all tests have been disabled. I'm going to create next PR with the tests right after this PR is merged. I'm going to introduce better bgpcfgd in a short time. It will include support of dynamic changes for the templates. FIX:: #4231	2020-04-25 09:41:28 +00:00
Renuka Manavalan	9b017a83b5	[baseimage]: Install Kubernetes packages if enabled in image (#4374 ) (#4432 ) Install kubeadm, which transparently installs kubelet & kubectl. Also download the Kubernetes images required to run as a Kubernetes node. The kubelet service is intentionally kept in disabled state, as it would otherwise continuously restart, wasting resources, until the node joins the master.	2020-04-16 21:54:45 -07:00
SuvarnaMeenakshi	2f66b4c545	[sonic-netns-exec]: use "$@" to reflects all positional parameters as they were set initially (#4375 ) sonic-netns-exec fails to execute below command in swss.sh: sonic-netns-exec "$NET_NS" sonic-db-cli $1 EVAL " local tables = {$2} for i = 1, table.getn(tables) do local matches = redis.call('KEYS', tables[i]) for j,name in ipairs(matches) do redis.call('DEL', name) end end" 0 This command fails with error " redis.exceptions.ResponseError: value is not an integer or out of range" . Root cause: When sonic-netns-exec executes the above function, argument passed to sonic-db-cli is NOT executed as a single script. The argument is passed as separate keywords to sonic-db-cli, as below: ['EVAL', 'local', 'tables', '=', "{'PORT_TABLE'}", 'for', 'i', '=', '1,', 'table.getn(tables)', 'do', 'local', 'matches', '=', "redis.call('KEYS',", 'tables[i])', 'for', 'j,name', 'in', 'ipairs(matches)', 'do', "redis.call('DEL',", 'name)', 'end', 'end', '0'] - How I did it To make sure that the parameters are passed as they were set initially, fix sonic-netns-exec to use double quoted "$@", where "$@" is "$1" "$2" "$3" ... "${N}" After fix, the argument passed to sonic-db-cli is as below: Argument passed to sonic-db-cli: ['EVAL', "\n local tables = {'PORT_TABLE'}\n for i = 1, table.getn(tables) do\n local matches = redis.call('KEYS', tables[i])\n for j,name in ipairs(matches) do\n redis.call('DEL', name)\n end\n end", '0'] Signed-off-by: SuvarnaMeenakshi <sumeenak@microsoft.com>	2020-04-15 13:13:31 -07:00
SuvarnaMeenakshi	0099305475	Multi-ASIC implementation (#3888 ) Changes made to support multi-asic platform. Added multi-instance support for swss, syncd, database, bgp, teamd and lldp.	2020-04-15 13:08:34 -07:00
Nazarii Hnydyn	0b35fcf3bf	[mellanox]: Add SSD FW update tool (#4351 ) * [mellanox]: Add SSD FW update tool. Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com> * [mellanox]: Align Platform API. Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com> * [mellanox]: Fix firmware description. Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com> * [mellanox]: Update SSD tool. Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>	2020-04-15 13:02:36 -07:00
rajendra-dendukuri	a97b73e79c	Fix typo in config-setup service (#4388 )	2020-04-10 21:23:07 -07:00
Abhishek Dosi	249265ad99	Revert "Multi-ASIC implementation (#3888 )" This reverts commit `2e87a16941`.	2020-04-03 14:34:38 -07:00
Samuel Angebault	8819322210	[Arista] Update drivers submodules (#4353 ) * Update arista drivers submodules * Add device configs for 7060CX2-32S * Update boot0 and union-mount for 7060CX2-32S * Add 7170-32C and 7170-32CD support in boot0 * Sync after writing boot configs * Add 7170-32C and 7170-32CD device configurations Co-authored-by: Boyang Yu <byu@arista.com>	2020-04-01 23:26:42 -07:00
SuvarnaMeenakshi	2e87a16941	Multi-ASIC implementation (#3888 ) Changes made to support multi-asic platform. Added multi-instance support for swss, syncd, database, bgp, teamd and lldp.	2020-04-01 23:21:49 -07:00
Kebo Liu	2fd1641feb	copy spc3 fw file to image (#4328 )	2020-03-29 22:48:10 -07:00
Garrick He	a059d7ec0e	[procdockerstatsd] Fix CMD field in dB (#4335 ) * Fix the CMD for the PROCESSSTATS entries so that there is a space between the command name and the arguments. Signed-off-by: Garrick He <garrick_he@dell.com>	2020-03-29 22:47:05 -07:00
Stepan Blyshchak	ee84dca683	[docker_image_ctl.j2] Share UTS namespace with host OS (#4169 ) Instead of updating the hostname manually on a Config DB hostname change, simply share the container's UTS namespace with the host OS. Ideally, instead of setting `--uts=host` for every container in SONiC, this setting can be set per container if a feature requires it. One behaviour change is introduced in this commit: when `--privileged` or `--cap-add=CAP_SYS_ADMIN` is combined with `--uts=host`, the container has the privilege to change the hostname of the host OS and of every other container. Such privilege should be fixed by limiting container capabilities. Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>	2020-03-22 23:04:02 -07:00
SuvarnaMeenakshi	7b4b1245bd	[ntp]: Add "tinker panic 0" in ntp.conf to avoid ntpd from panic (#4263 ) - What I did Add configuration to prevent ntpd from panicking and exiting if the drift between the new time and the current system time is large. - How I did it Added "tinker panic 0" in the ntp.conf file. - How to verify it [this assumes that there is a valid NTP server IP in config_db/ntp.conf] 1. Change the current system time to a bad time with a large drift from the time on the ntp server; the drift should be greater than 1000s. 2. Reboot the device. Before the fix: 3. Upon reboot, the ntp-config service comes up fine, but the ntp service goes to active (exited) state without any error message. This is because the offset between the new time (from the ntp server) and the current system time is very large, so ntpd goes into panic mode and exits. The system continues to show the bad time. After the fix: 3. Upon reboot, ntp-config comes up fine, and the ntp service comes up fine and stays in active (running) state. The system clock gets synced with the ntp server time.	2020-03-22 23:00:40 -07:00
yozhao101	358570324b	[Monit] Delay start of monitoring for 5 minutes (#4281 )	2020-03-22 22:58:57 -07:00
Andriy Kokhan	39889a3c35	[Service] Added NAT entry into CONTAINER_FEATURE. Fixes #4247 . (#4250 ) * [Service] Added NAT entry into CONTAINER_FEATURE. Fixes #4247. Signed-off-by: Andriy Kokhan <akokhan@barefootnetworks.com>	2020-03-19 22:18:13 -07:00
Joe LeVeque	8e36068237	[sonic-cfggen] Loading the configuration from init_cfg.json and then from config_db.json (#4148 )	2020-03-15 08:54:05 -07:00
Olivier Singla	a8baca0d6e	[kernel]: security kernel update to 4.9.189 (#3913 ) This patch upgrade the kernel from version 4.9.0-9-2 (4.9.168-1+deb9u3) to 4.9.0-11-2 (4.9.189-3+deb9u2) Co-authored-by: rajendra-dendukuri <47423477+rajendra-dendukuri@users.noreply.github.com>	2020-03-15 08:52:29 -07:00
Joe LeVeque	102cb83097	[Services] Restart NAT service upon unexpected critical process exit. (#4208 )	2020-03-14 18:03:29 -07:00
Stephen Sun	c700127101	[Mellanox]Take advantage of sdk variable to customize the location where sdk_socket exists. (#4223 ) Take advantage of an SDK environment variable to customize the location where sdk_socket exists. In the latest SDK sdk_socket has been moved from /tmp to /var/run which is a better place to contain this kind of file. However, this prevents the subdirs under /var/run from being mapped to different volumes. To resolve this, we take advantage of an SDK variable to designate the location of sdk_socket. This requires every process that requires to access sdk_socket have this environment variable defined. However, to define environment variable for each process is less scalable. We take advantage of the docker scope environment variable to avoid that. It depends on PR 4227	2020-03-14 18:02:43 -07:00
byu343	950926a837	[arista]: Add support for Arista Lodoga (#4232 ) Backport the support of Arista Lodoga to 201911	2020-03-11 13:12:39 -07:00
Abhishek Dosi	cc2d497aa4	Fixing Bad Cherry-pick	2020-03-04 10:46:45 -08:00
rajendra-dendukuri	8581a52571	ZTP infrastructure changes to support DHCP discovery provisioning data (#3298 ) * ZTP infrastructure changes to support DHCP discovery provisioning data - Dynamically generate DHCP client configuration based on current ZTP state - Added support to request and process hostname when using DHCPv6 - Do not process graphservice url dhcp option if ZTP is enabled, ZTP service will process it - Generate /e/n/i file with all active interfaces seeking address assignment via DHCP. Only interfaces that are created in Linux will be added to /e/n/i. Also DHCP is started only on linked up in-band interfaces. Signed-off-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>	2020-03-03 22:23:59 -08:00
yozhao101	5c8c4b2a50	[Services] Restart BGP service upon unexpected critical process exit. (#4207 )	2020-03-03 19:19:44 -08:00
rajendra-dendukuri	1edb69647e	[sonic-ztp]: Build sonic-ztp package (#3299 ) * Build sonic-ztp package - Add changes in make rules to conditionally include sonic-ztp package Signed-off-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>	2020-02-24 14:27:24 -08:00
Stepan Blyshchak	398929c622	[mgmt-framework] start after syncd (#4174 ) every service starts after syncd to start the most critical parts first Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>	2020-02-24 11:04:51 -08:00
Prince Sunny	20510d58d3	Sleep done before mismatch handler (#4165 ) * Sleep done before mismatch handler	2020-02-24 10:25:56 -08:00
Prince Sunny	6740b2d3df	Fix service and container name to be same (#4151 )	2020-02-24 10:24:11 -08:00
Joe LeVeque	f6d69aed49	[interfaces-config.sh] Do not bring 'lo' interface down and up (#4150 )	2020-02-24 10:23:35 -08:00
Sumukha Tumkur Vani	af4e84298a	Start RestAPI container when sonic boots (#4140 ) * Start RestAPI container when sonic boots	2020-02-24 10:16:02 -08:00
Stephen Sun	48f8a8d40e	[Mellanox] platform api support firmware install (#3931 ) support firmware install, including CPLD and BIOS. CPLD: cpldupdate BIOS: boot to onie and update BIOS in onie and then boot to SONiC	2020-02-24 10:14:52 -08:00
byu343	f197f0d2a9	[arista]: Fix convertfs condition for booting from EOS (#4139 ) Fix the issue of incorrectly skipping the convertfs hook when fast-reboot from EOS, by adding an extra kernel cmdline param "prev_os" to differentiate fast-reboot from EOS and from SONiC. This is because we still do disk conversion for fast reboot from eos to sonic, like format the disk.	2020-02-13 16:20:53 -08:00
yozhao101	3ac345922b	[Services] Restart database service upon unexpected critical process exit. (#4138 ) * [database] Implement the auto-restart feature for database container. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [database] Remove the duplicate dependency in service files. Since we already have updategraph ---> config_setup ---> database, we do not need explicitly add database.service in all other container service files. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [event listener] Reorganize the line 73 in event listener script. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [database] update the file sflow.service.j2 to remove the duplicate dependency. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [event listener] Add comments in event listener. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [event listener] Update the comments in line 56. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [event listener] Add parentheses for if statement in line 76 in event listener. Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2020-02-13 16:20:38 -08:00
yozhao101	71225ea4cc	[Service] Enable/disable container auto-restart based on configuration. (#4073 )	2020-02-13 16:20:21 -08:00
yozhao101	984c43e01d	[init_cfg.json] Add new FEATURE and CONTAINER_FEATURE tables (#4137 ) * [init_cfg.json] Add a new table CONTAINER_FEATURE. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [init_cfg.json] Update the content of table CONTAINER_FEATURE. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [init_cfg.json] Use the template to generate the table CONTAINER_FEATURE. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [init_cfg.json] Add a new table FEATURE. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [init_cfg.json] Change the order of container names according to alphabetical order. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [init_cfg.json] Change the dhcp_relay container name and add rest-api. Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2020-02-13 16:07:41 -08:00
yozhao101	f061353655	[init_cfg.json] Maintain a separate init_cfg.json.j2 template file (#4092 )	2020-02-13 16:07:23 -08:00
pra-moh	c70a7b877d	[procdockerstatsd] Fix incorrect case issue in service file (#4134 )	2020-02-13 16:06:30 -08:00
Stephen Sun	6143fdd54d	[process-reboot-cause]Clean up the process-reboot-cause as required in issue 3927 (#4128 )	2020-02-13 16:05:55 -08:00
pra-moh	e1946432ff	[procdockerstats]: Update file permission for procdockerstatsd (#4126 )	2020-02-13 16:05:36 -08:00
Prince Sunny	e87f27050b	Update arp_update to refresh neighbor entries from APP_DB (#4125 )	2020-02-13 16:05:19 -08:00
kannankvs	74ac9b02dc	modified down rules to pre-down rules to ensure that default route is… (#3853 ) * modified down rules to pre-down rules to ensure that default route is deleted just before interface is made down	2020-02-13 16:01:21 -08:00
kannankvs	a836ead688	mvrf_avoid_snmp_yml_config: made changes to pass SNMP config from con… (#4057 ) * mvrf_avoid_snmp_yml_config: made changes to pass SNMP config from ConfigDB to snmpd.conf without using snmp.yml * added a missing if condition	2020-02-03 15:38:38 -08:00
pra-moh	8e4a4caf79	[baseimage]: removing space from shebang in procdockerstatsd (#4051 )	2020-02-03 15:37:47 -08:00
Dong Zhang	42bffc1215	[MultiDB] (except ./src and ./dockers dirs): replace redis-cli with sonic-db-cli and use new DBConnector (#4035 ) * [MultiDB] (except ./src and ./dockers dirs): replace redis-cli with sonic-db-cli and use new DBConnector * update comment for a potential bug * update comment * add TODO marker as review requirement	2020-02-03 15:36:55 -08:00
Howard Persh	cc825ff2fe	[startup] Fixes issue with /var/platform directory not created (#4000 )	2020-02-03 15:34:34 -08:00
SuvarnaMeenakshi	abe7ef7e2e	[baseimage]: support building multi-asic component (#3856 ) - move single instance services into their own folder - generate Systemd templates for any multi-instance service files in slave.mk - detect single or multi-instance platform in systemd-sonic-generator based on asic.conf platform specific file. - update container hostname after creation instead of during creation (docker_image_ctl) - run Docker containers in a network namespace if specified - add a service to create a simulated multi-ASIC topology on the virtual switch platform Signed-off-by: Lawrence Lee <t-lale@microsoft.com> Signed-off-by: Suvarna Meenakshi <Suvarna.Meenaksh@microsoft.com>	2020-02-03 15:32:21 -08:00
Kiran Kumar Kella	a943e6ce45	Changes in sonic-buildimage to support the NAT feature (#3494 ) * Changes in sonic-buildimage for the NAT feature - Docker for NAT - installing the required tools iptables and conntrack for nat Signed-off-by: kiran.kella@broadcom.com * Add redis-tools dependencies in the docker nat compilation * Addressed review comments * add natsyncd to warm-boot finalizer list * addressed review comments * using swsscommon.DBConnector instead of swsssdk.SonicV2Connector * Enable NAT application in docker-sonic-vs	2020-02-03 15:30:39 -08:00
B S Rama krishna	5a4f19e04a	[kdump]: porting kdump installation skip on arm to 201911 (#4081 )	2020-01-29 09:07:12 -08:00
Joe LeVeque	ccdc097a8f	[caclmgrd] Fix application of IPv6 service ACL rules (part 2) (#4036 )	2020-01-21 10:53:16 -08:00
Sujin Kang	9deb8c15f3	[reboot cause]: Delay process-reboot-cause service until network connection is stable (#4003 )	2020-01-21 10:47:13 -08:00
yozhao101	82c2eee1e6	[Monit] Change the monitoring period from 120 seconds to 60 seconds. (#3974 ) * [Monit] Change the monitoring period of monit from 120 seconds to 60 seconds and also at the same time double the interval for existing sonic monit config file in host. Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2020-01-21 10:44:36 -08:00
Joe LeVeque	aad6b9c034	[apt] Instruct apt-get to NOT check the "Valid Until" date in Release files (#3973 ) This is an addendum to #3958, which also instructs apt to ignore the "Valid Until" date in Release files inside the slave containers, making a complete solution, much like the previously abandoned PR #2609. This patch also unifies file names and contents. When the Debian team archives a repo, it stops updating the "Valid Until" date, thus apt-get will not apply updates for that repo unless we explicitly tell it to ignore the "Valid Until" date. Also, this has become an issue with active (i.e., non-archived) repos twice in the past year because the Debian folks seem to occasionally let the expiration lapse before updating the date. This will cause SONiC builds to fail with a message like E: Release file for http://debian-archive.trafficmanager.net/debian-security/dists/jessie/updates/InRelease is expired (invalid since 3d 3h 11min 20s). Updates for this repository will not be applied. until the dates have been updated and propagated to all mirrors. With this patch, SONiC should no longer be affected by lapsed "Valid Until" dates, whether they be accidental or purposeful.	2020-01-21 10:43:51 -08:00
rajendra-dendukuri	bb34edf1af	[config-setup]: create a SONiC configuration management service (#3227 ) * Create a SONiC configuration management service * Perform config db migration after loading config_db.json to redis DB * Migrate config-setup post migration hooks on image upgrade config-setup post migration hooks help the user migrate configurations from the old image to the new image. If the installed hooks are user defined, they will not be part of the newly installed image. So these hooks have to be migrated to the new image, and only then can they be executed when the new image is booting. The changes in this fix migrate config-setup post-migration hooks and ensure that any hooks with the same filename in the newly installed image are not overwritten. It is expected that users install new hooks as per their requirement and not edit existing hooks. Any changes to existing hooks need to be done as part of the new image and not post bootup.	2020-01-21 10:39:19 -08:00
Prabhu Sreenivasan	7ec2732387	SONiC Management Framework Release 1.0 (#3488 ) * Added sonic-mgmt-framework as submodule / docker * fix build issues * update sonic-mgmt-framework submodule branch to master * Merged changes 70007e6d2ba3a4c0b371cd693ccc63e0a8906e77..00d4fcfed6a759e40d7b92120ea0ee1f08300fc6 00d4fcfed6a759e40d7b92120ea0ee1f08300fc6 Modified environment variables * Changes to build sonic-mgmt-framework docker * bumped up sonic-mgmt-framework commit-id * version bump for sonic-mgmt-framework commit-id * bumped up sonic-mgmt-framework commit-id * Add python packages to docker * Build fix for docker with python packages * added libyang as dependent package * Allow building images on NFS-mounted clones Prior to this change, `build_debian.sh` would generate a Debian filesystem in `./fsroot`. This needs root permissions, and one of the tests that is performed is whether the user can create a character special file in the filesystem (using mknod). On most NFS deployments, `root` is the least privileged user, and cannot run mknod. Also, attempting to run commands like rm or mv as root would fail due to permission errors, since the root user gets mapped to an unprivileged user like `nobody`. This commit changes the location of the Debian filesystem to `/fsroot`, which is a tmpfs mount within the slave Docker. The default squashfs, docker tarball and zip files are also created within /tmp, before being copied back to /sonic as the regular user. The side effect of this change is that the contents of `/fsroot` are no longer available once the slave container exits, however they are available within the squashfs image. Signed-off-by: Nirenjan Krishnan <Nirenjan.Krishnan@dell.com> * bumped up sonic-mgmt-framework commit to include PR #18 * REST Server startup script is enhanced to read the settings from ConfigDB. The table below maps each db field to its command line argument name. 
============================================================
ConfigDB entry key        Field name     REST Server argument
============================================================
REST_SERVER\|default       port           -port
REST_SERVER\|default       client_auth    -client_auth
REST_SERVER\|default       log_level      -v
DEVICE_METADATA\|x509      server_crt     -cert
DEVICE_METADATA\|x509      server_key     -key
DEVICE_METADATA\|x509      ca_crt         -cacert
============================================================
* Replace src/telemetry as submodule to sonic-telemetry * Update telemetry commit HEAD * Update sonic-telemetry commit HEAD * libyang env path update * Add libyang dependency to telemetry * Add scripts to create JSON files for CLI backend Scripts to create /var/platform/syseeprom and /var/platform/system, which are back-end files for CLI, for system EEPROM and system information. Signed-off-by: Howard Persh <Howard_Persh@dell.com> * In startup script, create directory where CLI back-end files live Signed-off-by: Howard Persh <Howard_Persh@dell.com> * build dependency pkgs added to docker for build failure fix * Changes to fix build issue for mgmt framework * Fix exec path issue with telemetry * s5232[device] PSU detection and default led state support * Processing of first boot in rc.local should not have premature exit Signed-off-by: Howard Persh <Howard_Persh@dell.com> * docker mount options added for platform, system features * bumped up sonic-mgmt-framework commit id to pick 23rd July 2019 changes * Added mount options for telemetry docker to get access for system and platform info. * Update commit for sonic-utilities * [dell]: Corrected dport map and renamed config files for S5232F * Fix telemetry submodule commit * added support for sonic-cli console * [Dell S5232F, Z9264F] Harden FPGA driver kernel module For Dell S5232F and Z9264F platforms, be more strict when checking state in ISR of FPGA driver, to harden against spurious interrupts. 
Signed-off-by: Howard Persh <Howard_Persh@dell.com> * update mgmt-framework submodule to 27th Aug commit. * remove changes not related to mgmt-framework and sonic-telemetry * Revert "Replace src/telemetry as submodule to sonic-telemetry" This reverts commit `11c3192975`. * Revert "Replace src/telemetry as submodule to sonic-telemetry" This reverts commit `11c3192975`. * make submodule changes and remove a change not related to PR * more changes * Update .gitmodules * Update Dockerfile.j2 * Update .gitmodules * Update .gitmodules * Update .gitmodules reverting experimental change * Removed syspoll for release_1.0 Signed-off-by: Jeff Yin <29264773+jeff-yin@users.noreply.github.com> * Update docker-sonic-mgmt-framework.mk * Update sonic-mgmt-framework.mk * Update sonic-mgmt-framework.mk * Update docker-sonic-mgmt-framework.mk * Update docker-sonic-mgmt-framework.mk * Revert "Processing of first boot in rc.local should not have premature exit" This reverts commit `e99a91ffc2`. * Remove old telemetry directory * Update docker-sonic-mgmt-framework.mk * Resolving merge conflict with Azure * Reverting the wrong merge * Use CVL_SCHEMA_PATH instead of changing directory for telemetry startup * Add missing export * Add python mmh3 to slave dockerfile * Remove sonic-mgmt-framework build dep for telemetry, fix dialout startup issues * Provided flag to disable compiling mgmt-framework * Update sonic-utilities to point to latest commit id * Point sonic-utilities to Azure accepted SHA * Updating mgmt framework to right sha * Add sonic-telemetry submodule * Update the mgmt-framework commit id Co-authored-by: jghalam <joe.ghalam@gmail.com> Co-authored-by: Partha Dutta <51353699+dutta-partha@users.noreply.github.com> Co-authored-by: srideepDell <srideep_devireddy@dell.com> Co-authored-by: nirenjan <nirenjan@users.noreply.github.com> Co-authored-by: Sachin Holla <51310506+sachinholla@users.noreply.github.com> Co-authored-by: Eric Seifert <seiferteric@gmail.com> Co-authored-by: Howard Persh 
<hpersh@yahoo.com> Co-authored-by: Jeff Yin <29264773+jeff-yin@users.noreply.github.com> Co-authored-by: Arunsundar Kannan <31632515+arunsundark@users.noreply.github.com> Co-authored-by: rvasanthm <51932293+rvasanthm@users.noreply.github.com> Co-authored-by: Ashok Daparthi-Dell <Ashok_Daparthi@Dell.com> Co-authored-by: anand-kumar-subramanian <51383315+anand-kumar-subramanian@users.noreply.github.com>	2020-01-08 15:51:02 -08:00
Abhishek	6045e34650	Merge branch 'abdosi/master_201911_label_to_201911' into 201911. Cherry pick changes from master into 201911	2020-01-06 17:30:03 -08:00
Joe LeVeque	5e07b252ff	[monit] Build from source and patch to use MemAvailable value if available on system (#3875 )	2020-01-06 11:41:20 -08:00
Stepan Blyshchak	b834c9ff34	[services] make snmp.timer work again and delay telemetry.service (#3742 ) Delay CPU intensive services at boot - How I did it Made snmp.timer work and added telemetry.timer. But this is not enough because it breaks the existing snmp dependency on swss. So, in this solution the snmp timer is wanted by the swss service, but since the OnBootSec timer expires only once it will not trigger the snmp service again, so I added the line "OnUnitActiveSec=0 sec", which will start the snmp service based on the last time it was active. On boot only OnBootSec will expire; on swss start/restart only the second timer will expire immediately and trigger the snmp service. However, the snmp service will not stop after "systemctl stop snmp" because of the second timer, which will always expire when the snmp service becomes unavailable. So there is a conflict, which will be handled by systemd if we add a "Conflicts=" line to both snmp.service and snmp.timer. So during boot: snmp does not start by default; swss starts and starts snmp timer; OnUnitActiveSec=0 does not expire since there is no snmp active; OnBootSec expires and starts snmp service, and snmp timer gets stopped. During "systemctl restart swss": snmp stops because of Requisite on swss; snmp unblocks snmp timer from running; swss starts and starts snmp timer; OnUnitActiveSec=0 expires immediately and starts snmp, which stops snmp timer. During "systemctl stop snmp": the stop of snmp service unblocks snmp timer, but no one starts the timer, so snmp is not started by "OnUnitActiveSec=0".	2020-01-06 10:32:24 -08:00
pavel-shirshov	74b45be487	[fast-reboot]: Save fast-reboot state into the db (#3741 ) Put a flag for fast-reboot to the db using EXPIRE feature. Using this flag in other part of SONiC to start in Fast-reboot mode. If we reload a config, the state in the db will be removed.	2020-01-06 10:30:36 -08:00
lguohan	b2234a682d	[docker-base-stretch]: Do not check expire for stretch-backports repo (#3958 ) * [docker-base-stretch]: Do not check expire for stretch-backports repo Signed-off-by: Guohan Lu <gulv@microsoft.com>	2020-01-03 10:44:26 -08:00
Ying Xie	df81943ec5	Revert "[swss.sh] When starting, call 'systemctl restart' on dependents, not (#3807 )" (#3835 ) This reverts commit `351410ea8c`.	2020-01-02 14:35:55 -08:00
Joe LeVeque	fd3d8c23b2	[services] sflow service sets swss service as Requisite=, not Requires= (#3819 ) The sflow service should not start unless the swss service is started. However, if the swss service is not started, the sflow service should not attempt to start it; instead it should simply fail to start. Using Requisite=, we will achieve this behavior, whereas using Requires= will cause the required service to be started.	2020-01-02 14:29:11 -08:00
Stepan Blyshchak	3474e8fddd	[syncd.sh] remove chipdown on mellanox (#3926 ) ASIC reset events are captured by hw-mgmt and hw-mgmt calls chipup/chipdown internally without OS interaction Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>	2019-12-31 14:43:32 -08:00
Joe LeVeque	f0b7dfad7c	[caclmgrd] Fix application of IPv6 service ACL rules (#3917 )	2019-12-31 14:42:49 -08:00
Renuka Manavalan	2d079a15dd	corefile uploader: Updates per review comments offline (#3915 ) * Updates per review comments 1) core_uploader service waits for syslog.service 2) core_uploader service enabled for restart on failure 3) Use mtime instead of file size + ample time to be robust. * Avoid reloading already uploaded file, by marking the names with a prefix. * Updated failing path. 1) If rc file is missing or required data missing, it periodically logs error in forever loop. 2) If upload fails, retry every hour with a error log, forever. * Fix few bugs * The binary update_json.py will come from sonic-utilities.	2019-12-31 14:42:01 -08:00
Ying Xie	2c7a01a421	[swss service] flush fast-reboot enabled flag upon swss stopping (#3908 ) If we need to stop swss during fast-reboot procedure on the boot up path, it means that something went wrong, like syncd/orchagent crashed already, we are stopping and restarting swss/syncd to re-initialize. In this case, we should proceed as if it is a cold reboot. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2019-12-18 11:20:45 -08:00
Ying Xie	759bde3a43	[hostcfgd] avoid in place editing config file contents (#3904 ) In place editing (sed -i) seems to have some issues with filesystem interaction. It could leave a 0 size file or a corrupted file behind. It would be safer to sed the file contents into a new file and then replace the old file with the new file. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2019-12-18 11:20:25 -08:00
Renuka Manavalan	14f7b8da2d	Corefile uploader service (#3887 ) * Corefile uploader service 1) A service is added to watch /var/core and upload to Azure storage 2) The service is disabled on boot. One may enable explicitly. 3) The .rc file to be updated with acct credentials and http proxy to use. 4) If service is enabled with no credentials, it would sleep, with periodic log messages 5) For any update in .rc, the service has to be restarted to take effect. * Remove rw permission for .rc file for group & others. * Changes per review comments. Re-ordered .rc file per JSON.dump order. Added a script to enable partial update of .rc, which HWProxy would use to add acct key. * Azure storage upload requires python module futures, hence added it to install list. * Removed trailing spaces. * A mistake in name corrected. Copy the .rc updater script to /usr/bin.	2019-12-18 11:19:25 -08:00
Stephen Sun	ba4f0f30c8	[process-reboot-cause]Address the issue: Incorrect reboot cause returned when warm reboot follows a hardware caused reboot (#3880 ) * [process-reboot-cause]Address the issue: Incorrect reboot cause returned when warm reboot follows a hardware caused reboot 1. check whether /proc/cmdline indicates warm/fast reboot. if yes the software reboot cause file will be treated as the reboot cause. finish 2. check whether platform api returns a reboot cause. if yes it is treated as the reboot cause. finish. 3. check whether /hosts/reboot-cause contains a cause. if yes it is treated as the cause otherwise return unknown. * [process-reboot-cause]Fix review comments * [process-reboot-cause]address comments 1. use "with" statement 2. update fast/warm reboot BOOT_ARG * [process-reboot-cause]address comments * refactor the code flow * Remove escape * Remove extra ':'	2019-12-18 11:17:17 -08:00
pra-moh	bfa96bbce3	Add daemon which periodically pushes process and docker stats to State DB (#3525 )	2019-11-27 15:35:41 -08:00
Joe LeVeque	5e6f8adb22	[services] Remove explicit dependencies from dhcp_relay service file, control in swss.sh (#3823 )	2019-11-26 16:59:45 -08:00
pra-moh	d3a1555f30	[hostcfgd] Add support to enable/disable optional features (#3653 )	2019-11-26 14:11:12 -08:00
yozhao101	67fc68513e	[Services] Restart Sflow service upon unexpected critical process exit. (#3751 ) Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2019-11-25 13:02:00 -08:00
Joe LeVeque	351410ea8c	[swss.sh] When starting, call 'systemctl restart' on dependents, not (#3807 ) 'systemctl start'	2019-11-22 20:39:09 -08:00
yozhao101	df11b2b9f1	[Services] Restart Telemetry service upon unexpected critical process exit. (#3768 ) Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2019-11-18 16:56:44 -08:00
kannankvs	4007d9ba9c	[ntp]: modified ntp script to hide the error related to cfggen (#3745 ) This PR is to handle the issue 3527. When device boots up, NTP throws a traceback as explained in the issue 3527. - Traceback will be seen when MGMT_VRF_CONFIG does not exist in the database. Traceback is coming from the script “/etc/init.d/ntp”. - Traceback does not affect the NTP functionality with/without management VRF. When MGMT_VRF_CONFIG does not exist or when MGMT_VRF_CONFIG’s mgmtVrfEnabled is configured to “false”, “NTP” will be started in the “default VRF” context, which is working fine even with this traceback. - This traceback error will be hidden by redirecting the error to /dev/null without affecting functionality.	2019-11-14 00:06:54 -08:00
Joe LeVeque	c50c390eb4	[rsyslog] Add support for IPv6 remote addresses (#3754 )	2019-11-14 00:00:55 -08:00
Tyler Li	c07ae3b16f	Loopback ip addresses move to intfmgrd for supporting VRF	2019-11-10 02:27:33 -08:00
Joe LeVeque	85b0de3df1	[docker-syncd]: Restart SwSS, syncd and dependent services if a critical process in syncd container exits unexpectedly (#3534 ) Add the same mechanism I developed for the SwSS service in #2845 to the syncd service. However, in order to cause the SwSS service to also exit and restart in this situation, I developed a docker-wait-any program which the SwSS service uses to wait for either the swss or syncd containers to exit.	2019-11-09 10:26:39 -08:00
Olivier Singla	c70d8bca9f	[baseimage]: kdump support (#3722 ) * In the event of a kernel crash, we need to gather as much information as possible to understand and identify the root cause of the crash. Currently, the kernel does not provide much information, which makes kernel crash investigation difficult and time consuming. Fortunately, there is a way in the kernel to provide more information in the case of a kernel crash. kdump is a feature of the Linux kernel that creates crash dumps in the event of a kernel crash. This PR will add kernel kdump support. An extension to the CLI utilities config and show is provided to configure and manage kdump: - enable / disable kdump functionality - configure kdump (how many kernel crash logs can be saved, memory allocated for capture kernel) - view kernel crash logs	2019-11-08 23:08:42 -08:00
Ying Xie	96fffd883d	Revert "[services] make snmp.timer work again and delay telemetry.service (#3657 )" (#3729 ) This reverts commit `d346cb3898`.	2019-11-08 21:44:25 -08:00
lguohan	6d46badbdc	[aboot]: preserve snmp.yml and acl.json for eos to sonic fast reboot (#3716 )	2019-11-06 20:18:31 -08:00
Neetha John	95466c3ab7	[pfcwd]: Do not start pfc watchdog on Management Tor (#3719 ) Signed-off-by: Neetha John <nejo@microsoft.com>	2019-11-06 18:51:02 -08:00
pavel-shirshov	d5af096f41	[TSA]: Add community to the loopback prefix, when isolated (#3708 ) * Rename asn/deployment_id_asn_map.yaml to constants/constants.yaml * Fix bgp templates * Add community for loopback when bgpd is isolated * Use correct community value	2019-11-06 16:07:28 -08:00
Stepan Blyshchak	d346cb3898	[services] make snmp.timer work again and delay telemetry.service (#3657 ) Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>	2019-11-06 12:12:31 -08:00
yozhao101	a117b25446	[Services] Restart LLDP service upon unexpected critical process exit. (#3713 ) Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2019-11-06 11:02:57 -08:00
Samuel Angebault	05e659901f	[arista] Add support for more 7280CR3 variants (#3711 ) * Add extra Smartsville hwskus	2019-11-06 10:11:38 -08:00
yozhao101	ed79f54569	[Services] Restart DHCP-Relay service upon unexpected critical process exit. (#3667 ) Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2019-11-05 18:32:14 -08:00
yozhao101	4c31ef3cd2	[Services] Restart Teamd service upon unexpected critical process exit. (#3703 ) Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2019-11-04 17:45:41 -08:00
yozhao101	4fa3a1e27e	[Services] Restart Platform-monitor service upon unexpected critical process exit. (#3689 ) Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2019-11-04 17:44:01 -08:00
Stepan Blyshchak	8dbe13c4cc	[services] improve startup time by changing startup order (#3656 ) * [services] improve startup time by giving precedence to critical services (syncd.service) Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>	2019-10-31 09:18:26 -07:00
yozhao101	cff30c59d0	[Services] Restart Router-advertiser service upon unexpected critical process exit (#3681 ) Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2019-10-30 16:41:55 -07:00
Ying Xie	5961e031e1	[hostname-config] improve hostname-config process (#3676 ) We noticed in tests/production that there is a low-probability failure where /etc/hosts could have some garbage characters before the entry for the local host name. The consequence is that all sudo commands would be very slow. In extreme cases it would prevent some services from starting properly. I suspect that the /etc/hosts file might be opened by some process, causing the issue. Editing the contents in a new file and then replacing the whole file should be safer. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2019-10-29 08:30:27 -07:00
Danny Allen	63328814fc	[core_cleanup] Fix issue where core_cleanup job runs too frequently (#3659 ) Signed-off-by: Danny Allen <daall@microsoft.com>	2019-10-23 15:55:47 -07:00

674 Commits