sonic-buildimage

Archived

Author	SHA1	Message	Date
Stepan Blyshchak	4ba0ff25d2	[services] make snmp.timer work again and delay telemetry.service (#3742 ) Delay CPU intensive services at boot. - How I did it: Made snmp.timer work and added telemetry.timer. This alone is not enough because it breaks the existing snmp dependency on swss. In this solution the snmp timer is wanted by the swss service, but since the OnBootSec timer expires only once it will not trigger the snmp service, so I added the line "OnUnitActiveSec=0 sec", which starts the snmp service based on the last time it was active. On boot only OnBootSec expires; on swss start/restart only the second timer expires, immediately, and triggers the snmp service. However, the snmp service will not stop after "systemctl stop snmp", because the second timer always expires when the snmp service becomes unavailable. This conflict is handled by systemd if we add a "Conflicts=" line to both snmp.service and snmp.timer. During boot: (1) snmp does not start by default; (2) swss starts and starts the snmp timer; (3) OnUnitActiveSec=0 does not expire since there is no active snmp service; (4) OnBootSec expires and starts the snmp service, and the snmp timer gets stopped. During "systemctl restart swss": (1) snmp stops because of Requisite on swss; (2) the stopped snmp service unblocks the snmp timer; (3) swss starts and starts the snmp timer; (4) OnUnitActiveSec=0 expires immediately and starts snmp, which stops the snmp timer. During "systemctl stop snmp": the stop of the snmp service unblocks the snmp timer, but nothing starts the timer, so snmp is not started by "OnUnitActiveSec=0".	2019-12-16 09:07:05 -08:00
Ying Xie	9baf8f7c33	[swss service] flush fast-reboot enabled flag upon swss stopping (#3908 ) If we need to stop swss during fast-reboot procedure on the boot up path, it means that something went wrong, like syncd/orchagent crashed already, we are stopping and restarting swss/syncd to re-initialize. In this case, we should proceed as if it is a cold reboot. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2019-12-16 07:58:16 -08:00
Renuka Manavalan	3ab4b71656	Corefile uploader service (#3887 ) * Corefile uploader service 1) A service is added to watch /var/core and upload to Azure storage 2) The service is disabled on boot. One may enable explicitly. 3) The .rc file to be updated with acct credentials and http proxy to use. 4) If service is enabled with no credentials, it would sleep, with periodic log messages 5) For any update in .rc, the service has to be restarted to take effect. * Remove rw permission for .rc file for group & others. * Changes per review comments. Re-ordered .rc file per JSON.dump order. Added a script to enable partial update of .rc, which HWProxy would use to add acct key. * Azure storage upload requires python module futures, hence added it to install list. * Removed trailing spaces. * A mistake in name corrected. Copy the .rc updater script to /usr/bin.	2019-12-15 16:48:48 -08:00
Stephen Sun	80bb7fd15a	[process-reboot-cause]Address the issue: Incorrect reboot cause returned when warm reboot follows a hardware caused reboot (#3880 ) * [process-reboot-cause]Address the issue: Incorrect reboot cause returned when warm reboot follows a hardware caused reboot 1. check whether /proc/cmdline indicates warm/fast reboot. if yes the software reboot cause file will be treated as the reboot cause. finish 2. check whether platform api returns a reboot cause. if yes it is treated as the reboot cause. finish. 3. check whether /hosts/reboot-cause contains a cause. if yes it is treated as the cause otherwise return unknown. * [process-reboot-cause]Fix review comments * [process-reboot-cause]address comments 1. use "with" statement 2. update fast/warm reboot BOOT_ARG * [process-reboot-cause]address comments * refactor the code flow * Remove escape * Remove extra ':'	2019-12-14 09:41:48 -08:00
Ying Xie	eefa8455d7	[hostcfgd] avoid in place editing config file contents (#3904 ) In-place editing (sed -i) seems to have some issues with filesystem interaction. It could leave a zero-size or corrupted file behind. It is safer to sed the file contents into a new file and then swap the new file for the old one. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2019-12-13 19:26:39 -08:00
rajendra-dendukuri	fec80293dd	ZTP infrastructure changes to support DHCP discovery provisioning data (#3298 ) * ZTP infrastructure changes to support DHCP discovery provisioning data - Dynamically generate DHCP client configuration based on current ZTP state - Added support to request and process hostname when using DHCPv6 - Do not process graphservice url dhcp option if ZTP is enabled, ZTP service will process it - Generate /e/n/i file with all active interfaces seeking address assignment via DHCP. Only interfaces that are created in Linux will be added to /e/n/i. Also DHCP is started only on linked up in-band interfaces. Signed-off-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>	2019-12-10 08:16:56 -08:00
pavel-shirshov	1848fb262b	[fast-reboot]: Save fast-reboot state into the db (#3741 ) Put a flag for fast-reboot into the db using the EXPIRE feature. Other parts of SONiC use this flag to start in fast-reboot mode. If we reload a config, the state in the db will be removed.	2019-12-04 14:10:19 -08:00
rajendra-dendukuri	cda61290ac	[config-setup]: create a SONiC configuration management service (#3227 ) * Create a SONiC configuration management service * Perform config db migration after loading config_db.json to redis DB * Migrate config-setup post migration hooks on image upgrade. config-setup post-migration hooks help the user migrate configurations from the old image to the new image. If the installed hooks are user defined, they will not be part of the newly installed image, so these hooks have to be migrated to the new image; only then can they be executed when the new image boots. The changes in this fix migrate config-setup post-migration hooks and ensure that any hooks with the same filename in the newly installed image are not overwritten. Users are expected to install new hooks as required and not edit existing hooks. Any changes to existing hooks need to be made as part of a new image and not after bootup.	2019-12-04 07:15:58 -08:00
rajendra-dendukuri	eec594adf2	[sonic-ztp]: Build sonic-ztp package (#3299 ) * Build sonic-ztp package - Add changes in make rules to conditionally include sonic-ztp package Signed-off-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>	2019-12-04 04:50:56 -08:00
Joe LeVeque	100d67941a	[services] sflow service sets swss service as Requisite=, not Requires= (#3819 ) The sflow service should not start unless the swss service is started. However, if the swss service is not started, the sflow service should not attempt to start it; instead it should simply fail to start. Using Requisite= achieves this behavior, whereas using Requires= would cause the required service to be started.	2019-12-03 09:50:49 -08:00
Ying Xie	fc36ca6e45	Revert "[swss.sh] When starting, call 'systemctl restart' on dependents, not (#3807 )" (#3835 ) This reverts commit `351410ea8c`.	2019-12-02 15:54:55 -08:00
pra-moh	bfa96bbce3	Add daemon which periodically pushes process and docker stats to State DB (#3525 )	2019-11-27 15:35:41 -08:00
Joe LeVeque	5e6f8adb22	[services] Remove explicit dependencies from dhcp_relay service file, control in swss.sh (#3823 )	2019-11-26 16:59:45 -08:00
pra-moh	d3a1555f30	[hostcfgd] Add support to enable/disable optional features (#3653 )	2019-11-26 14:11:12 -08:00
yozhao101	67fc68513e	[Services] Restart Sflow service upon unexpected critical process exit. (#3751 ) Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2019-11-25 13:02:00 -08:00
Joe LeVeque	351410ea8c	[swss.sh] When starting, call 'systemctl restart' on dependents, not (#3807 ) 'systemctl start'	2019-11-22 20:39:09 -08:00
yozhao101	df11b2b9f1	[Services] Restart Telemetry service upon unexpected critical process exit. (#3768 ) Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2019-11-18 16:56:44 -08:00
kannankvs	4007d9ba9c	[ntp]: modified ntp script to hide the error related to cfggen (#3745 ) This PR is to handle the issue 3527. When device boots up, NTP throws a traceback as explained in the issue 3527. - Traceback will be seen when MGMT_VRF_CONFIG does not exist in the database. Traceback is coming from the script “/etc/init.d/ntp”. - Traceback does not affect the NTP functionality with/without management VRF. When MGMT_VRF_CONFIG does not exist or when MGMT_VRF_CONFIG’s mgmtVrfEnabled is configured to “false”, “NTP” will be started in the “default VRF” context, which is working fine even with this traceback. - This traceback error will be hidden by redirecting the error to /dev/null without affecting functionality.	2019-11-14 00:06:54 -08:00
Joe LeVeque	c50c390eb4	[rsyslog] Add support for IPv6 remote addresses (#3754 )	2019-11-14 00:00:55 -08:00
Tyler Li	c07ae3b16f	Loopback ip addresses move to intfmgrd for supporting VRF	2019-11-10 02:27:33 -08:00
Joe LeVeque	85b0de3df1	[docker-syncd]: Restart SwSS, syncd and dependent services if a critical process in syncd container exits unexpectedly (#3534 ) Add the same mechanism I developed for the SwSS service in #2845 to the syncd service. However, in order to cause the SwSS service to also exit and restart in this situation, I developed a docker-wait-any program which the SwSS service uses to wait for either the swss or syncd containers to exit.	2019-11-09 10:26:39 -08:00
Olivier Singla	c70d8bca9f	[baseimage]: kdump support (#3722 ) * In the event of a kernel crash, we need to gather as much information as possible to understand and identify the root cause of the crash. Currently, the kernel does not provide much information, which makes kernel crash investigation difficult and time consuming. Fortunately, there is a way in the kernel to provide more information in the case of a kernel crash. kdump is a feature of the Linux kernel that creates crash dumps in the event of a kernel crash. This PR adds kernel kdump support. An extension to the CLI utilities config and show is provided to configure and manage kdump: - enable / disable kdump functionality - configure kdump (how many kernel crash logs can be saved, memory allocated for capture kernel) - view kernel crash logs	2019-11-08 23:08:42 -08:00
Ying Xie	96fffd883d	Revert "[services] make snmp.timer work again and delay telemetry.service (#3657 )" (#3729 ) This reverts commit `d346cb3898`.	2019-11-08 21:44:25 -08:00
lguohan	6d46badbdc	[aboot]: preserve snmp.yml and acl.json for eos to sonic fast reboot (#3716 )	2019-11-06 20:18:31 -08:00
Neetha John	95466c3ab7	[pfcwd]: Do not start pfc watchdog on Management Tor (#3719 ) Signed-off-by: Neetha John <nejo@microsoft.com>	2019-11-06 18:51:02 -08:00
pavel-shirshov	d5af096f41	[TSA]: Add community to the loopback prefix, when isolated (#3708 ) * Rename asn/deployment_id_asn_map.yaml to constants/constants.yaml * Fix bgp templates * Add community for loopback when bgpd is isolated * Use correct community value	2019-11-06 16:07:28 -08:00
Stepan Blyshchak	d346cb3898	[services] make snmp.timer work again and delay telemetry.service (#3657 ) Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>	2019-11-06 12:12:31 -08:00
yozhao101	a117b25446	[Services] Restart LLDP service upon unexpected critical process exit. (#3713 ) Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2019-11-06 11:02:57 -08:00
Samuel Angebault	05e659901f	[arista] Add support for more 7280CR3 variants (#3711 ) * Add extra Smartsville hwskus	2019-11-06 10:11:38 -08:00
yozhao101	ed79f54569	[Services] Restart DHCP-Relay service upon unexpected critical process exit. (#3667 ) Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2019-11-05 18:32:14 -08:00
yozhao101	4c31ef3cd2	[Services] Restart Teamd service upon unexpected critical process exit. (#3703 ) Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2019-11-04 17:45:41 -08:00
yozhao101	4fa3a1e27e	[Services] Restart Platform-monitor service upon unexpected critical process exit. (#3689 ) Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2019-11-04 17:44:01 -08:00
Stepan Blyshchak	8dbe13c4cc	[services] improve startup time by changing startup order (#3656 ) * [services] improve startup time by given precedence to critical services (syncd.service) Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>	2019-10-31 09:18:26 -07:00
yozhao101	cff30c59d0	[Services] Restart Router-advertiser service upon unexpected critical process exit (#3681 ) Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2019-10-30 16:41:55 -07:00
Ying Xie	5961e031e1	[hostname-config] improve hostname-config process (#3676 ) We noticed in tests/production that there is a low-probability failure where /etc/hosts could have some garbage characters before the entry for the local host name. The consequence is that all sudo commands become very slow. In extreme cases it would prevent some services from starting properly. I suspect that the /etc/hosts file might be held open by some process, causing the issue. Writing the edited contents to a new file and then replacing the whole file should be safer. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2019-10-29 08:30:27 -07:00
Danny Allen	63328814fc	[core_cleanup] Fix issue where core_cleanup job runs too frequently (#3659 ) Signed-off-by: Danny Allen <daall@microsoft.com>	2019-10-23 15:55:47 -07:00
yozhao101	a0fbeeaca5	[Services] Restart SNMP service upon unexpected critical process exit. (#3650 ) Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2019-10-22 14:41:12 -07:00
Wenda Ni	be52977aca	Revert "Configure buffer profile to all ports (#3561 )" (#3628 ) This reverts commit `8861cbe98e`.	2019-10-18 09:14:39 -07:00
kannankvs	150ed36be2	[snmp]: changes to handle snmp configuration as per the modified CLI (#3586 ) While doing CLI changes for SNMP configuration, few changes are made in backend to handle the modified CLI. Changes - "community" for "snmp trap" is also made as "configurable". snmpd_conf.j2 is modified to handle the same. - Changed the snmp.yml file generation from postStartAction to preStartAction in docker_image_ctl.j2 specific to SNMP docker, to ensure that the snmp.yml is generated before sonic-cfggen generates the snmpd.conf. - Changed to make the code common for management vrf and default vrf. Users can configure snmp trap and snmp listening IP for both management vrf and default vrf.	2019-10-10 09:24:18 -07:00
pavel-shirshov	9b8f5c9c9a	[ntp]: Use loopback address when we don't have MGMT interface (#3566 ) Added configuration to use Loopback ip if a switch doesn't have MGMT_PORT.	2019-10-07 07:49:25 -07:00
Wenda Ni	8861cbe98e	Configure buffer profile to all ports (#3561 ) Signed-off-by: Wenda Ni <wenni@microsoft.com>	2019-10-04 11:20:57 -07:00
Ying Xie	cd85e2148b	[updategraph] enhance update graph handling (#3549 ) - after reloading minigraph, write latest version string in the DB. - if old config_db.json file exists, use it and migrate to latest version. - only reload minigraph when config_db.json doesn't exist and minigraph exists. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2019-10-02 13:58:44 -07:00
Ying Xie	d5262a3621	[first boot] sync file system after moving/copying files (#3550 ) Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2019-10-02 13:58:34 -07:00
Wenda Ni	cf0465bf53	Adopt per-port buffer and qos profile (#3542 ) Signed-off-by: Wenda Ni <wenni@microsoft.com>	2019-10-02 13:01:16 -07:00
Stepan Blyshchak	52e35a0f95	[docker_image_ctl.j2] skip hostname update if is up to date (#3529 ) Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>	2019-10-01 20:48:03 -07:00
Stephen Sun	7308d2eb97	[Mellanox] Stop pmon ahead of syncd (#3505 ) Issue Overview. Shutdown flow: For any shutdown flow, in which all dockers are stopped in order, the pmon docker stops after the syncd docker has stopped, causing the pmon docker to fail to release sx_core resources and leaving sx_core in a bad state. The related logs look like the following: INFO syncd.sh[23597]: modprobe: FATAL: Module sx_core is in use. INFO syncd.sh[23597]: Unloading sx_core[FAILED] INFO syncd.sh[23597]: rmmod: ERROR: Module sx_core is in use config reload & service swss.restart: In flows like "config reload" and "service swss restart", the failure causes further consequences: (1) an sx_core initialization error with an error message like "sx_core: create EMAD sdq 0 failed. err: -16"; (2) syncd fails to execute the create switch api with the error message "syncd_main: Runtime error: :- processEvent: failed to execute api: create, key: SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000, status: SAI_STATUS_FAILURE"; (3) swss fails to call SAI API "SAI_SWITCH_ATTR_INIT_SWITCH", which causes orchagent to restart. This adds an extra 1 or 2 minutes before the system is available, failing related test cases. reboot, warm-reboot & fast-reboot: In the reboot flows including "reboot", "fast-reboot" and "warm-reboot" this failure doesn't have further negative effects since the system has already rebooted. In addition, "warm-reboot" requires the system to be shut down as soon as possible to meet the GR time restriction of both BGP and LACP. "fast-reboot" also has to meet the GR time restriction of BGP, which is longer than that of LACP. In this sense, any unnecessary steps should be avoided. It's better to keep those flows untouched. Summary: To summarize, we have to come up with a way to ensure: shut down the pmon docker ahead of syncd for the "config reload" or "service swss restart" flow; don't shut down the pmon docker ahead of syncd for the "fast-reboot" or "warm-reboot" flow in order to save time; for the "reboot" flow, either order is acceptable. Solution: To solve the issue, pmon should be stopped ahead of syncd for all flows except for the warm-reboot. - How I did it: To stop pmon ahead of syncd stopped. This is done in /usr/local/bin/syncd.sh::stop() and for all shutdown sequences. Now that pmon stops ahead of syncd, there must be a way for pmon to start after syncd has started. Another point to take into consideration is that pmon startup should be deferred so that services which have graceful-restart logic in fast-reboot and warm-reboot have sufficient CPU cycles to meet their deadline. This is done by adding "syncd.service" as "After" to pmon.service and starting pmon in /usr/local/bin/syncd.sh::wait(). To start pmon automatically after syncd started.	2019-09-27 10:15:46 +02:00
Stephen Sun	c34a4783e0	[build] install new platform api on host (#3282 ) slave.mk: add SONIC_PLATFORM_API_PY2 as dependency of host; sonic_debian_extension.j2: install sonic_daemon_base and Mellanox-specific sonic_platform on host; mlnx-platform-api.mk: export mlnx_platform_api_py2_wheel_path for sonic_debian_extension.j2; sonic-daemon-base.mk: export daemon_base_py2_wheel_path for sonic_debian_extension.j2; daemon_base.py: hide unnecessary dependency of swss_common on host	2019-09-25 11:00:24 -07:00
Long Ou	b6a09999de	[hostcfgd] hostcfgd will exit when set hostname in DEVICE_METADATA (#3394 ) Signed-off-by: ouxiaolong <ouxiaolong@asterfusion.com>	2019-09-24 17:36:02 -07:00
Harish Venkatraman	9d2d617264	[SNMP] management VRF SNMP support (#2608 ) * [SNMP] management VRF SNMP support This commit adds SNMP support for Management VRF using l3mdev. The included patch provides VRF support; there is no single "listendevice" configuration, rather multiple agentaddress config options can each have their own "interface" to bind to using "ip%interface". The snmpd.conf file is accordingly generated using the snmp.yml file and redis database info. Adding below the comments of SNMP patch 1376 -------------------------------------------- Since the Linux kernel added support for Virtual Routing and Forwarding (VRF) in version 4.3 (Note: these won't compile on non-linux platforms) https://www.kernel.org/doc/Documentation/networking/vrf.txt Linux users could not use snmpd in its current form to bind specific listening IP addresses to specific VRF devices. A simplified description of a VRF interface is an interface that is a master (a container of sorts) that collects a set of physical interfaces to form a routing table. This set of two patches (one for V5-7-patches and one for V5-8-patches branches) is almost identical to patch single "listendevice" configuration. Rather, multiple agentAddress config options can each have their own "interface" to bind to using the <ip>%<interface> syntax. ------------------------------------------- Signed-off-by: Harish Venkatraman <harish_venkatraman@dell.com>	2019-09-18 17:26:45 -07:00
Prince Sunny	8ca1eb289e	Install Iptables rules to set TCPMSS for 'lo' interface (#3452 ) * Install Iptables rules to set TCPMSS for lo interface * Moved implementation to hostcfgd to maintain at one place	2019-09-18 10:12:28 -07:00
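
Commit 80bb7fd15a above describes a three-step order for deciding the reboot cause. The Python below is only a minimal sketch of that order, not the code from the commit: the function name, the "Unknown" fallback string, and the kernel command-line markers are assumptions made for illustration (the commit message does not spell out the actual BOOT_ARG values), and the software cause file of step 1 is assumed to be the same file checked in step 3.

```python
from typing import Optional, Sequence

# Assumed markers for a warm/fast reboot on the kernel command line.
# The commit message only says "BOOT_ARG"; these values are illustrative.
ASSUMED_SOFT_REBOOT_MARKERS = ("SONIC_BOOT_TYPE=warm", "SONIC_BOOT_TYPE=fast")


def pick_reboot_cause(
    cmdline: str,
    software_cause: Optional[str],
    platform_cause: Optional[str],
    markers: Sequence[str] = ASSUMED_SOFT_REBOOT_MARKERS,
) -> str:
    """Return the reboot cause following the order described in the commit.

    cmdline        -- contents of /proc/cmdline
    software_cause -- contents of the software reboot cause file, if any
    platform_cause -- cause reported by the platform API, if any
    """
    # Step 1: a warm/fast reboot means the software cause file wins,
    # even if the hardware still reports an older cause.
    if any(marker in cmdline for marker in markers):
        return software_cause or "Unknown"
    # Step 2: otherwise prefer a cause reported by the platform API.
    if platform_cause:
        return platform_cause
    # Step 3: fall back to the software cause file, else unknown.
    return software_cause or "Unknown"
```

With this ordering, a warm reboot that follows a hardware-caused reboot reports the software cause, which is the situation the commit title describes.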


460 Commits
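
Commits eefa8455d7 and 5961e031e1 above both replace in-place editing with "write a new file, then swap it in". The commits themselves do this with sed in shell scripts; the Python below is a stand-in sketch of the same idea under that assumption, and the function name and regex-based interface are hypothetical.

```python
import os
import re
import shutil
import tempfile


def rewrite_file_atomically(path: str, pattern: str, replacement: str) -> None:
    """Apply a regex substitution to a file without editing it in place."""
    with open(path) as src:
        new_text = re.sub(pattern, replacement, src.read())

    # Create the temporary file in the same directory so the final rename
    # stays on one filesystem.
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(new_text)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the new contents to disk first
        shutil.copymode(path, tmp_path)  # keep the original permission bits
        os.replace(tmp_path, path)  # swap the new file in for the old one
    except BaseException:
        # Leave the original file untouched if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Because the old file is only replaced after the new one is fully written, a reader sees either the old contents or the new contents, never a zero-size or half-written file.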