sonic-buildimage

Author	SHA1	Message	Date
Dong Zhang	3faa4e936e	[MultiDB] use sonic-db-cli PING and fix wrong multiDB API in NAT (#4541 )	2020-05-09 18:16:48 -07:00
SuvarnaMeenakshi	2f66b4c545	[sonic-netns-exec]: use "$@" to reflect all positional parameters as they were set initially (#4375 ) sonic-netns-exec fails to execute the command below in swss.sh: sonic-netns-exec "$NET_NS" sonic-db-cli $1 EVAL " local tables = {$2} for i = 1, table.getn(tables) do local matches = redis.call('KEYS', tables[i]) for j,name in ipairs(matches) do redis.call('DEL', name) end end" 0 This command fails with error " redis.exceptions.ResponseError: value is not an integer or out of range" . Root cause: When sonic-netns-exec executes the above function, the argument passed to sonic-db-cli is NOT executed as a single script. The argument is passed as separate keywords to sonic-db-cli, as below: ['EVAL', 'local', 'tables', '=', "{'PORT_TABLE'}", 'for', 'i', '=', '1,', 'table.getn(tables)', 'do', 'local', 'matches', '=', "redis.call('KEYS',", 'tables[i])', 'for', 'j,name', 'in', 'ipairs(matches)', 'do', "redis.call('DEL',", 'name)', 'end', 'end', '0'] - How I did it To make sure that the parameters are passed as they were set initially, fix sonic-netns-exec to use double quoted "$@", where "$@" is "$1" "$2" "$3" ... "${N}" After fix, the argument passed to sonic-db-cli is as below: Argument passed to sonic-db-cli: ['EVAL', "\n local tables = {'PORT_TABLE'}\n for i = 1, table.getn(tables) do\n local matches = redis.call('KEYS', tables[i])\n for j,name in ipairs(matches) do\n redis.call('DEL', name)\n end\n end", '0'] Signed-off-by: SuvarnaMeenakshi <sumeenak@microsoft.com>	2020-04-15 13:13:31 -07:00
SuvarnaMeenakshi	0099305475	Multi-ASIC implementation (#3888 ) Changes made to support multi-asic platform. Added multi-instance support for swss, syncd, database, bgp, teamd and lldp.	2020-04-15 13:08:34 -07:00
Abhishek Dosi	249265ad99	Revert "Multi-ASIC implementation (#3888 )" This reverts commit `2e87a16941`.	2020-04-03 14:34:38 -07:00
SuvarnaMeenakshi	2e87a16941	Multi-ASIC implementation (#3888 ) Changes made to support multi-asic platform. Added multi-instance support for swss, syncd, database, bgp, teamd and lldp.	2020-04-01 23:21:49 -07:00
Joe LeVeque	8e36068237	[sonic-cfggen] Loading the configuration from init_cfg.json and then from config_db.json (#4148 )	2020-03-15 08:54:05 -07:00
Prince Sunny	20510d58d3	Sleep done before mismatch handler (#4165 ) * Sleep done before mismatch handler	2020-02-24 10:25:56 -08:00
yozhao101	3ac345922b	[Services] Restart database service upon unexpected critical process exit. (#4138 ) * [database] Implement the auto-restart feature for database container. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [database] Remove the duplicate dependency in service files. Since we already have updategraph ---> config_setup ---> database, we do not need explicitly add database.service in all other container service files. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [event listener] Reorganize the line 73 in event listener script. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [database] update the file sflow.service.j2 to remove the duplicate dependency. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [event listener] Add comments in event listener. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [event listener] Update the comments in line 56. Signed-off-by: Yong Zhao <yozhao@microsoft.com> * [event listener] Add parentheses for if statement in line 76 in event listener. Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2020-02-13 16:20:38 -08:00
yozhao101	71225ea4cc	[Service] Enable/disable container auto-restart based on configuration. (#4073 )	2020-02-13 16:20:21 -08:00
Prince Sunny	e87f27050b	Update arp_update to refresh neighbor entries from APP_DB (#4125 )	2020-02-13 16:05:19 -08:00
Dong Zhang	42bffc1215	[MultiDB] (except ./src and ./dockers dirs): replace redis-cli with sonic-db-cli and use new DBConnector (#4035 ) * [MultiDB] (except ./src and ./dockers dirs): replace redis-cli with sonic-db-cli and use new DBConnector * update comment for a potential bug * update comment * add TODO marker as review requirement	2020-02-03 15:36:55 -08:00
pavel-shirshov	74b45be487	[fast-reboot]: Save fast-reboot state into the db (#3741 ) Put a flag for fast-reboot to the db using EXPIRE feature. Using this flag in other part of SONiC to start in Fast-reboot mode. If we reload a config, the state in the db will be removed.	2020-01-06 10:30:36 -08:00
Ying Xie	df81943ec5	Revert "[swss.sh] When starting, call 'systemctl restart' on dependents, not (#3807 )" (#3835 ) This reverts commit `351410ea8c`.	2020-01-02 14:35:55 -08:00
Stepan Blyshchak	3474e8fddd	[syncd.sh] remove chipdown on mellanox (#3926 ) ASIC reset events are captured by hw-mgmt and hw-mgmt calls chipup/chipdown internally without OS interaction Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>	2019-12-31 14:43:32 -08:00
Ying Xie	2c7a01a421	[swss service] flush fast-reboot enabled flag upon swss stopping (#3908 ) If we need to stop swss during fast-reboot procedure on the boot up path, it means that something went wrong, like syncd/orchagent crashed already, we are stopping and restarting swss/syncd to re-initialize. In this case, we should proceed as if it is a cold reboot. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2019-12-18 11:20:45 -08:00
Joe LeVeque	5e6f8adb22	[services] Remove explicit dependencies from dhcp_relay service file, control in swss.sh (#3823 )	2019-11-26 16:59:45 -08:00
Joe LeVeque	351410ea8c	[swss.sh] When starting, call 'systemctl restart' on dependents, not (#3807 ) 'systemctl start'	2019-11-22 20:39:09 -08:00
Joe LeVeque	85b0de3df1	[docker-syncd]: Restart SwSS, syncd and dependent services if a critical process in syncd container exits unexpectedly (#3534 ) Add the same mechanism I developed for the SwSS service in #2845 to the syncd service. However, in order to cause the SwSS service to also exit and restart in this situation, I developed a docker-wait-any program which the SwSS service uses to wait for either the swss or syncd containers to exit.	2019-11-09 10:26:39 -08:00
yozhao101	ed79f54569	[Services] Restart DHCP-Relay service upon unexpected critical process exit. (#3667 ) Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2019-11-05 18:32:14 -08:00
Stephen Sun	7308d2eb97	[Mellanox] Stop pmon ahead of syncd (#3505 ) Issue Overview shutdown flow For any shutdown flow, which means all dockers are stopped in order, pmon docker stops after syncd docker has stopped, causing pmon docker to fail to release sx_core resources and leaving sx_core in a bad state. The related logs are like the following: INFO syncd.sh[23597]: modprobe: FATAL: Module sx_core is in use. INFO syncd.sh[23597]: Unloading sx_core[FAILED] INFO syncd.sh[23597]: rmmod: ERROR: Module sx_core is in use config reload & service swss.restart In flows like "config reload" and "service swss restart", the failure causes further consequences: sx_core initialization error with error message like "sx_core: create EMAD sdq 0 failed. err: -16" syncd fails to execute the create switch api with error message "syncd_main: Runtime error: :- processEvent: failed to execute api: create, key: SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000, status: SAI_STATUS_FAILURE" swss fails to call SAI API "SAI_SWITCH_ATTR_INIT_SWITCH", which causes orchagent to restart. This will introduce an extra 1 or 2 minutes for the system to be available, failing related test cases. reboot, warm-reboot & fast-reboot In the reboot flows including "reboot", "fast-reboot" and "warm-reboot" this failure doesn't have further negative effects since the system has already rebooted. In addition, "warm-reboot" requires the system to be shut down as soon as possible to meet the GR time restriction of both BGP and LACP. "fast-reboot" also requires meeting the GR time restriction of BGP, which is longer than that of LACP. In this sense, any unnecessary steps should be avoided. It's better to keep those flows untouched. summary To summarize, we have to come up with a way to ensure: shut down pmon docker ahead of syncd for "config reload" or "service swss restart" flow; don't shut down pmon docker ahead of syncd for "fast-reboot" or "warm-reboot" flow in order to save time. for "reboot" flow, either order is acceptable. Solution To solve the issue, pmon should be stopped ahead of syncd for all flows except for the warm-reboot. - How I did it To stop pmon ahead of syncd. This is done in /usr/local/bin/syncd.sh::stop() and for all shutdown sequences. Now pmon stops ahead of syncd so there must be a way in which pmon can start after syncd has started. Another point that should be taken into consideration is that pmon starting should be deferred so that services which have the logic of graceful restart in fast-reboot and warm-reboot have sufficient CPU cycles to meet their deadline. This is done by adding "syncd.service" as "After" to pmon.service and starting it in /usr/local/bin/syncd.sh::wait() To start pmon automatically after syncd has started.	2019-09-27 10:15:46 +02:00
Danny Allen	97c675c6d5	[cron.d] Add cron job to periodically clean-up core files (#3449 ) * [cron.d] Create cron job to periodically clean-up core files * Create script to scan /var/core and clean-up older core files * Create cron job to run clean-up script Signed-off-by: Danny Allen <daall@microsoft.com> * Update interval for running cron job * Respond to feedback * Change syslog id	2019-09-13 10:50:31 -07:00
pavel-shirshov	8facac9149	[Fast-Reboot]: FR mode is active only first 3 minutes after start. (#3352 ) * Fast reboot mode should be enabled only 3 minutes after restart * Advance sonic-quagga submodule	2019-08-19 16:05:20 -07:00
Ying Xie	84b667fbaf	[radv service] radv service should be a cold only dependent of swss (#3348 ) radv should be left alone during warm restart of swss. Otherwise it will announce departure and cause hosts to lose default gateway. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2019-08-16 12:08:46 -07:00
Ying Xie	a46df66d05	[service dependent] describe non-warm-reboot dependency outside systemd (#3311 ) * [service dependent] describe non-warm-reboot dependency outside systemctl When dependency was described with systemctl, it will kick in all the time, including under warm reboot/restart scenarios. This is not what we always want. For components that are capable of warm reboot/start, they need to describe dependency in service files. Signed-off-by: Ying Xie <ying.xie@microsoft.com> * [service] teamd service should not require swss service Adding require swss will cause teamd to be killed by systemctl when swss stops. This is not what we want in warm reboot. Signed-off-by: Ying Xie <ying.xie@microsoft.com> * refactoring code * rename functions to match other functions in the file	2019-08-08 15:45:17 -07:00
Stepan Blyshchak	59117d23f0	[swss.sh]: Cleanup LAG entries in STATE DB (#3114 ) Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>	2019-07-08 17:29:57 -07:00
Stepan Blyshchak	6961816dec	fix fast reboot compatibility (#3083 ) * fix fast reboot compatibility We should handle both cases for backward compatibility with 201803: - fast-reboot - SONIC_BOOT_TYPE=fast-reboot * handle review comments * add a comment that getBootType code snippet is shared between two files	2019-06-26 12:46:58 -07:00
Prince Sunny	231d309b69	Generate interface table to have an entry designated to default VRF. (#2848 ) * Generate default VRF table for router interfaces * Updated jinja2 template to have prefix filter	2019-06-10 14:02:55 -07:00
Nazarii Hnydyn	e041b15d10	[mellanox]: Fixed config reload race. (#2930 ) Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>	2019-05-29 09:57:29 +03:00
Stepan Blyshchak	9523e64666	[swss.sh] flush FDB table during cold start (#2933 ) Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>	2019-05-22 22:07:29 -07:00
Joe LeVeque	6eca27e564	[services] Restart SwSS service upon unexpected critical process exit (#2845 ) * [service] Restart SwSS Docker container if orchagent exits unexpectedly * Configure systemd to stop restarting swss if it attempts to restart more than 3 times in 20 minutes * Move supervisor-proc-exit-listener script * [docker-dhcp-relay] Enhance wait_for_intf.sh.j2 to utilize STATEDB * Ensure dependent services stop/start/restart with SwSS * Change 'StartLimitInterval' to 'StartLimitIntervalSec', as Stretch installs systemd 232 (>= v230) * Also update journald.conf options * Remove 'PartOf' option from unit files * Add '$(SUPERVISOR_PROC_EXIT_LISTENER_SCRIPT)' to new shared docker-orchagent makefile * Make supervisor-proc-exit-listener script read from 'critical_processes' file inside container * Update critical_processes file for swss container	2019-05-01 08:02:38 -07:00
Joe LeVeque	2bb5400948	[services] Services which start containers now use 'docker wait' instead of 'docker attach' (#2661 )	2019-03-08 10:59:41 -08:00
Nazarii Hnydyn	b22fe37670	[mellanox]: Upgraded hw-management V.2.0.0160. (#2643 ) Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>	2019-03-06 18:51:46 -08:00
Ying Xie	66f5202b9f	[swss/syncd] cold start syncd service in swss in attach method (#2639 ) start() is called by service startPre method, which is blocking. Starting syncd service here is causing deadlock. attach() is called by service start method, which is non-blocking. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2019-03-04 16:46:55 -08:00
Joe LeVeque	5eb7872a07	[services] Ensure swss and syncd services start before dependent services (#2634 ) * [services] Ensure swss and syncd services start before dependent services * Add 'attach' functions to scripts which get installed to /usr/local/bin so that services only reference the one script each * Add 'After=swss.service' to syncd.service	2019-03-02 15:28:34 -08:00
lguohan	572db1e0a9	[swss]: flush asic db in swss.sh for non warm-boot (#2582 ) Need to flush asic db in swss.sh instead of syncd.sh: orchagent might have already started in swss.sh and put commands into asic db before asic db is flushed in syncd.sh. This causes race conditions such as INIT_VIEW not passing to syncd. Signed-off-by: Guohan Lu <gulv@microsoft.com>	2019-02-19 21:48:43 -08:00
Jipan Yang	ff74daaf13	Move warm_restart enable/disable config to stateDB WARM_RESTART_ENABLE_TABLE (#2538 ) Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>	2019-02-19 17:06:56 -08:00
Stepan Blyshchak	2dd769bf46	[syncd.sh] Don't stop sxdkernel during warm shutdown on Mellanox platform (#2572 ) /etc/init.d/sxdkernel stop may take up to 15 sec which has impact on control plane downtime Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>	2019-02-15 16:08:08 -08:00
Ying Xie	44551d0fb5	[swss/syncd] log swss/syncd service script activities (#2545 ) Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2019-02-10 11:56:31 -08:00
Prince Sunny	39e12a1d82	[swss]: Change VrfMgrd startup order, cleanup VRF_TABLE from state DB (#2510 )	2019-01-31 23:28:31 -08:00
stepanblyschak	ff526dd103	[mellanox\|ffb] use system level warm reboot for Mellanox fastfast boot (#2374 ) * [mellanox\|ffb] use system level warm reboot for Mellanox fastfast boot Signed-off-by: Stepan Blyschak <stepanb@mellanox.com> * [mellanox\|ffb] add comments for mellanox start/stop drivers section Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>	2019-01-10 14:09:03 -08:00
Volodymyr Samotiy	b506241b84	[syncd]: Fix reload flow for Mellanox platforms (#2386 ) * Perform stop/start of Mellanox driver tools for all types of reboot * Don't set Mellanox FAST_BOOT option for "cold" reboot * Don't send "syncd_request_shutdown" event for "cold" reboot on Mellanox platforms Signed-off-by: Volodymyr Samotiy <volodymyrs@mellanox.com>	2018-12-15 11:36:12 -08:00
Volodymyr Samotiy	75b41233d2	[Mellanox\|FFB]: Add support for Mellanox fast-fast boot (#2294 ) * [mlnx\|ffb] Add support for mellanox fast-fast boot Signed-off-by: Stepan Blyschak <stepanb@mellanox.com> * [mlnx\|ffb]: Add support of "config end" event for mlnx fast-fast boot Signed-off-by: Volodymyr Samotiy <volodymyrs@mellanox.com> * [Mellanox\|FFB]: Fix review comments * Change naming convention from "fast-fast" to "fastfast" Signed-off-by: Volodymyr Samotiy <volodymyrs@mellanox.com>	2018-12-04 10:11:24 -08:00
Ying Xie	4abbe43463	[syncd] skip ledinit during syncd warm start (#2285 ) * [syncd] skip ledinit during syncd warm start Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2018-11-21 17:56:19 -08:00
Ying Xie	5c8650aaaa	[swss service] don't clear WARM_RESTART table (#2256 ) Clearing the WARM_RESTART table could cause component level warm restart to fail due to missing WARM_RESTART state. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2018-11-15 22:04:53 -08:00
Ying Xie	8598ccaf84	[syncd] extend syncd service script to support both warm/cold shutdown (#2238 ) - cold shutdown is used by regular service stop and/or fast reboot - warm shutdown is used by warm restart and/or warm reboot Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2018-11-15 15:47:33 -08:00
stepanblyschak	447ae7b61a	[mlnx] Fix fast reboot (#2237 ) Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>	2018-11-09 21:54:20 -08:00
Shuotian Cheng	110355201b	[swss]: Update swss.sh script to clean up specific db when start (#2223 ) This script shall not flush all the entries in the state database when it starts up, since there are entries maintained and written by other processes outside this docker. The issue we noticed was that the portchannel states are cleaned up after teamsyncd writes the entries into the database, which causes the IPs to fail to be configured because intfmgrd considers the portchannels not ready yet. Signed-off-by: Shu0T1an ChenG <shuche@microsoft.com>	2018-11-03 12:32:46 -07:00
Ying Xie	f3ab8cdf9a	[warm boot] syncd warm start could be individual warm start (#2147 ) Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2018-10-16 11:20:39 -07:00
Kevin(Shengkai) Wang	ea4b4bd650	[mellanox]: Update recipe for hw-mgmt according to latest changes (#2128 ) Update the hw-mgmt to latest release V.2.0.0060. Update the related files according to the latest hw-mgmt. Signed-off-by: Kevin Wang <kevinw@mellanox.com>	2018-10-08 18:33:44 -07:00
Jipan Yang	dedd5624a0	Adapt to the new WARM_RESTART_TABLE table schema: change from restart… (#2083 ) * Adapt to the new WARM_RESTART_TABLE table schema: change from restart_count to restore_count Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com> * Update variable and function name to match restore_count name change Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com> * Update swss submodule for warm restart schema change Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>	2018-10-02 06:08:26 -07:00

59 Commits
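
Note on commit 2f66b4c545: the quoting behaviour described there can be reproduced in isolation. The sketch below is not the actual sonic-netns-exec code. count_args is a hypothetical stand-in for sonic-db-cli that only reports how many arguments it received, and forward_unquoted assumes the pre-fix wrapper used an unquoted expansion (the commit message does not show the old line).

```shell
# Hypothetical stand-in for sonic-db-cli: prints the number of arguments received.
count_args() {
    echo "$#"
}

# Assumed pre-fix behaviour: unquoted $@ is re-split on whitespace,
# so a multi-word script argument is broken into separate words.
forward_unquoted() {
    count_args $@
}

# Post-fix behaviour: "$@" expands to "$1" "$2" ... with each
# positional parameter preserved as a single argument.
forward_quoted() {
    count_args "$@"
}

# Shortened Lua-like script, passed as ONE argument by the caller.
script="local tables = {'PORT_TABLE'} for i = 1, table.getn(tables) do end"

unquoted_count=$(forward_unquoted EVAL "$script" 0)
quoted_count=$(forward_quoted EVAL "$script" 0)

echo "unquoted forwarding: $unquoted_count arguments"
echo "quoted forwarding:   $quoted_count arguments"
```

With quoted forwarding the callee sees exactly the three arguments the caller passed (EVAL, the script, and 0), which matches the post-fix argument list quoted in the commit message. With unquoted forwarding the script is split into many words, so the key count that EVAL expects after the script is no longer in the right position.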