sonic-buildimage

Author	SHA1	Message	Date
shlomibitton	c71c91e2b0	[202012] [Fastboot] Delay PMON service for better fastboot performance (#10745) #### Why I did it Profiling the system state on init after fast-reboot during create_switch function execution shows several Python scripts running at the same time. This parallel execution consumes CPU time and makes create_switch take longer than it should. Following this finding, and to ensure these services do not interfere in the future, PMON is delayed by 90 seconds so that the system can finish the init flow after fastboot. #### How I did it Add a timer for the PMON service. On MLNX platforms, exclude the trigger that starts PMON when SYNCD starts in case of fastboot. Copy the timer file to the host bin image. #### How to verify it Run fast-reboot on an MLNX platform and observe faster create_switch execution time.	2022-05-15 23:31:32 -07:00
shlomibitton	bca8a244c6	[202012] [Fastboot] Delay LLDP service for better fastboot performance (#10568) (#10744) This PR backports fix #10568. This PR depends on PR #10745. - Why I did it Profiling the system state on init after fast-reboot during create_switch function execution shows several Python scripts running at the same time. This parallel execution consumes CPU time and makes create_switch take longer than it should. Following this finding, and to ensure these services do not interfere in the future, LLDP is delayed by 90 seconds so that the system can finish the init flow after fastboot. - How I did it Add a timer for the LLDP service. Copy the timer file to the host bin image. - How to verify it Run fast-reboot on an MLNX platform and observe faster create_switch execution time.	2022-05-15 15:05:29 +03:00
Lior Avramov	80b048d4d1	[systemd] Increase syncd startup script timeout to support FW upgrade on init. (#6709) - Why I did it To support FW upgrade on init. - How I did it Change timeout value - How to verify it I manually changed ASIC and Gearbox FW followed by hard reset in order for FW upgrade to take place on init. Signed-off-by: liora <liora@nvidia.com>	2021-02-16 15:31:10 -08:00
Syd Logan	0311a4a037	Add gearbox phy device files and a new physyncd docker to support VS gearbox phy feature (#4851) * buildimage: Add gearbox phy device files and a new physyncd docker to support VS gearbox phy feature * scripts and configuration needed to support a second syncd docker (physyncd) * physyncd supports gearbox device and phy SAI APIs and runs multiple instances of syncd, one per phy in the device * support for VS target (sonic-sairedis vslib has been extended to support a virtual BCM81724 gearbox PHY). HLD is located at `b817a12fd8/doc/gearbox/gearbox_mgr_design.md` - Why I did it This work is part of the gearbox phy joint effort between Microsoft and Broadcom, and is based on multi-switch support in sonic-sairedis. - How I did it The overall feature was implemented across several projects. The collective pull requests (some in late stages of review at this point): https://github.com/Azure/sonic-utilities/pull/931 - CLI (merged) https://github.com/Azure/sonic-swss-common/pull/347 - Minor changes (merged) https://github.com/Azure/sonic-swss/pull/1321 - gearsyncd, config parsers, changes to orchagent to create gearbox phy on supported systems https://github.com/Azure/sonic-sairedis/pull/624 - physyncd, virtual BCM81724 gearbox phy added to vslib - How to verify it In a vslib build: root@sonic:/home/admin# show gearbox interfaces status PHY Id Interface MAC Lanes MAC Lane Speed PHY Lanes PHY Lane Speed Line Lanes Line Lane Speed Oper Admin -------- ----------- --------------- ---------------- --------------- ---------------- ------------ ----------------- ------ ------- 1 Ethernet48 121,122,123,124 25G 200,201,202,203 25G 204,205 50G down down 1 Ethernet49 125,126,127,128 25G 206,207,208,209 25G 210,211 50G down down 1 Ethernet50 69,70,71,72 25G 212,213,214,215 25G 216 100G down down In addition, docker ps | grep phy should show a physyncd docker running. Signed-off-by: syd.logan@broadcom.com	2020-09-25 08:32:44 -07:00
Kebo Liu	f3091c91a6	[Mellanox] remove code which instructs hw-mgmt to skip mlsw_minimal probing in fast-boot flow (#5011)	2020-07-22 12:21:11 +03:00
Kebo Liu	2b568ec136	Add with_i2cdev for mst start to have I2C device loaded properly (#4790)	2020-06-21 16:27:05 +03:00
yozhao101	4ea2e5e6dc	[docker-syncd] Add timeout to force stop syncd container (#4617) - Why I did it When I tested the auto-restart feature of the swss container by manually killing one of its critical processes, swss was stopped. The syncd container, as the peer container, should then also be stopped. However, I found that sometimes the syncd container could be stopped and sometimes it could not. It could not be stopped because the process executing the stop() function (/usr/local/bin/syncd.sh stop) got stuck in the loop at lines 164-167, and systemd waited 90 seconds before killing that process: `# wait until syncd quit gracefully; while docker top syncd$DEV | grep -q /usr/bin/syncd; do sleep 0.1; done` The first thing I did was to profile how long this while loop spins when the syncd container stops normally after the swss container is stopped. The result is 5 or 6 seconds. If the syncd container stops normally, two messages are written to syslog: str-a7050-acs-3 NOTICE syncd#dsserve: child /usr/bin/syncd exited status: 134 str-a7050-acs-3 INFO syncd#supervisord: syncd [5] child /usr/bin/syncd exited status: 134 The second thing I did was to add a timer to the while-loop condition so that the loop is forced to exit after 20 seconds. After that, testing showed that the syncd container can be stopped normally if swss is stopped first. Note that if the syncd container stops within 5 or 6 seconds, the two log messages still appear in syslog; however, if the while loop runs longer than 20 seconds and is forced to exit, the syncd container is still stopped but these two messages do not appear in syslog. Further, although the auto-restart feature of the swss container now works correctly, I cannot be sure that the syncd container will never again fail to stop in the future. - How I did it I added a timer around the while loop in the stop() function so that the loop exits after spinning for 20 seconds. Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2020-06-04 15:17:28 -07:00
judyjoseph	acf465b43b	Multi DB with namespace support, Introducing the database_global.json… (#4477) * Multi DB with namespace support, Introducing the database_global.json file to support accessing DBs in other namespaces for services running in the Linux host * Updates based on comments * Adding the j2 templates for database_config and database_global files. * Updating to retrieve the redis DIRs to be mounted from database_global.json file. * Additional check to see if asic.conf file exists before sourcing it. * Updates based on PR comments discussion. * Review comments update * Updates to the argument "-n" for namespace used in both contexts of parsing minigraph and multi DB access. * Update with the attribute "persistence_for_warm_boot" that was added to database_config.json file earlier. * Removing the database_config.json file to avoid confusion in future. We use the database_config.json.j2 file to generate database_config.json files dynamically. * Update the comments for sudo usage in docker_image_ctrl.j2 * Update with the new logic in PING PONG tests using sonic-db-cli. With this we wait until the PONG response is received, i.e. until the redis server is up. * Similar changes in swss and syncd scripts for the PING tests with sonic-db-cli * Updated with a missing , in the database_config.json.j2 file, Do pip install of j2cli in docker-base-buster.	2020-05-08 21:24:05 -07:00
Dong Zhang	340cf826a6	[MultiDB] use sonic-db-cli PING and fix wrong multiDB API in NAT (#4541)	2020-05-06 15:41:28 -07:00
SuvarnaMeenakshi	4b8067e913	Multi-ASIC implementation (#3888) Changes made to support multi-asic platform. Added multi-instance support for swss, syncd, database, bgp, teamd and lldp.	2020-03-31 10:06:19 -07:00
Dong Zhang	7aa0baf709	[MultiDB] (except ./src and ./dockers dirs): replace redis-cli with sonic-db-cli and use new DBConnector (#4035) * [MultiDB] (except ./src and ./dockers dirs): replace redis-cli with sonic-db-cli and use new DBConnector * update comment for a potential bug * update comment * add TODO marker as review requirement	2020-01-22 11:26:23 -08:00
lguohan	483a5946a8	Revert "[MultiDB]except src and dockers : replace redis-cli with sonic-db-cli and use new DBConnector (#3928)" (#4002) This reverts commit `0dae59ac30`.	2020-01-10 08:27:34 -08:00
Dong Zhang	0dae59ac30	[MultiDB]except src and dockers : replace redis-cli with sonic-db-cli and use new DBConnector (#3928) * [MultiDB]except src and dockers : replace redis-cli with sonic-db-cli and use new DBConnector * fix vs tests along with swss vs tests together	2020-01-02 14:46:25 -08:00
Stepan Blyshchak	b6ad09aa35	[syncd.sh] remove chipdown on mellanox (#3926) ASIC reset events are captured by hw-mgmt, and hw-mgmt calls chipup/chipdown internally without OS interaction Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>	2019-12-23 11:15:08 +02:00
pavel-shirshov	1848fb262b	[fast-reboot]: Save fast-reboot state into the db (#3741) Store a fast-reboot flag in the DB using the Redis EXPIRE feature. Other parts of SONiC use this flag to start in fast-reboot mode. If we reload a config, the state in the DB will be removed.	2019-12-04 14:10:19 -08:00
Stephen Sun	7308d2eb97	[Mellanox] Stop pmon ahead of syncd (#3505) Issue Overview shutdown flow For any shutdown flow, meaning all dockers are stopped in order, the pmon docker stops after the syncd docker has stopped, which causes the pmon docker to fail to release sx_core resources and leaves sx_core in a bad state. The related logs look like the following: INFO syncd.sh[23597]: modprobe: FATAL: Module sx_core is in use. INFO syncd.sh[23597]: Unloading sx_core[FAILED] INFO syncd.sh[23597]: rmmod: ERROR: Module sx_core is in use config reload & service swss restart In flows like "config reload" and "service swss restart", the failure has further consequences: sx_core initialization error with an error message like "sx_core: create EMAD sdq 0 failed. err: -16" syncd fails to execute the create switch api with error message "syncd_main: Runtime error: :- processEvent: failed to execute api: create, key: SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000, status: SAI_STATUS_FAILURE" swss fails to call SAI API "SAI_SWITCH_ATTR_INIT_SWITCH", which causes orchagent to restart. This adds an extra 1 or 2 minutes before the system is available, failing related test cases. reboot, warm-reboot & fast-reboot In the reboot flows including "reboot", "fast-reboot" and "warm-reboot", this failure has no further negative effects since the system has already rebooted. In addition, "warm-reboot" requires the system to be shut down as soon as possible to meet the GR time restriction of both BGP and LACP. "fast-reboot" also has to meet the GR time restriction of BGP, which is longer than that of LACP. In this sense, any unnecessary steps should be avoided, and it's better to keep those flows untouched. summary To summarize, we have to come up with a way to ensure: shut down the pmon docker ahead of syncd for the "config reload" or "service swss restart" flow; don't shut down the pmon docker ahead of syncd for the "fast-reboot" or "warm-reboot" flow, in order to save time; for the "reboot" flow, either order is acceptable. Solution To solve the issue, pmon should be stopped ahead of syncd for all flows except warm-reboot. - How I did it Stop pmon ahead of syncd. This is done in /usr/local/bin/syncd.sh::stop() for all shutdown sequences. Since pmon now stops ahead of syncd, there must be a way for pmon to start after syncd has started. Another point to take into consideration is that pmon startup should be deferred so that services with graceful-restart logic in fast-reboot and warm-reboot have sufficient CPU cycles to meet their deadlines. This is done by adding "syncd.service" as "After" to pmon.service and starting pmon in /usr/local/bin/syncd.sh::wait(), so that pmon starts automatically after syncd has started.	2019-09-27 10:15:46 +02:00
pavel-shirshov	8facac9149	[Fast-Reboot]: FR mode is active only for the first 3 minutes after start. (#3352) * Fast reboot mode should be enabled only for the first 3 minutes after restart * Advance sonic-quagga submodule	2019-08-19 16:05:20 -07:00
Stepan Blyshchak	6961816dec	fix fast reboot compatibility (#3083) * fix fast reboot compatibility We should handle both cases for backward compatibility with 201803: - fast-reboot - SONIC_BOOT_TYPE=fast-reboot * handle review comments * add a comment that getBootType code snippet is shared between two files	2019-06-26 12:46:58 -07:00
Nazarii Hnydyn	e041b15d10	[mellanox]: Fixed config reload race. (#2930) Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>	2019-05-29 09:57:29 +03:00
Joe LeVeque	2bb5400948	[services] Services which start containers now use 'docker wait' instead of 'docker attach' (#2661)	2019-03-08 10:59:41 -08:00
Nazarii Hnydyn	b22fe37670	[mellanox]: Upgraded hw-management V.2.0.0160. (#2643) Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>	2019-03-06 18:51:46 -08:00
Joe LeVeque	5eb7872a07	[services] Ensure swss and syncd services start before dependent services (#2634) * [services] Ensure swss and syncd services start before dependent services * Add 'attach' functions to scripts which get installed to /usr/local/bin so that services only reference the one script each * Add 'After=swss.service' to syncd.service	2019-03-02 15:28:34 -08:00
lguohan	572db1e0a9	[swss]: flush asic db in swss.sh for non warm-boot (#2582) The ASIC DB needs to be flushed in swss.sh instead of syncd.sh: orchagent might have already started in swss.sh and put commands into the ASIC DB before it is flushed in syncd.sh. This causes race conditions such as INIT_VIEW not being passed to syncd. Signed-off-by: Guohan Lu <gulv@microsoft.com>	2019-02-19 21:48:43 -08:00
Jipan Yang	ff74daaf13	Move warm_restart enable/disable config to stateDB WARM_RESTART_ENABLE_TABLE (#2538) Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>	2019-02-19 17:06:56 -08:00
Stepan Blyshchak	2dd769bf46	[syncd.sh] Don't stop sxdkernel during warm shutdown on Mellanox platform (#2572) /etc/init.d/sxdkernel stop may take up to 15 sec which has impact on control plane downtime Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>	2019-02-15 16:08:08 -08:00
Ying Xie	44551d0fb5	[swss/syncd] log swss/syncd service script activities (#2545) Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2019-02-10 11:56:31 -08:00
stepanblyschak	ff526dd103	[mellanox|ffb] use system level warm reboot for Mellanox fastfast boot (#2374) * [mellanox|ffb] use system level warm reboot for Mellanox fastfast boot Signed-off-by: Stepan Blyschak <stepanb@mellanox.com> * [mellanox|ffb] add comments for mellanox start/stop drivers section Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>	2019-01-10 14:09:03 -08:00
Volodymyr Samotiy	b506241b84	[syncd]: Fix reload flow for Mellanox platforms (#2386) * Perform stop/start of Mellanox driver tools for all types of reboot * Don't set Mellanox FAST_BOOT option for "cold" reboot * Don't send "syncd_request_shutdown" event for "cold" reboot on Mellanox platforms Signed-off-by: Volodymyr Samotiy <volodymyrs@mellanox.com>	2018-12-15 11:36:12 -08:00
Ying Xie	4abbe43463	[syncd] skip ledinit during syncd warm start (#2285) * [syncd] skip ledinit during syncd warm start Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2018-11-21 17:56:19 -08:00
Ying Xie	8598ccaf84	[syncd] extend syncd service script to support both warm/cold shutdown (#2238) - cold shutdown is used by regular service stop and/or fast reboot - warm shutdown is used by warm restart and/or warm reboot Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2018-11-15 15:47:33 -08:00
stepanblyschak	447ae7b61a	[mlnx] Fix fast reboot (#2237) Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>	2018-11-09 21:54:20 -08:00
Ying Xie	f3ab8cdf9a	[warm boot] syncd warm start could be individual warm start (#2147) Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2018-10-16 11:20:39 -07:00
Kevin(Shengkai) Wang	ea4b4bd650	[mellanox]: Update recipe for hw-mgmt according to latest changes (#2128) Update the hw-mgmt to latest release V.2.0.0060. Update the related files according to the latest hw-mgmt. Signed-off-by: Kevin Wang <kevinw@mellanox.com>	2018-10-08 18:33:44 -07:00
Ying Xie	c8e6b15504	[syncd] warm shutdown syncd process when warm boot is enabled (#2078) * [syncd] warm shutdown syncd process when warm boot is enabled Signed-off-by: Ying Xie <ying.xie@microsoft.com> * [warmboot] mount folder to hold warmboot temporary files Signed-off-by: Ying Xie <ying.xie@microsoft.com> * Fix a typo	2018-10-01 19:01:04 -07:00
Ying Xie	cfe01f19e4	Separate syncd service from swss service (#2051) * [swss.sh] refactor swss service script code - Move checks and waits to helper functions. - Remove early returns from code stream Signed-off-by: Ying Xie <ying.xie@microsoft.com> * [swss.sh] Add debug log for service state changes Signed-off-by: Ying Xie <ying.xie@microsoft.com> * [syncd] Separate out syncd service from swss service Still make them start/stop/restart synchronously so existing scripts continue working. Signed-off-by: Ying Xie <ying.xie@microsoft.com> * Remove extra 'After' in swss service and remove syncd docker warm boot code Syncd warm boot needs more thinking; we can put it back once the workflow has been defined and is ready for coding/testing. * [syncd] syncd start/stop/restart shouldn't affect swss state Semi-detach syncd service state change from swss: - swss state change still chases syncd service to follow, except for warm boot - syncd state change will only affect itself. Signed-off-by: Ying Xie <ying.xie@microsoft.com> * add missing '{'	2018-09-24 16:35:01 -07:00

35 Commits
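Commit 6961816dec (#3083) says a getBootType code snippet is shared between two files. It also says the snippet must accept both the legacy `fast-reboot` token and `SONIC_BOOT_TYPE=fast-reboot` on the kernel command line. Below is a minimal bash sketch of such a classifier; it is not the actual SONiC code. The function name `get_boot_type`, passing the command line in as an argument, and the `warm`/`fastfast` values are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of boot-type detection in the spirit of getBootType (#3083).
# Assumption: the real helper reads /proc/cmdline; here the command line can be
# passed as $1 (falling back to /proc/cmdline) so the logic is testable anywhere.
get_boot_type() {
    local cmdline="${1:-$(cat /proc/cmdline 2>/dev/null)}"
    local boot_type
    case "$cmdline" in
        *SONIC_BOOT_TYPE=warm*)
            boot_type="warm" ;;
        # Assumed value; must be tested before the generic "fast" pattern,
        # which would otherwise also match "SONIC_BOOT_TYPE=fastfast".
        *SONIC_BOOT_TYPE=fastfast*)
            boot_type="fastfast" ;;
        # Accept both forms named in #3083: the new key and the legacy
        # 201803 "fast-reboot" token.
        *SONIC_BOOT_TYPE=fast*|*fast-reboot*)
            boot_type="fast" ;;
        *)
            boot_type="cold" ;;
    esac
    echo "$boot_type"
}
```

Pattern order matters in a `case` statement: the first matching pattern wins. That is why the more specific `fastfast` pattern is checked before the broader `fast` one.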
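Commit 4ea2e5e6dc (#4617) bounds the loop in stop() that waits for /usr/bin/syncd to disappear from `docker top syncd$DEV`, forcing the loop to exit after 20 seconds. Below is a minimal bash sketch of that bounded-wait pattern using the shell's built-in `SECONDS` counter; it is not the committed code. The helper name `wait_while_running` is hypothetical. The check command is passed in so that the loop can be exercised without Docker.

```bash
#!/usr/bin/env bash
# Hypothetical helper illustrating the bounded wait described in #4617.
# Usage: wait_while_running TIMEOUT_SECONDS CHECK_COMMAND [ARGS...]
# Re-runs CHECK_COMMAND every 0.1 s for as long as it succeeds (i.e. the
# process is still running). Returns 0 as soon as the check fails (the process
# is gone), or 1 if TIMEOUT_SECONDS elapse first.
wait_while_running() {
    local timeout="$1"
    shift
    local start=$SECONDS          # bash built-in: whole seconds since shell start
    while "$@"; do
        if (( SECONDS - start >= timeout )); then
            return 1              # gave up: caller decides whether to force-stop
        fi
        sleep 0.1
    done
    return 0
}
```

In syncd.sh the check might look like `syncd_running() { docker top "syncd$DEV" | grep -q /usr/bin/syncd; }` followed by `wait_while_running 20 syncd_running`. Both names are illustrative. Returning a status instead of exiting lets the caller decide whether to log the timeout or proceed with a forced stop. `SECONDS` has one-second resolution, which is adequate for a 20-second cap.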