sonic-buildimage

Archived

Author	SHA1	Message	Date
abdosi	668485aac5	Added Support to runtime render bgp and teamd feature state and lldp has_asic_scope flag (#11796 ) Added Support to runtime render bgp and teamd feature `state` and lldp `has_asic_scope` flag Needed for SONiC on chassis. Signed-off-by: Abhishek Dosi <abdosi@microsoft.com> Co-authored-by: mlok <marty.lok@nokia.com>	2022-11-15 16:20:14 -08:00
abdosi	bd348c5264	[chassis-packet] fix the issue of internal ip arp not getting resolved. (#12127 ) Fix the issue where arp_update will not ping some of the ip's even though they are in failed state since grep of that ip on ip neigh show command does not do exact word match and can return multiple match.	2022-11-14 10:15:17 -08:00
Lawrence Lee	ddf16c9d8c	[arp_update]: Fix hardcoded vlan (#12566 ) Typo in prior PR #11919 hardcodes Vlan name. Change command to use the $vlan variable instead Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2022-11-07 12:10:00 -08:00
Zain Budhwani	8f48773fd1	Publish additional events (#12563 ) Add event_publish code or regex for rsyslog plugin for additional events	2022-11-07 09:57:57 -08:00
Mai Bui	61a085e55e	Replace os.system and remove subprocess with shell=True (#12177 ) Signed-off-by: maipbui <maibui@microsoft.com> #### Why I did it `subprocess` is used with `shell=True`, which is very dangerous for shell injection. `os` - not secure against maliciously constructed input and dangerous if used to evaluate dynamic content #### How I did it remove `shell=True`, use `shell=False` Replace `os` by `subprocess`	2022-11-04 10:48:51 -04:00
Stepan Blyshchak	e662008f72	[services] kill container on stop in warm/fast mode (#10510 ) - Why I did it To optimize stop on warm boot. - How I did it Added kill for containers	2022-09-19 19:34:33 +03:00
Ze Gan	016f671857	[docker-macsec]: Add dependencies of MACsec (#11770 ) Why I did it If the SWSS services was restarted, the MACsec service should also be restarted. Otherwise the data in wpa_supplicant and orchagent will not be consistent. How I did it Add dependency in docker-macsec.mk. How to verify it Manually check by 'sudo service swss restart'. The MACsec container should be started after swss, the syslog will look like Sep 8 14:36:29.562953 sonic INFO swss.sh[9661]: Starting existing swss container with HWSKU Force10-S6000 Sep 8 14:36:30.024399 sonic DEBUG container: container_start: BEGIN ... Sep 8 14:36:33.391706 sonic INFO systemd[1]: Starting macsec container... Sep 8 14:36:33.392925 sonic INFO systemd[1]: Starting Management Framework container... Signed-off-by: Ze Gan <ganze718@gmail.com>	2022-09-08 23:45:06 +08:00
Ying Xie	a6843927d9	[mux] skip mux operations during warm shutdown (#11937 ) * [mux] skip mux operations during warm shutdown - Enhance write_standby.py script to skip actions during warm shutdown. - Expand the support to BGP service. - MuX support was added by a previous PR. - don't skip action during warm recovery Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2022-09-02 13:50:42 -07:00
Lawrence Lee	a762b35cbc	[arp_update]: Set failed IPv6 neighbors to incomplete (#11919 ) After pinging any failed IPv6 neighbor entries, set the remaining failed/incomplete entries to a permanent INCOMPLETE state. This manual setting to INCOMPLETE prevents these entries from automatically transitioning to FAILED state, and since they are now incomplete any subsequent NA messages for these neighbors is able to resolve the entry in the cache. Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2022-09-02 13:40:40 -07:00
Longxiang Lyu	6e878a36da	[mux] Exit to write `standby` state to `active-active` ports (#11821 ) [mux] Exit to write standby state to `active-active` ports Signed-off-by: Longxiang Lyu <lolv@microsoft.com>	2022-08-31 13:10:22 -07:00
abdosi	3bf1abb2dc	Address Review Comment to define SONIC_GLOBAL_DB_CLI in gbsyncd.sh (#11857 ) As part of PR #11754 Change was added to use variable SONIC_DB_NS_CLI for namespace but that will not work since ./files/scripts/syncd_common.sh uses SONIC_DB_CLI. So revert back to use SONIC_DB_CLI and define new variable for SONIC_GLOBAL_DB_CLI for global/host db cli access Also fixed DB_CLI not working for namespace.	2022-08-29 08:19:28 -07:00
Hua Liu	214e394ac0	Remove swsssdk from rules and image. (#11469 ) #### Why I did it To deprecate swsssdk, remove all dependency to it. #### How I did it Remove swsssdk from rules and build image scripts. #### How to verify it Pass all UT and E2E test case #### Description for the changelog Remove swsssdk from rules and build image scripts.	2022-08-25 08:35:51 +08:00
abdosi	535612f808	Added support to add gbsyncd in Feature Table of Host Config DB (#11754 ) Why I did: In case of multi-asic platforms gbsyncd is not getting added to Feature Table of Host Config DB. Without this container_checker complains of not needed gbsyncd container's are running. How I did: Update Both Host and Namespace config db when gbsyncd docker is starting. How I verify: Verified on Multi-asic platforms.	2022-08-17 14:02:21 -07:00
Stepan Blyshchak	a66941a6ce	[syncd.sh] 'sxdkernel start' => 'sxdkernel restart' (#11718 ) Change `sxdkernel start` to `sxdkernel restart`. If `syncd` service crashes in `ExecStartPre` systemd will not call `ExecStop` and thus will not call `sxdkernel stop`. Use of `sxdkernel restart` is more robust in terms of guarantees to restore the system after unexpected crashes. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com> Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2022-08-15 13:35:34 -07:00
Nikola Dancejic	23dcfdf9b6	[swss] Adding conditional for bgp when on multi ASIC platform (#11691 ) bgp should be a per-asic service, and runs for each namespace on multi-asic platforms. However, putting bgp in MULTI_INST_DEPENDENT causes swss to be restarted as well as bgp. this is causing issues after #11000 Issue: #11653 This fix: removes bgp from dependents list adds a conditional that either adds bgp, or bgp@$DEV to separate between single and multi-asic platforms	2022-08-12 11:34:10 -07:00
Stepan Blyshchak	2d4299308d	[swss.sh/syncd.sh] Trap only on EXIT (#11590 ) When using trap on SIGTERM the script will not react to the SIGTERM signal sent while a child is executing. I.e, the following script does not react on SIGTERM sent to it if it is waiting for sleep to finish: ``` trap "echo Handled SIGTERM" 0 2 3 15 echo "Before sleep" sleep inf echo "After sleep" ``` Instead, trap only on EXIT which covers also a scenario with exit on SIGINT, SIGTERM. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2022-08-10 20:57:07 -07:00
Lawrence Lee	889741c9bc	[arp_update]: Resolve failed neighbors on dualtor (#11615 ) In arp_update, check for FAILED or INCOMPLETE kernel neighbor entries and manually ping them to try and resolve the neighbor Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2022-08-09 16:19:42 -07:00
Ying Xie	a3e3530d1d	[write_standby] update write_standby.py script (#11650 ) Why I did it The initial value has to be present for the state machines to work. In the active-standby dual-tor scenario, or any hardware mux scenario, the value will be updated eventually with a delay. However, in the active-active dual-tor scenario, there is no other mechanism to initialize the value and get the state machines started, so this script has to write something at startup time. For active-active dualtor, 'active' is the preferred initial value; the state machine will switch the state to standby soon if the link prober finds the link is not in a good state. How I did it Update the script to always provide initial values. How to verify it Tested on active-active dual-tor testbed. Signed-off-by: Ying Xie ying.xie@microsoft.com	2022-08-09 14:21:29 -07:00
Nikola Dancejic	8f6b568acf	[swss] Adding bgp container as dependent of swss (#11000 ) What I did: Added bgp as a dependent of swss Why I did it: bgp container was not restarting on swss crash. When swss crashes, linkmgrd doesn't initiate a switchover because it cannot access the default route from orchagent. Bringing down bgp with swss will isolate the ToR, causing linkmgrd to initiate a switchover to the peer ToR avoiding significant packet loss. How I did it: Added bgp to DEPENDENT Signed-off-by: Nikola Dancejic <ndancejic@microsoft.com>	2022-07-29 16:22:20 -07:00
Stepan Blyshchak	925a393e3d	[swss.sh] clear counters cache folder on swss cold/fast reload (#11244 ) A change in sonic-utilities makes all cache files be saved into a /tmp/cache. On swss restart this cache has to be removed in case swss starts in cold or fast mode. A related cache restoration in the warmboot finalizer script is also updated to use new location. - Why I did it To fix #9817. Clear the cache directory on swss.sh except for warm start. Also, adapted finalize-warmboot script to take the cache directory. - How I did it A change in sonic-utilities makes all cache files be saved into a /tmp/cache. On swss restart this cache has to be removed in case swss starts in cold or fast mode. A related cache restoration in the warmboot finalizer script is also updated to use new location. - How to verify it Run together with Azure/sonic-utilities#2232. Verify counters cache is removed on config reload, cold/fast reboots, swss restart. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2022-07-28 12:03:22 +03:00
abdosi	a380105461	Enable ARP Update Script for Packet based chassis. (#11465 ) What I did: Following changes done for packet based chassis:- 1> Run arp_update on LC's to resolve static route nexthops over backend port-channel interfaces. 2> On Supervisor make sure arp_update exit gracefully	2022-07-26 16:50:16 -07:00
Iris Hsu	f323f56c54	flush VRF_OBJECT_TABLE table on state db when swss start (#11509 ) *flush VRF_OBJECT_TABLE table on state db when swss start	2022-07-21 18:01:39 -07:00
Jing Zhang	5d03b5d0df	Avoid write_standby in warm restart context (#11283 ) Avoid write_standby in warm restart context. sign-off: Jing Zhang zhangjing@microsoft.com Why I did it In warm restart context, we should avoid mux state change. How I did it Check warm restart flag before applying changes to app db. How to verify it Ran write_standby in table missing, key missing, field missing scenarios. Did a warm restart, app db changes were skipped. Saw this in syslog: WARNING write_standby: Taking no action due to ongoing warmrestart.	2022-06-29 21:34:02 -07:00
Sudharsan Dhamal Gopalarathnam	9452095e25	[lldp]Fix lldp spawned after reboot when disabled (#11080 ) - Why I did it When LLDP is disabled through feature command, it gets spawned after reboot. - How I did it In syncd.sh check if the service is enabled before spawning automatically during cold reboot. - How to verify it Disable lldp feature. Perform cold reboot and verify its not spawned.	2022-06-22 03:11:41 +03:00
shlomibitton	1474ad76d8	[Mellanox] [pmon] Fix for PMON service not starting when restarting SWSS service after fast/warm reboot (#10901 ) - Why I did it Recent change to delay PMON service in case of fast/warm reboot introduce an issue when restarting only SWSS service after fast/warm reboot for Nvidia platform. Since the timer is triggered only when the system boot, in a scenario when the system is after a fast/warm reboot and the user restart SWSS service, as part of syncd.sh script, PMON service will stop but the timer will not start again. - How I did it On syncd.sh script, in case of fast/warm indication, check if pmon.timer is running. If it is running it means we are at the first boot and continue normally. If it is not running, meaning the service was restarted, start the timer to keep the system behavior consistent. - How to verify it Run fast/warm reboot. service swss restart. Observe PMON service starting. Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>	2022-06-16 12:15:09 +03:00
judyjoseph	0b1ae9c43c	Cleanup macsec stateDB tables on restart (#11066 ) Clean macsec tables in STATE_DB on start	2022-06-09 15:32:24 -07:00
Lukas Stockner	c9b27cde71	[swss] Clear VXLAN tunnel table from State DB on startup (#10822 ) * When reloading config after crashes, VTEP interfaces are sometimes not created since the tunnel still exists in the STATE_DB. * Adding VXLAN_TUNNEL_TABLE to the list of tables to be cleaned in swss.sh fixes the problem.	2022-05-31 08:54:31 -07:00
shlomibitton	4ec3af86af	[Fastboot] Delay PMON service for better fastboot performance (#10567 ) - Why I did it Profiling the system state on init after fast-reboot during create_switch function execution shows several Python scripts running at the same time. This parallel execution consumes CPU time, so create_switch takes longer than it should. Following this finding, and to ensure these services will not interfere in the future, PMON is delayed by 90 seconds until the system finishes the init flow after fastboot. - How I did it Add a timer for PMON service. Exclude for MLNX platform the start trigger of PMON when SYNCD starts in case of fastboot. Copy the timer file to the host bin image. - How to verify it Run fast-reboot on MLNX platform and observe faster create_switch execution time.	2022-05-02 10:44:17 +03:00
shlomibitton	1d84e0d7df	[Fastboot] Delay LLDP service for better fastboot performance (#10568 ) - Why I did it Profiling the system state on init after fast-reboot during create_switch function execution shows several Python scripts running at the same time. This parallel execution consumes CPU time, so create_switch takes longer than it should. Following this finding, and to ensure these services will not interfere in the future, LLDP is delayed by 90 seconds until the system finishes the init flow after fastboot. - How I did it Add a timer for LLDP service. Copy the timer file to the host bin image. - How to verify it Run fast-reboot on MLNX platform and observe faster create_switch execution time. This PR is dependent on PR: #10567	2022-04-28 10:35:14 +03:00
Junhua Zhai	128d762af3	[gearbox] Add peer gbsyncd for swss if gearbox exists (#10504 ) Fix the issues #10501 and #9733 If having gearbox, we need: * add gbsyncd as a peer since swss also has dependency on gbsyncd * add service gbsyncd to FEATURE table if it is missing	2022-04-20 19:02:49 +08:00
Kostiantyn Yarovyi	bf4ab4a338	[Barefoot][Syncd] restart of the interface for cleaning the tx queue through which communication takes place between Sonic and openBMC (#9941 ) Why I did it Improve the startup of the Barefoot SDK. How I did it Restart the interface to clean the tx queue through which communication takes place between Sonic and openBMC. How to verify it Run the sonic autorestart tests.	2022-03-21 10:07:20 -07:00
Stepan Blyshchak	18d00dfbe7	[teamd.sh] kill teamd docker on warm shutdown for faster shutdown (#10219 ) This can save 6 sec for teamd LAG restoration - the time between: ``` Mar 9 13:51:10.467757 r-panther-13 WARNING teamd#teamd_PortChannel1[28]: Got SIGUSR1. Mar 9 13:52:33.310707 r-panther-13 INFO teamd#teamd_PortChannel1[27]: carrier changed to UP ``` - Why I did it Optimize warm boot. Specifically reduce the time needed for LAG restoration. - How I did it Kill teamd docker after graceful shutdown of teamd processes. - How to verify it Run warm reboot. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2022-03-15 09:20:36 +02:00
Lawrence Lee	a50d1f1fc8	[write_standby]: Increase timeout to 60s (#10065 ) - Avoid scenarios where script times out before orchagent can establish IPinIP tunnel Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2022-02-24 14:55:45 -08:00
tbgowda	4e32f85a31	Enable SAI_SWITCH_ATTR_UNINIT_DATA_PLANE_ON_REMOVAL attribute (#9419 ) Why I did it Fixes #8980 partly. The corresponding changes in sonic-sairedis is here : Azure/sonic-sairedis#975 How I did it Include changes from both repos and build an image for verification. How to verify it Trigger fast-reboot with the changes, see the attribute SAI_SWITCH_ATTR_UNINIT_DATA_PLANE_ON_REMOVAL being set at the SAI level. Signed-off-by: Thushar Gowda <24815472+tbgowda@users.noreply.github.com>	2022-02-01 08:44:17 -08:00
Shi Su	4b357044b3	[bgpcfgd] Add bgpcfgd support to advertise routes (#9197 ) Why I did it Add bgpcfgd support to advertise routes. How I did it Make bgpcfgd subscribe to the ADVERTISE_NETWORK table in STATE_DB and configure route advertisement accordingly. How to verify it Added unit tests in bgpcfgd and verify on KVM about route advertisement.	2021-11-29 23:17:57 -08:00
Lawrence Lee	6e1a477ce0	[mux]: Fix `mark_dhcp_packet` (#9373 ) - Consolidate the two [Service] sections by moving the ExecStartPre line for mark_dhcp_packet.py to the first section and removing the second. - Make the mark_dhcp_packet.py file executable - Also clean up mark_dhcp_packet.py - Remove unused imports - Fix spacing and line lengths to conform to PEP8 Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-11-29 12:04:06 -08:00
Brian O'Connor	002827f08e	[PINS] Add APPL_STATE_DB and response path log (#9082 ) - Add APPL_STATE_DB to database_config.json - Clear APPL_STATE_DB during SwSS container restarts - Add response path log file to logrotate config: responsepublisher.rec Co-authored-by: PINS Working Group <sonic-pins-subgroup@googlegroups.com>	2021-11-24 10:31:06 -08:00
Junhua Zhai	240596ec7d	[gearbox] provide common gbsyncd.service.j2 to start for platform specific gbsyncd docker (#9332 ) Why I did it Fix #9059. It provides common gbsyncd.service.j2 to start for platform specific gbsyncd docker, which must be named 'gbsyncd'. How I did it All of platform specific gbsyncd dockers use a common name 'gbsyncd' Use a unique systemd service template gbsyncd.service.j2 for gbsyncd docker	2021-11-23 10:44:29 -08:00
Guohan Lu	f3faf6111b	Revert "[gearbox] provide common gbsyncd.service.j2 to start for platform specific gbsyncd docker (#9286 )" This reverts commit `1d2a11bbb8`.	2021-11-19 10:10:55 -08:00
Junhua Zhai	1d2a11bbb8	[gearbox] provide common gbsyncd.service.j2 to start for platform specific gbsyncd docker (#9286 ) Why I did it Fix #9059. It provides common gbsyncd.service.j2 to start for platform specific gbsyncd docker, which must be named 'gbsyncd'. How I did it All of platform specific gbsyncd dockers use a common name 'gbsyncd' Use a unique systemd service template gbsyncd.service.j2 for gbsyncd docker	2021-11-17 23:49:49 -08:00
trzhang-msft	689c101095	update DHCP_PACKET_MARK schema (#9077 ) - update DHCP_PACKET_MARK schema in state_db - this is an update over PR: Add service mark_dhcp_packet to mux container #9015	2021-11-02 15:55:50 -07:00
Stepan Blyshchak	4ad5f2af3f	[swss.sh] fix an issue that dependent services are not read from a file (#8943 ) This is due to the SERVICE variable declared after reading a file #### Why I did it To fix an issue that dhcp_relay does not restart with swss. #### How I did it Fixed in the swss.sh script #### How to verify it sudo systemctl restart swss verify dhcp_relay restarts as well.	2021-10-26 19:01:30 -07:00
trzhang-msft	4e0c4fb832	Add service mark_dhcp_packet to mux container (#9015 ) - add a new service "mark_dhcp_packet" to mux container - apply packet marks on a per-interface basis in ebtables - write packet marks to "DHCP_PACKET_MARK" table in state_db	2021-10-26 14:10:13 -07:00
Nazarii Hnydyn	453346f8df	[teamd]: Send USR1/USR2 only to subscribers. (#8856 ) To fix teamd signal handling, without which Process 'tlm_teamd' exited unexpectedly	2021-10-26 09:12:07 -07:00
Sumukha Tumkur Vani	3971c20001	Flush RESTAPI_DB when config reload is performed (#9037 )	2021-10-22 11:45:19 -07:00
Lawrence Lee	d5834fcb1b	Merged PR 4679112: [write_standby]: Ignore non-auto interfaces [write_standby]: Ignore non-auto interfaces * In the event that `write_standby.py` is used to automatically switchover interfaces when linkmgrd or bgp crashes, ignore any interfaces that are not configured to auto-switch Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-10-15 09:59:59 -07:00
Lawrence Lee	17cbfc44e6	Merged PR 4559560: [bgp]: Switch to standby if BGP container exits [bgp]: Switch mux to standby if BGP container exits Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-10-15 09:59:59 -07:00
Lawrence Lee	69bae5b27a	[write_standby]: Improve logging Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-10-15 09:59:59 -07:00
Lawrence Lee	5232647b33	[mux]: Make write_standby available on host Signed-off-by: Lawrence Lee <lawlee@microsoft.com> [write_standby]: Cleanup and fix build Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-10-15 09:59:59 -07:00
byu343	50a9587e6e	[gbsyncd] Flush GB_ASIC_DB for gbsyncd cold restart (#8633 ) This is to flush the state in GB_ASIC_DB when running 'config reload'. Otherwise, the left state affects the cold restart of gbsyncd.	2021-08-31 15:52:48 -07:00
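
Commit 61a085e55e above replaces `os.system` and `subprocess` calls that used `shell=True` with argument-list invocations. The following is a minimal sketch of that general pattern, not code taken from the PR; the helper name `run_no_shell` and the hostile input string are hypothetical and exist only for illustration.

```python
import subprocess
import sys


def run_no_shell(argv):
    """Run a command given as an argument list (no shell involved) and return its stdout."""
    # shell=False is the default; check=True raises CalledProcessError on a non-zero exit.
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


# Hypothetical untrusted value. With shell=True, the ';' would start a second command.
hostile = "eth0; echo INJECTED"

# The child process is this same Python interpreter, so the sketch needs no external tools.
# The hostile string is passed as one literal argument and is never parsed by a shell.
echoed = run_no_shell([sys.executable, "-c", "import sys; print(sys.argv[1])", hostile])
print(echoed.strip())
```

Because the command is passed as a list, each element reaches the child process as a single argument, so shell metacharacters in dynamic input are treated as plain text.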


148 Commits