sonic-buildimage

Archived

Author	SHA1	Message	Date
Lawrence Lee	275adc6691	[arp_update]: Fix hardcoded vlan (#12566 ) Typo in prior PR #11919 hardcodes Vlan name. Change command to use the $vlan variable instead Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2022-11-11 18:01:15 +00:00
Ying Xie	64ce6696bb	[mux] skip mux operations during warm shutdown (#11937 ) * [mux] skip mux operations during warm shutdown - Enhance write_standby.py script to skip actions during warm shutdown. - Expand the support to BGP service. - MuX support was added by a previous PR. - don't skip action during warm recovery Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2022-10-03 22:30:55 +00:00
Longxiang Lyu	893391f76e	[mux] Exit to write `standby` state to `active-active` ports (#11821 ) [mux] Exit to write standby state to `active-active` ports Signed-off-by: Longxiang Lyu <lolv@microsoft.com>	2022-10-03 19:52:51 +00:00
siqbal1986	10b6a5c402	[202012] StateDB table cleanup for VNET routes (#11999 ) * added cleanup of tables. 'VNET_ROUTE_TUNNEL_TABLE', 'VNET_ROUTE_TABLE'	2022-09-08 14:57:20 -07:00
Lawrence Lee	e821dd8551	[arp_update]: Set failed IPv6 neighbors to incomplete (#11919 ) After pinging any failed IPv6 neighbor entries, set the remaining failed/incomplete entries to a permanent INCOMPLETE state. This manual setting to INCOMPLETE prevents these entries from automatically transitioning to FAILED state, and since they are now incomplete any subsequent NA messages for these neighbors are able to resolve the entry in the cache. Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2022-09-02 21:57:47 +00:00
Ying Xie	4ab83170a5	[write_standby] update write_standby.py script (#11650 ) Why I did it The initial value has to be present for the state machines to work. In active-standby dual-tor scenario, or any hardware mux scenario, the value will be updated eventually with a delay. However, in active-active dual-tor scenario, there is no other mechanism to initialize the value and get state machines started. So this script will have to write something at start up time. For active-active dualtor, 'active' is the preferred initial value; the state machine will switch the state to standby soon if link prober finds the link not in a good state. How I did it Update the script to always provide initial values. How to verify it Tested on active-active dual-tor testbed. Signed-off-by: Ying Xie ying.xie@microsoft.com	2022-09-01 23:57:23 +00:00
Jing Zhang	9d3194c77a	Avoid write_standby in warm restart context (#11283 ) Avoid write_standby in warm restart context. sign-off: Jing Zhang zhangjing@microsoft.com Why I did it In warm restart context, we should avoid mux state change. How I did it Check warm restart flag before applying changes to app db. How to verify it Ran write_standby in table missing, key missing, field missing scenarios. Did a warm restart, app db changes were skipped. Saw this in syslog: WARNING write_standby: Taking no action due to ongoing warmrestart.	2022-09-01 23:57:17 +00:00
Stepan Blyshchak	8ab448a852	[swss.sh/syncd.sh] Trap only on EXIT (#11590 ) When using trap on SIGTERM the script will not react to the SIGTERM signal sent while a child is executing. I.e., the following script does not react to SIGTERM sent to it while it is waiting for sleep to finish: ``` trap "echo Handled SIGTERM" 0 2 3 15; echo "Before sleep"; sleep inf; echo "After sleep" ``` Instead, trap only on EXIT, which also covers exit on SIGINT and SIGTERM. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2022-08-11 20:38:20 +00:00
Lawrence Lee	04ba6da1ab	[202012][arp_update]: Resolve failed neighbors on dualtor (#11641 ) In arp_update, check for FAILED or INCOMPLETE kernel neighbor entries and manually ping them to try and resolve the neighbor Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2022-08-05 23:30:04 -07:00
Nikola Dancejic	c5a5734242	[swss] Adding bgp container as dependent of swss (#11168 ) What I did: Added bgp as a dependent of swss Why I did it: bgp container was not restarting on swss crash. When swss crashes, linkmgrd doesn't initiate a switchover because it cannot access the default route from orchagent. Bringing down bgp with swss will isolate the ToR, causing linkmgrd to initiate a switchover to the peer ToR avoiding significant packet loss. Signed-off-by: Nikola Dancejic <ndancejic@microsoft.com>	2022-07-29 09:37:09 -07:00
Lukas Stockner	ab10005729	[swss] Clear VXLAN tunnel table from State DB on startup (#11078 ) *Clear VXLAN tunnel table from State DB on startup Signed-off-by: Lukas Stockner <lstockner@genesiscloud.com>	2022-06-10 11:50:56 -07:00
shlomibitton	2a9aa0836c	[202012] [Mellanox] [pmon] Fix for PMON service not starting when restarting SWSS service after fast/warm reboot (#10902 ) - Why I did it A recent change to delay the PMON service in case of fast/warm reboot introduced an issue when restarting only the SWSS service after fast/warm reboot on Nvidia platforms. Since the timer is triggered only when the system boots, when the system has come up from a fast/warm reboot and the user restarts the SWSS service, the PMON service is stopped as part of the syncd.sh script but the timer does not start again. - How I did it In the syncd.sh script, in case of fast/warm indication, check if pmon.timer is running. If it is running, we are at the first boot and continue normally. If it is not running, the service was restarted, so start the timer to keep the system behavior consistent. - How to verify it Run fast/warm reboot. service swss restart. Observe PMON service starting.	2022-06-08 09:46:54 +03:00
shlomibitton	c71c91e2b0	[202012] [Fastboot] Delay PMON service for better fastboot performance (#10745 ) #### Why I did it Profiling the system state on init after fast-reboot during create_switch function execution shows a few Python scripts running at the same time. This parallel execution consumes CPU time and makes create_switch take longer than it should. Following this finding, and to ensure these services will not interfere in the future, PMON is delayed by 90 seconds until the system finishes the init flow after fastboot. #### How I did it Add a timer for PMON service. Exclude for MLNX platform the start trigger of PMON when SYNCD starts in case of fastboot. Copy the timer file to the host bin image. #### How to verify it Run fast-reboot on MLNX platform and observe faster create_switch execution time.	2022-05-15 23:31:32 -07:00
shlomibitton	bca8a244c6	[202012] [Fastboot] Delay LLDP service for better fastboot performance (#10568 ) (#10744 ) This PR is to backport a fix #10568 This PR is dependent on PR: #10745 - Why I did it Profiling the system state on init after fast-reboot during create_switch function execution shows a few Python scripts running at the same time. This parallel execution consumes CPU time and makes create_switch take longer than it should. Following this finding, and to ensure these services will not interfere in the future, LLDP is delayed by 90 seconds until the system finishes the init flow after fastboot. - How I did it Add a timer for LLDP service. Copy the timer file to the host bin image. - How to verify it Run fast-reboot on MLNX platform and observe faster create_switch execution time.	2022-05-15 15:05:29 +03:00
Stepan Blyshchak	fa1e364f54	[services] kill container on stop in warm/fast mode (#10511 ) To optimize stop on warm boot, added kill for containers Use service "kill" in the shutdown path for fast and warm reboot. For all other reload methods, service "stop" is used. This is done to save time in shutdown path, and to overall improve the time spent in warm and fast reload. How - Use service_mgmt.sh to trigger common logic to initiate kill (fast/warm) or stop (cold) for database.sh, radv.sh, snmp.sh, telemetry.sh, mgmt-framework.sh Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>, Vaibhav H D <vaibhav.dixit@microsoft.com>	2022-04-18 14:27:48 -07:00
Stepan Blyshchak	8ce5e4e77b	[teamd.sh] kill teamd docker on warm shutdown for faster shutdown (#10219 ) This can save 6 sec for teamd LAG restoration - the time between: ``` Mar 9 13:51:10.467757 r-panther-13 WARNING teamd#teamd_PortChannel1[28]: Got SIGUSR1. Mar 9 13:52:33.310707 r-panther-13 INFO teamd#teamd_PortChannel1[27]: carrier changed to UP ``` - Why I did it Optimize warm boot. Specifically reduce the time needed for LAG restoration. - How I did it Kill teamd docker after graceful shutdown of teamd processes. - How to verify it Run warm reboot. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2022-03-16 22:22:26 +00:00
Lawrence Lee	4d1abbc09b	[write_standby]: Increase timeout to 60s (#10065 ) - Avoid scenarios where script times out before orchagent can establish IPinIP tunnel Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2022-03-01 22:49:17 +00:00
tbgowda	78dc2d8a7b	Enable SAI_SWITCH_ATTR_UNINIT_DATA_PLANE_ON_REMOVAL attribute (#9419 ) Why I did it Fixes #8980 partly. The corresponding changes in sonic-sairedis is here : Azure/sonic-sairedis#975 How I did it Include changes from both repos and build an image for verification. How to verify it Trigger fast-reboot with the changes, see the attribute SAI_SWITCH_ATTR_UNINIT_DATA_PLANE_ON_REMOVAL being set at the SAI level. Signed-off-by: Thushar Gowda <24815472+tbgowda@users.noreply.github.com>	2022-02-08 19:07:08 +00:00
Shi Su	4191889803	[bgpcfgd] Add bgpcfgd support to advertise routes (#9197 ) (#9697 ) Why I did it Cherry pick changes in #9197 to 202012 branch Add bgpcfgd support to advertise routes. How I did it Make bgpcfgd subscribe to the ADVERTISE_NETWORK table in STATE_DB and configure route advertisement accordingly. How to verify it Added unit tests in bgpcfgd and verify on KVM about route advertisement.	2022-01-26 14:38:04 -08:00
Lawrence Lee	b3a3aa0c38	[mux]: Fix `mark_dhcp_packet` (#9373 ) - Consolidate the two [Service] sections by moving the ExecStartPre line for mark_dhcp_packet.py to the first section and removing the second. - Make the mark_dhcp_packet.py file executable - Also clean up mark_dhcp_packet.py - Remove unused imports - Fix spacing and line lengths to conform to PEP8 Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-12-01 02:28:56 +00:00
trzhang-msft	19008889de	update DHCP_PACKET_MARK schema (#9077 ) - update DHCP_PACKET_MARK schema in state_db - this is an update over PR: Add service mark_dhcp_packet to mux container #9015	2021-11-15 21:37:08 +00:00
trzhang-msft	86fa5eede2	Add service mark_dhcp_packet to mux container (#9015 ) - add a new service "mark_dhcp_packet" to mux container - apply packet marks on a per-interface basis in ebtables - write packet marks to "DHCP_PACKET_MARK" table in state_db	2021-11-15 21:36:29 +00:00
Lawrence Lee	f317d93cb0	Merged PR 4679112: [write_standby]: Ignore non-auto interfaces [write_standby]: Ignore non-auto interfaces * In the event that `write_standby.py` is used to automatically switchover interfaces when linkmgrd or bgp crashes, ignore any interfaces that are not configured to auto-switch Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-11-10 18:54:33 -08:00
Lawrence Lee	57ad50cfd9	Merged PR 4559560: [bgp]: Switch to standby if BGP container exits [bgp]: Switch mux to standby if BGP container exits Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-11-10 18:54:33 -08:00
Lawrence Lee	6a9c709336	[write_standby]: Improve logging Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-11-10 18:54:33 -08:00
Lawrence Lee	25712c712e	[mux]: Make write_standby available on host Signed-off-by: Lawrence Lee <lawlee@microsoft.com> [write_standby]: Cleanup and fix build Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-11-10 18:54:33 -08:00
Sumukha Tumkur Vani	65626c8925	Flush RESTAPI DB upon config reload (#9093 )	2021-10-28 09:31:38 -07:00
Nazarii Hnydyn	0cbda8d362	[teamd]: Send USR1/USR2 only to subscribers. (#8856 ) To fix teamd signal handling, without which Process 'tlm_teamd' exited unexpectedly	2021-10-27 03:54:58 +00:00
Vladyslav Morokhovych	754378f1d8	[swss] Fix arp_update script (#8412 ) Fix #7968 Issue is detected on SONiC.20201231.11 In test_static_route.py::test_static_route_ecmp static routes are configured, but neighbors are not resolved after config reload even after 10 minutes. It looks like the arp_update script is starting to ping when Vlan1000 is not fully configured. When issue is reproduced, stuck ping6 process is observed in swss container : USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 180 0.1 0.0 6296 1272 pts/0 S 17:03 0:03 ping6 -I Vlan1000 -n -q -i 0 -c 1 -W 0 ff02::1 And when arp_update script successfully resolves neighbors, we observe sleep 300 instead of ping process	2021-08-12 23:25:28 -07:00
Longxiang Lyu	25f53289eb	[swss][arp_update] Send ipv6 pings over vlan sub interfaces (#8363 ) #### Why I did it * `arp_update` fails to ping those neighbors over vlan sub interfaces. #### How I did it * modify `arp_update_vars.j2` to get vlan sub interfaces with ipv6 addresses assigned. * modify `arp_update` to send ipv6 pings over those retrieved vlan sub interfaces. Signed-off-by: Longxiang Lyu <lolv@microsoft.com>	2021-08-07 12:43:51 +00:00
Prince Sunny	7a816ed5fa	[Mux] Do not clean-up HW_MUX_CABLE_TABLE from State DB (#7710 ) Co-authored-by: Ubuntu <prsunny@prince-vm.vzw1i4tqyeburcdz5lrgulxi2c.yx.internal.cloudapp.net>	2021-05-27 22:30:06 +00:00
yozhao101	7748597fa2	[Supervisord] Deduplicate the alerting messages of critical processes from Supervisord. (#6849 ) Signed-off-by: Yong Zhao yozhao@microsoft.com Why I did it In the configuration of rsyslog, duplicate messages will be suppressed and reported in the format of message repeated n times. Due to this behavior, if a critical process in a container exited unexpectedly, the alerting message will be written into syslog once and not be written into syslog anymore until the second critical process exited. This PR aims to differentiate these alerting messages such that they will not be suppressed by rsyslogd and can appear in the syslog periodically. How I did it This PR adds a counter into the alerting message and shows how many minutes a critical process was not running. How to verify it I verified and tested this implementation on a physical DUT.	2021-03-04 21:23:05 +00:00
shlomibitton	6361d36fb2	Stop teamd service before syncd (#6755 ) - What I did All SWSS dependent services should stop before SWSS service to avoid future possible issues. For example 'teamd' service will stop before to allow the driver unload netdev gracefully. This is to stop all LAGs before restarting syncd service when running 'config reload' command. - How I did it Change the order of dependent services of SWSS. - How to verify it Run 'config reload' command. Previously the operation failed when a large number of PortChannels were configured on the system. Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>	2021-02-16 15:33:10 -08:00
Lawrence Lee	e0efbc1e14	[swss]: Clear MUX-related state DB tables on start (#6759 ) * Add MUX_CABLE_TABLE to set of tables to clear on SWSS start, which will clear HW_MUX_CABLE_TABLE and MUX_CABLE_TABLE * Order swss to start before pmon to ensure that DBs are cleared before xcvrd (running inside pmon) starts and re-populates the tables Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-02-16 15:33:03 -08:00
Lior Avramov	80b048d4d1	[systemd] Increase syncd startup script timeout to support FW upgrade on init. (#6709 ) - Why I did it To support FW upgrade on init. - How I did it Change timeout value - How to verify it I manually changed ASIC and Gearbox FW followed by hard reset in order for FW upgrade to take place on init. Signed-off-by: liora <liora@nvidia.com>	2021-02-16 15:31:10 -08:00
Guohan Lu	bab136fc8f	[proc-exit-listener]: fix syntax error the bug is introduced in commit `34cca20c` Signed-off-by: Guohan Lu <lguohan@gmail.com>	2021-02-03 10:46:07 -08:00
Guohan Lu	f00bb52f7c	[proc-exit-listener]: ignore blank lines make proc-exit-listener more robust Signed-off-by: Guohan Lu <lguohan@gmail.com>	2021-01-28 09:28:52 -08:00
yozhao101	cc9c3f567e	[supervisord] Monitoring the critical processes with supervisord. (#6242 ) - Why I did it Initially, we used Monit to monitor critical processes in each container. If one of the critical processes was not running or crashed for some reason, Monit would write an alerting message into syslog periodically. If we add a new process in a container, the corresponding Monit configuration file also needs to be updated, which is a little hard to maintain. Currently we employ the event listener of Supervisord to do this monitoring. Since processes in each container are managed by Supervisord, we can focus only on the logic of monitoring. - How I did it We borrowed the event listener of Supervisord to monitor critical processes in containers. The event listener will take the following steps if it was notified that one of the critical processes exited unexpectedly: The event listener will first check whether the auto-restart mechanism was enabled for this container or not. If the auto-restart mechanism was enabled, the event listener will kill the Supervisord process, which should cause the container to exit and subsequently get restarted. If the auto-restart mechanism was not enabled for this container, the event listener will enter a loop which will first sleep 1 minute and then check whether the process is running. If yes, the event listener exits. If no, an alerting message will be written into syslog. - How to verify it First, we need to check whether the auto-restart mechanism of a container was enabled or not by running the command show feature status. If enabled, one critical process should be selected and killed manually, then we need to check whether the container will be restarted or not. Second, we can disable the auto-restart mechanism if it was enabled at step 1 by running the command sudo config feature autorestart <container_name> disabled. Then one critical process should be selected and killed. After that, we will see the alerting message which will appear in the syslog every 1 minute. - Which release branch to backport (provide reason below if selected) 201811 201911 [x ] 202006	2021-01-28 09:28:27 -08:00
mprabhu-nokia	41012f791e	In modular chassis, add CHASSIS_STATE_DB on control card (#5624 ) HLD: Azure/SONiC#646 In modular chassis, add CHASSIS_STATE_DB on control card Why I did it Modular Chassis has control-cards, line-cards and fabric-cards along with other peripherals. Control-Card CHASSIS_STATE_DB will be the central DB to maintain any state information of cards that is accessible to control-card. How I did it Adding another DB on an existing REDIS instance running on port 6380.	2020-12-15 17:15:00 -08:00
Stephen Sun	e010d83fc3	[Dynamic buffer calc] Support dynamic buffer calculation (#6194 ) - Why I did it To support dynamic buffer calculation. This PR also depends on the following PRs for sub modules - [sonic-swss: [buffermgr/bufferorch] Support dynamic buffer calculation #1338](https://github.com/Azure/sonic-swss/pull/1338) - [sonic-swss-common: Dynamic buffer calculation #361](https://github.com/Azure/sonic-swss-common/pull/361) - [sonic-utilities: Support dynamic buffer calculation #973](https://github.com/Azure/sonic-utilities/pull/973) - How I did it 1. Introduce field `buffer_model` in `DEVICE_METADATA|localhost` to represent which buffer model is running in the system currently: - `dynamic` for the dynamic buffer calculation model - `traditional` for the traditional model in which the `pg_profile_lookup.ini` is used 2. Add the tables required for the feature: - ASIC_TABLE in platform/<vendor>/asic_table.j2 - PERIPHERAL_TABLE in platform/<vendor>/peripheral_table.j2 - PORT_PERIPHERAL_TABLE on a per-platform basis in device/<vendor>/<platform>/port_peripheral_config.j2 for each platform with gearbox installed. - DEFAULT_LOSSLESS_BUFFER_PARAMETER and LOSSLESS_TRAFFIC_PATTERN in files/build_templates/buffers_config.j2 - Add lossless PGs (3-4) for each port in files/build_templates/buffers_config.j2 3. Copy the newly introduced j2 files into the image and rendering them when the system starts 4. Update the CLI options for buffermgrd so that it can start with dynamic mode 5. Fetches the ASIC vendor name in orchagent: - fetch the vendor name when creates the docker and pass it as a docker environment variable - `buffermgrd` can use this passed-in variable 6. Clear buffer related tables from STATE_DB when swss docker starts 7. Update the src/sonic-config-engine/tests/sample_output/buffers-dell6100.json according to the buffer_config.j2 8. Remove buffer pool sizes for ingress pools and egress_lossy_pool Update the buffer settings for dynamic buffer calculation	2020-12-13 11:35:39 -08:00
Joe LeVeque	905a5127bb	[Python] Align files in root dir, dockers/ and files/ with PEP8 standards (#6109 ) - Why I did it Align style with slightly modified PEP8 standards (extend maximum line length to 120 chars). This will also help in the transition to Python 3, where it is more strict about whitespace, plus it helps unify style among the SONiC codebase. Will tackle other directories in separate PRs. - How I did it Using `autopep8 --in-place --max-line-length 120` and some manual tweaks.	2020-12-03 15:57:50 -08:00
abdosi	fad481edc1	Enhanced Feature table to support 'always_enabled' value for state and auto-restart fields. (#6000 ) Added new flag value 'always_enabled' for the state and auto-restart field of feature table init_cfg.json is updated to initialize state field of database/swss/syncd/teamd feature and auto-restart field of database feature as always_enabled Once the state/auto-restart value is initialized as "always_enabled" it is immutable and cannot be changed via feature config commands. (config feature..) PR#Azure/sonic-utilities#1271 hostcfgd will not take any action if state field value is 'always_enabled' Since we have always_enabled field for auto-restart updated supervisor-proc-exit-listener not to have special check for database and always rely on value from Feature table.	2020-11-25 08:41:11 -08:00
Joe LeVeque	7bf05f7f4f	[supervisor] Install vanilla package once again, install Python 3 version in Buster container (#5546 ) - Why I did it We were building a custom version of Supervisor because I had added patches to prevent hangs and crashes if the system clock ever rolled backward. Those changes were merged into the upstream Supervisor repo as of version 3.4.0 (http://supervisord.org/changes.html#id9), therefore, we should be able to simply install the vanilla package via pip. This will also allow us to easily move to Python 3, as Python 3 support was added in version 4.0.0. - How I did it - Remove Makefiles and patches for building supervisor package from source - Install Python 3 supervisor package version 4.2.1 in Buster base container - Also install Python 3 version of supervisord-dependent-startup in Buster base container - Debian package installed binary in `/usr/bin/`, but pip package installs in `/usr/local/bin/`, so rather than update all absolute paths, I changed all references to simply call `supervisord` and let the system PATH find the executable to prevent future need for changes just in case we ever need to switch back to build a Debian package, then we won't need to modify these again. - Install Python 2 supervisor package >= 3.4.0 in Stretch and Jessie base containers	2020-11-19 23:41:32 -08:00
heidinet2007	7c17c58b83	Move teamd warm reboot code to service script (#5163 ) Summary: Move teamd functions to a new service script Motivation: To segregate teamd functions in one common place. fast-reboot script calls teamd functions that should ideally be replaced by a simple call to a service script. Changes: New teamd service script and path modification from /usr/bin/teamd.sh to /usr/local/bin/teamd.sh fast-reboot script (in sonic-utilities) modification (to use new teamd.sh to stop teamd) should follow soon after this change. Verification: VS image tests. Signed-off-by: Vaibhav Hemant Dixit <vaibhav.dixit@microsoft.com> Co-authored-by: heidi.ou@alibaba-inc.com <heidi.ou@alibaba-inc.com> Co-authored-by: Ying Xie <ying.xie@microsoft.com>	2020-11-13 13:34:18 -08:00
Joe LeVeque	e0fdf45ad0	[update_chassisdb_config] Convert to Python 3 (#5838 ) - Convert update_chassisdb_config script to Python 3 - Reorganize imports per PEP8 standard - Two blank lines precede functions per PEP8 standard	2020-11-09 08:35:36 -08:00
Joe LeVeque	522a071ffb	[core_cleanup.py] Convert to Python 3; Fix bug; Improve code reuse (#5781 ) - Convert to Python 3 - Fix bug: `CORE_FILE_DIR` previously was set to `os.path.basename(__file__)`, which would resolve to the script name. Fix this by hardcoding to `/var/core/` instead - Remove locally-defined logging functions; use Logger class from sonic-py-common instead	2020-11-05 10:01:12 -08:00
judyjoseph	ace7f24cba	[docker-teamd]: Add teamd as a dependent service to swss (#5628 ) - Why I did it On teamd docker restart, the swss and syncd need to be restarted as there are dependent resources present. - How I did it Add the teamd as a dependent service for swss Updated the docker-wait script to handle service and dependent services separately. Handle the case of warm-restart for the dependent service - How to verify it Verified the following scenarios with the following testbed VM1 ----------------------------[DUT 6100] -----------------------VM2, ping traffic continuous between VMs 1. Stop teamd docker alone > swss, syncd dockers seen going away > The LAG reference count error messages seen for a while till swss docker stops. > Dockers back up. 2. Enable WR mode for teamd. Stop teamd docker alone > swss, syncd dockers not removed. > The LAG reference count error messages not seen > Repeated stop teamd docker test - same result, no effect on swss/syncd. 3. Stop swss docker. > swss, teamd, syncd goes off - dockers comes back correctly, interfaces up 4. Enable WR mode for swss . Stop swss docker > swss goes off not affecting syncd/teamd dockers. 5. Config reload > no reference counter error seen, dockers comes back correctly, with interfaces up 6. Warm reboot, observations below > swss docker goes off first > teamd + syncd goes off to the end of WR process. > dockers comes back up fine. > ping traffic between VM's was NOT HIT 7. Fast reboot, observations below > teamd goes off first ( confirmed swss don't exit here ) > swss goes off next > syncd goes away at the end of the FR process > dockers comes back up fine. > there is a traffic HIT as per fast-reboot 8. Verified in multi-asic platform, the tests above other than WR/FB scenarios	2020-10-23 00:41:16 -07:00
BrynXu	a2e3d2fcea	[ChassisDB]: bring up ChassisDB service (#5283 ) bring up chassisdb service on sonic switch according to the design in Distributed Forwarding in VoQ Arch HLD Signed-off-by: Honggang Xu <hxu@arista.com> - Why I did it To bring up new ChassisDB service in sonic as designed in ['Distributed forwarding in a VOQ architecture HLD' ](`90c1289eaf/doc/chassis/architecture.md`). - How I did it Implement the section 2.3.1 Global DB Organization of the VOQ architecture HLD. - How to verify it ChassisDB service won't start without chassisdb.conf file on the existing platforms. ChassisDB service is accessible with global.conf file in the distributed architecture. Signed-off-by: Honggang Xu <hxu@arista.com>	2020-10-14 15:15:24 -07:00
anish-n	e15e6a8313	[config-reload]: Add logic to clean up FG_ROUTE state db table during reload (#5518 ) Cleanup FG_ROUTE state db table during reload	2020-10-02 09:25:29 -07:00
Syd Logan	0311a4a037	Add gearbox phy device files and a new physyncd docker to support VS gearbox phy feature (#4851 ) * buildimage: Add gearbox phy device files and a new physyncd docker to support VS gearbox phy feature * scripts and configuration needed to support a second syncd docker (physyncd) * physyncd supports gearbox device and phy SAI APIs and runs multiple instances of syncd, one per phy in the device * support for VS target (sonic-sairedis vslib has been extended to support a virtual BCM81724 gearbox PHY). HLD is located at `b817a12fd8/doc/gearbox/gearbox_mgr_design.md` - Why I did it This work is part of the gearbox phy joint effort between Microsoft and Broadcom, and is based on multi-switch support in sonic-sairedis. - How I did it Overall feature was implemented across several projects. The collective pull requests (some in late stages of review at this point): https://github.com/Azure/sonic-utilities/pull/931 - CLI (merged) https://github.com/Azure/sonic-swss-common/pull/347 - Minor changes (merged) https://github.com/Azure/sonic-swss/pull/1321 - gearsyncd, config parsers, changes to orchagent to create gearbox phy on supported systems https://github.com/Azure/sonic-sairedis/pull/624 - physyncd, virtual BCM81724 gearbox phy added to vslib - How to verify it In a vslib build: root@sonic:/home/admin# show gearbox interfaces status PHY Id Interface MAC Lanes MAC Lane Speed PHY Lanes PHY Lane Speed Line Lanes Line Lane Speed Oper Admin -------- ----------- --------------- ---------------- --------------- ---------------- ------------ ----------------- ------ ------- 1 Ethernet48 121,122,123,124 25G 200,201,202,203 25G 204,205 50G down down 1 Ethernet49 125,126,127,128 25G 206,207,208,209 25G 210,211 50G down down 1 Ethernet50 69,70,71,72 25G 212,213,214,215 25G 216 100G down down In addition, docker ps | grep phy should show a physyncd docker running. Signed-off-by: syd.logan@broadcom.com	2020-09-25 08:32:44 -07:00
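
Note on cc9c3f567e and 7748597fa2: the listener flow described in those two commit messages can be sketched roughly as below. This is an illustrative sketch only, not the code from the repository. The function name, its parameters, the return strings and the `max_checks` bound are all hypothetical (the commit message describes a loop with no stated upper limit), and the callables stand in for whatever the real listener uses to query process state, kill Supervisord and write to syslog.

```python
from typing import Callable, List


def monitor_exited_process(
    process_name: str,
    auto_restart_enabled: bool,
    is_running: Callable[[], bool],
    kill_supervisord: Callable[[], None],
    log_alert: Callable[[str], None],
    sleep: Callable[[float], None],
    max_checks: int = 5,  # hypothetical bound so the sketch terminates
) -> str:
    """Hypothetical decision flow after a critical process exits unexpectedly."""
    if auto_restart_enabled:
        # Killing Supervisord should make the container exit and get restarted.
        kill_supervisord()
        return "container-restart"

    for minutes_down in range(1, max_checks + 1):
        sleep(60)  # wait one minute between checks
        if is_running():
            return "recovered"
        # The changing minute count keeps each message distinct, so rsyslog
        # does not fold them into "message repeated n times".
        log_alert(f"Process '{process_name}' is not running ({minutes_down} minutes)")
    return "still-down"


# Usage with stand-in callables: the process comes back on the third check.
alerts: List[str] = []
states = iter([False, False, True])
result = monitor_exited_process(
    "orchagent",
    auto_restart_enabled=False,
    is_running=lambda: next(states),
    kill_supervisord=lambda: None,
    log_alert=alerts.append,
    sleep=lambda seconds: None,  # skip real waiting in the example
)
```

Passing the side effects in as callables keeps the sketch runnable without Supervisord or syslog; a real listener would wire these to the actual process table and logger.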


122 Commits