* [mux] skip mux operations during warm shutdown
- Enhance write_standby.py script to skip actions during warm shutdown.
- Expand the support to BGP service.
- MuX support was added by a previous PR.
- don't skip action during warm recovery
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
After pinging any failed IPv6 neighbor entries, set the remaining failed/incomplete entries to a permanent INCOMPLETE state. This manual setting to INCOMPLETE prevents these entries from automatically transitioning to FAILED state, and since they are now incomplete any subsequent NA messages for these neighbors is able to resolve the entry in the cache.
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
Why I did it
The initial value has to be present for the state machines to work. In active-standby dual-tor scenario, or any hardware mux scenario, the value will be updtaed eventually with a delay.
However, in active-active dual-tor scenario, there is no other mechanism to initialize the value and get state machines started.
So this script will have to write something at start up time.
For active-active dualtor, 'active' is a more preferred initial value, the state machine will switch the state to standby soon if
link prober found link not in good state.
How I did it
Update the script to always provide initial values.
How to verify it
Tested on active-active dual-tor testbed.
Signed-off-by: Ying Xie ying.xie@microsoft.com
Avoid write_standby in warm restart context.
sign-off: Jing Zhang zhangjing@microsoft.com
Why I did it
In warm restart context, we should avoid mux state change.
How I did it
Check warm restart flag before applying changes to app db.
How to verify it
Ran write_standby in table missing, key missing, field missing scenarios.
Did a warm restart, app db changes were skipped. Saw this in syslog:
WARNING write_standby: Taking no action due to ongoing warmrestart.
When using trap on SIGTERM the script will not react to the SIGTERM signal sent while a child is executing.
I.e, the following script does not react on SIGTERM sent to it if it is
waiting for sleep to finish:
```
trap "echo Handled SIGTERM" 0 2 3 15
echo "Before sleep"
sleep inf
echo "After sleep"
```
Instead, trap only on EXIT which covers also a scenario with exit on
SIGINT, SIGTERM.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
In arp_update, check for FAILED or INCOMPLETE kernel neighbor entries and manually ping them to try and resolve the neighbor
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
What I did:
Added bgp as a dependent of swss
Why I did it:
bgp container was not restarting on swss crash. When swss crashes, linkmgrd
doesn't initate a switchover because it cannot access the default route from
orchagent. Bringing down bgp with swss will isolate the ToR, causing linkmgrd
to initiate a switchover to the peer ToR avoiding significant packet loss.
Signed-off-by: Nikola Dancejic <ndancejic@microsoft.com>
- Why I did it
Recent change to delay PMON service in case of fast/warm reboot introduce an issue when restarting only SWSS service after fast/warm reboot for Nvidia platform.
Since the timer is triggered only when the system boot, in a scenario when the system is after a fast/warm reboot and the user restart SWSS service, as part of syncd.sh script, PMON service will stop but the timer will not start again.
- How I did it
On syncd.sh script, in case of fast/warm indication, check if pmon.timer is running.
If it is running it means we are at the first boot and continue normally.
If it is not running, meaning the service was restarted, start the timer to keep the system behavior consistent.
- How to verify it
Run fast/warm reboot.
service swss restart.
Observe PMON service starting.
#### Why I did it
Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time.
This parallel execution consume CPU time and the duration of create_switch is longer than it should be.
Following this finding, and the motivation to ensure these services will not interfere in the future, PMON is delayed in 90 seconds until the system finish the init flow after fastboot.
#### How I did it
Add a timer for PMON service.
Exclude for MLNX platform the start trigger of PMON when SYNCD starts in case of fastboot.
Copy the timer file to the host bin image.
#### How to verify it
Run fast-reboot on MLNX platform and observe faster create_switch execution time.
This PR is to backport a fix#10568
This PR is dependent on PR: #10745
- Why I did it
Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time.
This parallel execution consume CPU time and the duration of create_switch is longer than it should be.
Following this finding, and the motivation to ensure these services will not interfere in the future, LLDP is delayed in 90 seconds until the system finish the init flow after fastboot.
- How I did it
Add a timer for LLDP service.
Copy the timer file to the host bin image.
- How to verify it
Run fast-reboot on MLNX platform and observe faster create_switch execution time.
To optimize stop on warm boot, added kill for containers
Use service "kill" in the shutdown path for fast and warm reboot. For all other reload methods, service "stop" is used.
This is done to save time in shutdown path, and to overall improve the time spent in warm and fast reload.
How - Use service_mgmt.sh to trigger common logic to initiate kill (fast/warm) or stop (cold) for database.sh, radv.sh, snmp.sh, telemetry.sh, mgmt-framework.sh
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>, Vaibhav H D <vaibhav.dixit@microsoft.com>
This can save 6 sec for teamd LAG restoration - the time between:
```
Mar 9 13:51:10.467757 r-panther-13 WARNING teamd#teamd_PortChannel1[28]: Got SIGUSR1.
Mar 9 13:52:33.310707 r-panther-13 INFO teamd#teamd_PortChannel1[27]: carrier changed to UP
```
- Why I did it
Optimize warm boot. Specifically reduce the time needed for LAG restoration.
- How I did it
Kill teamd docker after graceful shutdown of teamd processes.
- How to verify it
Run warm reboot.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Why I did it
Fixes#8980 partly.
The corresponding changes in sonic-sairedis is here :
Azure/sonic-sairedis#975
How I did it
Include changes from both repos and build an image for verification.
How to verify it
Trigger fast-reboot with the changes, see the attribute SAI_SWITCH_ATTR_UNINIT_DATA_PLANE_ON_REMOVAL being set at the SAI level.
Signed-off-by: Thushar Gowda <24815472+tbgowda@users.noreply.github.com>
Why I did it
Cherry pick changes in #9197 to 202012 branch
Add bgpcfgd support to advertise routes.
How I did it
Make bgpcfgd subscribe to the ADVERTISE_NETWORK table in STATE_DB and configure route advertisement accordingly.
How to verify it
Added unit tests in bgpcfgd and verify on KVM about route advertisement.
- Consolidate the two [Service] sections by moving the ExecStartPre line for mark_dhcp_packet.py to the first section and removing the second.
- Make the mark_dhcp_packet.py file executable
- Also clean up mark_dhcp_packet.py
- Remove unused imports
- Fix spacing and line lengths to conform to PEP8
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
- add a new service "mark_dhcp_packet" to mux container
- apply packet marks on a per-interface basis in ebtables
- write packet marks to "DHCP_PACKET_MARK" table in state_db
[write_standby]: Ignore non-auto interfaces
* In the event that `write_standby.py` is used to automatically switchover interfaces when linkmgrd or bgp crashes, ignore any interfaces that are not configured to auto-switch
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
Fix#7968
Issue is detected on SONiC.20201231.11
In test_static_route.py::test_static_route_ecmp static routes are configured, but neighbors are not resolved after config reload even after 10 minutes.
It looks like the arp_update script is starting to ping when Vlan1000 is not fully configured.
When issue is reproduced, stuck ping6 process is observed in swss container :
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 180 0.1 0.0 6296 1272 pts/0 S 17:03 0:03 ping6 -I Vlan1000 -n -q -i 0 -c 1 -W 0 ff02::1
And when arp_update script successfully resolves neighbors, we observe sleep 300 instead of ping process
#### Why I did it
* `arp_update` fails to ping those neighbors over vlan sub interfaces.
#### How I did it
* modify `arp_update_vars.j2` to get vlan sub interfaces with ipv6 addresses assigned.
* modify `arp_update` to send ipv6 pings over those retrieved vlan sub interfaces.
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Signed-off-by: Yong Zhao yozhao@microsoft.com
Why I did it
In the configuration of rsyslog, duplicate messages will be suppressed and reported in the format of message repeated n times.
Due to this behavior, if a critical process in a container exited unexpectedly, the alerting message will be written into syslog once
and not be written into syslog anymore until the second critical process exited. This PR aims to differentiate these alerting messages such that they will not be suppressed by rsyslogd and can appear in the syslog periodically.
How I did it
This PR adds a counter into the alerting message and shows how many minutes a critical process was not running.
How to verify it
I verified and test this implementation on a physical DUT.
- What I did
All SWSS dependent services should stop before SWSS service to avoid future possible issues.
For example 'teamd' service will stop before to allow the driver unload netdev gracefully.
This is to stop all LAG's before restarting syncd service when running 'config reload' command.
- How I did it
Change the order of dependent services of SWSS.
- How to verify it
Run 'config reload' command.
Previously the operation failed when a large number of PortChannel configured on the system.
Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
* Add *MUX_CABLE_TABLE* to set of tables to clear on SWSS start, which
will clear HW_MUX_CABLE_TABLE and MUX_CABLE_TABLE
* Order swss to start before pmon to ensure that DBs are cleared before
xcvrd (running inside pmon) starts and re-populates the tables
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
**- Why I did it**
To support FW upgrade on init.
**- How I did it**
Change timeout value
**- How to verify it**
I manually changed ASIC and Gearbox FW followed by hard reset in order for FW upgrade to take place on init.
Signed-off-by: liora <liora@nvidia.com>
- Why I did it
Initially, we used Monit to monitor critical processes in each container. If one of critical processes was not running
or crashed due to some reasons, then Monit will write an alerting message into syslog periodically. If we add a new process
in a container, the corresponding Monti configuration file will also need to update. It is a little hard for maintenance.
Currently we employed event listener of Supervisod to do this monitoring. Since processes in each container are managed by
Supervisord, we can only focus on the logic of monitoring.
- How I did it
We borrowed the event listener of Supervisord to monitor critical processes in containers. The event listener will take
following steps if it was notified one of critical processes exited unexpectedly:
The event listener will first check whether the auto-restart mechanism was enabled for this container or not. If auto-restart mechanism was enabled, event listener will kill the Supervisord process, which should cause the container to exit and subsequently get restarted.
If auto-restart mechanism was not enabled for this contianer, the event listener will enter a loop which will first sleep 1 minute and then check whether the process is running. If yes, the event listener exits. If no, an alerting message will be written into syslog.
- How to verify it
First, we need checked whether the auto-restart mechanism of a container was enabled or not by running the command show feature status. If enabled, one critical process should be selected and killed manually, then we need check whether the container will be restarted or not.
Second, we can disable the auto-restart mechanism if it was enabled at step 1 by running the commnad sudo config feature autorestart <container_name> disabled. Then one critical process should be selected and killed. After that, we will see the alerting message which will appear in the syslog every 1 minute.
- Which release branch to backport (provide reason below if selected)
201811
201911
[x ] 202006
HLD: Azure/SONiC#646
In modular chassis, add CHASSIS_STATE_DB on control card
Why I did it
Modular Chassis has control-cards, line-cards and fabric-cards along with other peripherals. Control-Card CHASSIS_STATE_DB will be the central DB to maintain any state information of cards that is accessible to control-card/
How I did it
Adding another DB on an existing REDIS instance running on port 6380.
**- Why I did it**
To support dynamic buffer calculation.
This PR also depends on the following PRs for sub modules
- [sonic-swss: [buffermgr/bufferorch] Support dynamic buffer calculation #1338](https://github.com/Azure/sonic-swss/pull/1338)
- [sonic-swss-common: Dynamic buffer calculation #361](https://github.com/Azure/sonic-swss-common/pull/361)
- [sonic-utilities: Support dynamic buffer calculation #973](https://github.com/Azure/sonic-utilities/pull/973)
**- How I did it**
1. Introduce field `buffer_model` in `DEVICE_METADATA|localhost` to represent which buffer model is running in the system currently:
- `dynamic` for the dynamic buffer calculation model
- `traditional` for the traditional model in which the `pg_profile_lookup.ini` is used
2. Add the tables required for the feature:
- ASIC_TABLE in platform/\<vendor\>/asic_table.j2
- PERIPHERAL_TABLE in platform/\<vendor\>/peripheral_table.j2
- PORT_PERIPHERAL_TABLE on a per-platform basis in device/\<vendor\>/\<platform\>/port_peripheral_config.j2 for each platform with gearbox installed.
- DEFAULT_LOSSLESS_BUFFER_PARAMETER and LOSSLESS_TRAFFIC_PATTERN in files/build_templates/buffers_config.j2
- Add lossless PGs (3-4) for each port in files/build_templates/buffers_config.j2
3. Copy the newly introduced j2 files into the image and rendering them when the system starts
4. Update the CLI options for buffermgrd so that it can start with dynamic mode
5. Fetches the ASIC vendor name in orchagent:
- fetch the vendor name when creates the docker and pass it as a docker environment variable
- `buffermgrd` can use this passed-in variable
6. Clear buffer related tables from STATE_DB when swss docker starts
7. Update the src/sonic-config-engine/tests/sample_output/buffers-dell6100.json according to the buffer_config.j2
8. Remove buffer pool sizes for ingress pools and egress_lossy_pool
Update the buffer settings for dynamic buffer calculation
**- Why I did it**
Align style with slightly modified PEP8 standards (extend maximum line length to 120 chars). This will also help in the transition to Python 3, where it is more strict about whitespace, plus it helps unify style among the SONiC codebase. Will tackle other directories in separate PRs.
**- How I did it**
Using `autopep8 --in-place --max-line-length 120` and some manual tweaks.
Added new flag value 'always_enabled' for the state and auto-restart field of feature table
init_cfg.json is updated to initialize state field of database/swss/syncd/teamd feature and auto-restart field of database feature
as always_enabled
Once the state/auto-restart value is initialized as "always_enabled" it is immutable and cannot be change via feature config commands. (config feature..) PR#Azure/sonic-utilities#1271
hostcfgd will not take any action if state field value is 'always_enabled'
Since we have always_enabled field for auto-restart updated supervisor-proc-exit-listener
not to have special check for database and always rely on value from Feature table.
**- Why I did it**
We were building a custom version of Supervisor because I had added patches to prevent hangs and crashes if the system clock ever rolled backward. Those changes were merged into the upstream Supervisor repo as of version 3.4.0 (http://supervisord.org/changes.html#id9), therefore, we should be able to simply install the vanilla package via pip. This will also allow us to easily move to Python 3, as Python 3 support was added in version 4.0.0.
**- How I did it**
- Remove Makefiles and patches for building supervisor package from source
- Install Python 3 supervisor package version 4.2.1 in Buster base container
- Also install Python 3 version of supervisord-dependent-startup in Buster base container
- Debian package installed binary in `/usr/bin/`, but pip package installs in `/usr/local/bin/`, so rather than update all absolute paths, I changed all references to simply call `supervisord` and let the system PATH find the executable to prevent future need for changes just in case we ever need to switch back to build a Debian package, then we won't need to modify these again.
- Install Python 2 supervisor package >= 3.4.0 in Stretch and Jessie base containers
Summary: Move teamd functions to a new service script
Motivation: To segregate teamd functions in one common place. fast-reboot script calls teamd functions that should ideally be replaced by a simple call to a service script.
Changes: New teamd service script and path modification from /usr/bin/teamd.sh to /usr/local/bin/teamd.sh
fast-reboot script (in sonic-utilities) modification (to use new teamd.sh to stop teamd) should follow soon after this change.
Verification: VS image tests.
Signed-off-by: Vaibhav Hemant Dixit <vaibhav.dixit@microsoft.com>
Co-authored-by: heidi.ou@alibaba-inc.com <heidi.ou@alibaba-inc.com>
Co-authored-by: Ying Xie <ying.xie@microsoft.com>
- Convert to Python 3
- Fix bug: `CORE_FILE_DIR` previously was set to `os.path.basename(__file__)`, which would resolve to the script name. Fix this by hardcoding to `/var/core/` instead
- Remove locally-define logging functions; use Logger class from sonic-py-common instead
**- Why I did it**
On teamd docker restart, the swss and syncd needs to be restarted as there are dependent resources present.
**- How I did it**
Add the teamd as a dependent service for swss
Updated the docker-wait script to handle service and dependent services separately.
Handle the case of warm-restart for the dependent service
**- How to verify it**
Verified the following scenario's with the following testbed
VM1 ----------------------------[DUT 6100] -----------------------VM2, ping traffic continuous between VMs
1. Stop teamd docker alone
> swss, syncd dockers seen going away
> The LAG reference count error messages seen for a while till swss docker stops.
> Dockers back up.
2. Enable WR mode for teamd. Stop teamd docker alone
> swss, syncd dockers not removed.
> The LAG reference count error messages not seen
> Repeated stop teamd docker test - same result, no effect on swss/syncd.
3. Stop swss docker.
> swss, teamd, syncd goes off - dockers comes back correctly, interfaces up
4. Enable WR mode for swss . Stop swss docker
> swss goes off not affecting syncd/teamd dockers.
5. Config reload
> no reference counter error seen, dockers comes back correctly, with interfaces up
6. Warm reboot, observations below
> swss docker goes off first
> teamd + syncd goes off to the end of WR process.
> dockers comes back up fine.
> ping traffic between VM's was NOT HIT
7. Fast reboot, observations below
> teamd goes off first ( **confirmed swss don't exit here** )
> swss goes off next
> syncd goes away at the end of the FR process
> dockers comes back up fine.
> there is a traffic HIT as per fast-reboot
8. Verified in multi-asic platform, the tests above other than WR/FB scenarios
bring up chassisdb service on sonic switch according to the design in
Distributed Forwarding in VoQ Arch HLD
Signed-off-by: Honggang Xu <hxu@arista.com>
**- Why I did it**
To bring up new ChassisDB service in sonic as designed in ['Distributed forwarding in a VOQ architecture HLD' ](90c1289eaf/doc/chassis/architecture.md).
**- How I did it**
Implement the section 2.3.1 Global DB Organization of the VOQ architecture HLD.
**- How to verify it**
ChassisDB service won't start without chassisdb.conf file on the existing platforms.
ChassisDB service is accessible with global.conf file in the distributed arichitecture.
Signed-off-by: Honggang Xu <hxu@arista.com>
* buildimage: Add gearbox phy device files and a new physyncd docker to support VS gearbox phy feature
* scripts and configuration needed to support a second syncd docker (physyncd)
* physyncd supports gearbox device and phy SAI APIs and runs multiple instances of syncd, one per phy in the device
* support for VS target (sonic-sairedis vslib has been extended to support a virtual BCM81724 gearbox PHY).
HLD is located at b817a12fd8/doc/gearbox/gearbox_mgr_design.md
**- Why I did it**
This work is part of the gearbox phy joint effort between Microsoft and Broadcom, and is based
on multi-switch support in sonic-sairedis.
**- How I did it**
Overall feature was implemented across several projects. The collective pull requests (some in late stages of review at this point):
https://github.com/Azure/sonic-utilities/pull/931 - CLI (merged)
https://github.com/Azure/sonic-swss-common/pull/347 - Minor changes (merged)
https://github.com/Azure/sonic-swss/pull/1321 - gearsyncd, config parsers, changes to orchargent to create gearbox phy on supported systems
https://github.com/Azure/sonic-sairedis/pull/624 - physyncd, virtual BCM81724 gearbox phy added to vslib
**- How to verify it**
In a vslib build:
root@sonic:/home/admin# show gearbox interfaces status
PHY Id Interface MAC Lanes MAC Lane Speed PHY Lanes PHY Lane Speed Line Lanes Line Lane Speed Oper Admin
-------- ----------- --------------- ---------------- --------------- ---------------- ------------ ----------------- ------ -------
1 Ethernet48 121,122,123,124 25G 200,201,202,203 25G 204,205 50G down down
1 Ethernet49 125,126,127,128 25G 206,207,208,209 25G 210,211 50G down down
1 Ethernet50 69,70,71,72 25G 212,213,214,215 25G 216 100G down down
In addition, docker ps | grep phy should show a physyncd docker running.
Signed-off-by: syd.logan@broadcom.com