### Why I did it
An in-band syslog server will not receive any syslog if it is configured without a VRF specified, which is because `eth0` is always specified as the `device` of a syslog server and the syslog packets will be sent to `eth0` regardless of its destination IP address.
### How I did it
Pass the option "device" in rsyslog.conf only if when syslog server's source address is configured with a non-default VRF
#### How to verify it
Manually test:
1. Configuring a syslog server without VRF specified or with `default` as the VRF: no `device` passed in `rsyslog.conf`
2. Configuring a syslog server with non-default VRF: the configured VRF passed as `device` in `rsyslog.conf`
### Why I did it
Remove UpdateGraphService feature from sonic image. The goal is to simplify the bootup process.
### How I did it
Remove updategraph service and updategraph script.
Update all related services, replace updategraph.service with config-setup.service.
#### How to verify it
Build and install new image, load minigraph and check all the services.
Why I did it
On certain routers with baud rate 9600, crash kernel is taking a long time , close to ~5mins, to complete kernel dump and reload the box. On contrast to routers with baud rate 115200, crash kernel dump process is observed to be completed under 35s-60s (depending on the platform). Currently, all debug and informational messages are printed on the console which also factors in for the delay seen. Unless the router is monitored on console in real time, these messages are not very useful. Setting the loglevel to warning will help reduce the verbosity of logs on console, in turn allow crash kernel dump process to be completed in a reasonable time which will also help in overall router recovery time.
How I did it
Setting loglevel attribute in crashkernel cmdline
How to verify it
Install SONiC image with crashkernel cmdline with loglevel set to warning and initiate an induced a crash (sysrq-trigger)
crashkernel boot and dump process will be completed in 20s-30s depending on the platform
Adding rule to ebtables to drop multicast packets in kernel. This was
done to address a bug where NS packets were flooding ports with
duplicate packets.
Signed-off-by: Nikola Dancejic <ndancejic@microsoft.com>
Why I did it
Update smartmontool verson to 7.4. This is done to prevent smartmontools service to exit with non-zero exit status on platform that does not have a SSD/disk to be monitored.
Until Debian Bullseye (which had smartmontools 7.2), Debian had a patch applied that changed the default quit mode to never exit. A bug report was filed on Debian, saying that the source code patch isn't needed and could just be done via command line options, and also that smartmontools 7.3 has a new built-in option to exit with 0 if there are no monitorable devices found (which prevents systemd from treating it as a service failure). Because of that, Debian Bookworm (which also upgraded to 7.3) removed the patch and restored the default behavior of exiting with exit code 17 if there are no devices found.
Smartmontools v7.3 has this issue, because of which smartd exits with non-zero exit status even with "-q" option.
How I did it
Update the smartmontools to version 7.4 which has the fix for exiting gracefully if no monitoring device is found
Added smartd option "-q nodev0" to allow smartd to exit with status 0 if no monitoring device found
### Why I did it
Disable eventd at buildtime for slim images
##### Work item tracking
- Microsoft ADO **(number only)**:26386286
#### How I did it
Add flags for disabling eventd and only copy rsyslog conf files when eventd is included and not slim image
#### How to verify it
Manual testing
### Why I did it
Fix the issue detected by[ TestStaticMgmtPortIP::test_dynamic_dns_not_working_when_static_ip_configured ](https://github.com/sonic-net/sonic-mgmt/blob/master/tests/dns/static_dns/test_static_dns.py#L105C9-L105C63) test.
### How I did it
Query MGMT interface configuration. Do not apply dynamic DNS configuration when MGMT interface has static IP address.
#### How to verify it
Run `tests/dns/static_dns/test_static_dns.py` sonic-mgmt tests.
ix IPV6 forced-mgmt-route not work issue
Why I did it
IPV6 forced-mgmt-route not work
When add a IPV6 route, should use 'ip -6 rule add pref 32764 address' command, but currently in the template the '-6' parameter are missing, so the IPV6 route been add to IPV4 route table.
Also this PR depends on #17281 , which will fix the IPV6 'default' route table missing in IPV6 route lookup issue.
Microsoft ADO (number only):24719238
* [image_config]: Update DHCP rate-limit for mgmt TOR devices
Change DHCP rate limit(queue4,group3) in SONiC copp configuration to 300 PPS
for mgmt TORs while keeping the rate limit at 100 PPS for other topologies.
Why I did it:
Some mgmt TORs based on Marvell ASIC do not support 100 PPS CIR, so that led
to these devices silently dropping DHCP packets.
Microsoft ADO: **25820076**
How to verify it:
Send DHCP broadcast packets to an M0 DUT and verify that they are trapped to
CPU at 300 PPS. On non-mgmt devices, the packets should be trapped at CIR of
100 PPS. Also ran sonic-mgmt dhcp_relay test and confirmed that it passes.
Signed-off-by: Prabhat Aravind <paravind@microsoft.com>
- Why I did it
Optimize syslog rate limit feature for fast and warm boot
- How I did it
Optimize redis start time
Don't render rsyslog.conf in container startup script
Disable containercfgd by default. There is a new CLI to enable it (in another PR)
- How to verify it
Manual test
Regression test
- Why I did it
Improve boot performance mostly needed for fast and warmboot
- How I did it
Use cached variable.
- How to verify it
Boot the system. Simply do "systemd-analyze blame" and look at service start time.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
- Why I did it
Improve boot performance mostly needed for fast and warmboot
- How I did it
Use cached variable.
- How to verify it
Boot the system. Simply do "systemd-analyze blame" and look at service start time.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Fix can't access IPV6 address via management interface because 'default' route table does not add to route lookup issue.
#### Why I did it
When device set with IPV6 TACACS server address, and shutdown all BGP, device can't connect to TACACS server via management interface.
After investigation, I found the IPV6 'default' route table does not add to route lookup:
admin@vlab-01:~$ ip -6 rule list
1001: from all lookup local
32765: from fec0::ffff:afa:1 lookup default
32766: from all lookup main
admin@vlab-01:~$
As compare:
admin@vlab-01:~$ ip -4 rule list
1001: from all lookup local
32764: from all to 172.17.0.1/24 lookup default
32765: from 10.250.0.101 lookup default
32766: from all lookup main
32767: from all lookup default <== 'default' route table exist in IPV4 route lookup
Issue fix by add 'default' route table to route lookup with following command:
admin@vlab-01:~$ sudo ip -6 rule add pref 32767 lookup default
admin@vlab-01:~$ ip -6 rule list
1001: from all lookup local
32765: from fec0::ffff:afa:1 lookup default
32766: from all lookup main
32767: from all lookup default <== 'default' route table been added to IPV6 route lookup
admin@vlab-01:~$
##### Work item tracking
- Microsoft ADO: 25798732
#### How I did it
When management interface using 'default' route table, add 'default' route table to IPV6 route lookup.
#### How to verify it
Pass all UT.
Add new UT to cover this change.
Manually verify issue fixed:
### Tested branch (Please provide the tested image version)
- [x] master-17281.417570-2133d58fa
#### Description for the changelog
Fix can't access IPV6 address via management interface because 'default' route table does not add to route lookup issue.
- Why I did it
Mellanox MSN2410 platforms have a non-functional error log: "ERR pmon#sensord: Error getting sensor data: dps460/#10: Can't read". This error is because of a firmware issue with some PSU, we are not able to upgrade the FW online. Since there is no functional impact, this error log can be ignored safely
- How I did it
Add a new rsyslog rule to the rsyslog-container.conf.j2, if the docker name is pmon and the platform name matches, the new rule will be inserted into the docker rsyslogd.conf
- How to verify it
run regression on the MSN2410 platform to make the error log will not be printed to the syslog.
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Change DHCP rate limit in SONiC copp configuration to 100 PPS as this is
necessary to ensure that DHCP flood does not cause LACP/BGP flaps in all
scenarios
This is an extension to the change in image_config: copp: Enable rate limiting
for bgp, lacp, dhcp, lldp, macsec and udld #14859 and sonic-mgmt change in
[tests/copp]: Update copp mgmt tests to support new rate-limits sonic-mgmt#8199
Why I did it
300 PPS is not sufficient to prevent LACP/BGP flaps in all cases. 100 PPS seems to
provide better resiliency against DHCP traffic flood to CPU.
Microsoft ADO 25776614:
Send DHCP broadcast packets to DUT and verify that they are trapped to CPU at 100 PPS.
Signed-off-by: Prabhat Aravind <paravind@microsoft.com>
Debian changed the defaults of the sudo package to never lecture the
user when using an unauthorized sudo command, which breaks our use case
of lecturing once. Add a line to lecture once, which is the old
defaults.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
pam-auth-update doesn't store local configuration, and it's meant to be
used by packages only. Because libpam-systemd was getting uninstalled
afterwards, this caused tacplus to get re-enabled.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Notable changes:
* Use j2cli from Debian repos instead of pip
* Use setuptools from Debian repos instead of pip
* Use wheel from Debian repos instead of pip
* Update grpcio and grpcio-tools python packages to match version in
Bookworm
* Use m2crypto from Debian repos instead of pip
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
What I did:
In Chassis TSA mode Loopback0 Ip's of each LC's should be advertise through e-BGP peers of each remote LC's
How I did:
- Route-map policy to Advertise own/self Loopback IP to other internal iBGP peers with a community internal_community as define in constants.yml
- Route-map policy to match on above internal_community when route is received from internal iBGP peers and set a internal tag as define in constants.yml and also delete the internal_community so we don't send to any of e-BGP peers
- In TSA new route-map match on above internal tag and permit the route (Loopback0 IP's of remote LC's) and set the community to traffic_shift_community.
- In TSB delete the above new route-map.
How I verify:
Manual Verification
UT updated.
sonic-mgmt PR: sonic-net/sonic-mgmt#10239
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Why I did it
Share docker image to support gnmi container and telemetry container
Work item tracking
Microsoft ADO 25423918:
How I did it
Create telemetry image from gnmi docker image.
Enable gnmi container and disable telemetry container by default.
How to verify it
Run end to end test.
Why I did it
It was observed that a flood of DHCP packets without rate-limiting can cause BGP flaps or lacp keepalive losses.
This change attempts to prevent or reduce such BGP flaps by enabling appropriate rate-limiting in SONiC for all traffic types.
Work item tracking
Microsoft ADO 17964421:
How I did it
Set a reasonable CIR/CBS value of 300 for queue4_group3 (dhcp, lldp, macsec) and 6000 for queue4_group1.
The value 300 was arrived at after testing with dhcp flooding using ptf (using multiple threads). Throttling at this rate was necessary to ensure that dhcp flooding does not cause BGP flaps.
How to verify it
Verified with this script running from ptf, that BGP flaps don't happen when CBS/CIR is set at 300 for queue4_group3.
import threading
from scapy.all import *
def send_dhcp_discover(intf):
dhcp_discover = Ether(dst='ff:ff:ff:ff:ff:ff',src=RandMAC()) \
/IP(src='1.1.1.1',dst='255.255.255.255') \
/UDP(sport=68,dport=67) \
/DHCP(options=[('message-type','discover'),('end')])
sendp(dhcp_discover,count=100000,iface=intf)
if __name__ == "__main__":
t1 = threading.Thread(target=send_dhcp_discover, args=("eth1",))
t2 = threading.Thread(target=send_dhcp_discover, args=("eth2",))
t1.start()
t2.start()
t1.join()
t2.join()
Verified on Arista-7260CX3-D108C8 running 202012 that the copp rule for queue4_group1 and queue4_group3 do NOT affect BGP packets. To verify this using PTF, the copp rules were modified to set the "CBS" and "CIR" for queue4_group1 and queue4_group3 at 600pps and 50k packets each of "BGP open" and "DHCP Discover" were simultaneously sent from the same PTF port to the DUT. It was verified using "show c cpu" that packets are hitting the cpu queue at 1200 pps (double the configured CIR/CBS for these packet types). This helped conclude that throttling rate is per trap (or packet type) and not per queue.
Verified with updated sonic-mgmt tests ([tests/copp]: Update copp mgmt tests to support new rate-limits sonic-mgmt#8199) on broadcom and mellanox platforms that these traffic types are rate-limited.
Signed-off-by: Prabhat Aravind <paravind@microsoft.com>
- Why I did it
Mellanox MSN2700 platforms have a non-functional error log: "ERR pmon#sensord: Error getting sensor data: dps460/#10: Can't read". This error is because of a firmware issue with some PSU, we are not able to upgrade the FW online. Since there is no functional impact, this error log can be ignored safely.
- How I did it
Add a new rsyslog rule to the rsyslog-container.conf.j2, if the docker name is pmon and the platform name matches, the new rule will be inserted into the docker rsyslogd.conf
- How to verify it
run regression on the MSN2700 platform to make the error log will not be printed to the syslog.
Signed-off-by: Kebo Liu <kebol@nvidia.com>
In #15080, there was a command added to re-add 127.0.0.1/8 to the lo
interface when the networking configuration is being brought down.
However, the trigger for that command is `down`, which, looking at
ifupdown2 configuration files, runs immediately after 127.0.0.1/16 is
removed. This means there may be a period of time where there are no
loopback addresses assigned to the lo interface, and redis commands will
fail.
Fix this by changing this to pre-down, which should run well before
127.0.0.1/16 is removed, and should always leave lo with a loopback
address.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
### Why I did it
Currently there is only rsyslog plugin support for /var/log/syslog, meaning we do not detect events that occur in frr logs such as BGP Hold Timer Expiry that appears in frr/bgpd.log.
##### Work item tracking
- Microsoft ADO **(number only)**: 13366345
#### How I did it
Add omprog action to frr/bgpd.log and frr/zebra.log. Add appropriate regex for both events.
#### How to verify it
sonic-mgmt test case
### Why I did it
Need a tool to extend disk size
##### Work item tracking
- Microsoft ADO **(number only)**: 25094467
#### How I did it
Install parted package
#### How to verify it
Use apt list parted command to check if it's installed
### Why I did it
Need a tool to check certificate's detail of information.
##### Work item tracking
- Microsoft ADO **(number only)**: 25020260
#### How I did it
Install pyOpenSSL package for k8s master
#### How to verify it
Pip3 list to check whether it's installed when include_kubernetes_master=y
#### Why I did it
To fix the logic introduced by [[memory_checker] Do not check memory usage of containers which are not created #11129](https://github.com/sonic-net/sonic-buildimage/pull/11129).
There could be a scenario before the reboot, where
1. The `docker service` has stopped
2. In a very short period of time, the monit service performs the `root@sonic:/home/admin# monit status container_memory_telemetry`
In such scenario, the `memory_checker` script will throw an error to the syslog:
```
ERR memory_checker: Failed to retrieve the running container list from docker daemon! Error message is: 'Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))'
```
But, actually, this scenario is a correct behavior, because when the docker service is stopped, the Unix socket is destroyed and that is why we could see the `FileNotFoundError(2, 'No such file or directory'` exception in the syslog.
#### How I did it
Change the log severity to the warning and changed the return value.
#### How to verify it
It is really hard to catch the exact moment described in the `Why I did it` section.
In order to check the logic:
1. Change the Unix socket path to non-existing in [/usr/bin/memory_checker](47742dfc2c/files/image_config/monit/memory_checker (L139)) file on the switch.
2. Execute the `root@sonic:/home/admin# monit restart container_memory_telemetry`
3. Check the syslog for such messages:
```
WARNING memory_checker: Failed to retrieve the running container list from docker daemon! Error message is: 'Error while fetching server API version: ('Connection aborte
d.', FileNotFoundError(2, 'No such file or directory'))'
INFO memory_checker: [memory_checker] Exits without checking memory usage since container 'telemetry' is not running!
```
* Fix CONFIG_DB_INITIALIZED flag check logic and set/reset flag for warm-reboot
* Fix db-cli usage
* Handle same image warm-reboot and generalize handling of INIT flag
* Cover boot from ONIE case: set config init flag when minigraph, config_db are missing
* Handle case: first boot of SONiC
* Check for config init flag
* Simplify logic, and do not call db_migrator for same image reboot
Why I did it
Currently, k8s master image is generated from a separate branch which we created by ourselves, not release ones. We need to commit these k8s master related code to master branch for a better way to do k8s master image build out.
Work item tracking
Microsoft ADO (number only):
19998138
How I did it
Install k8s dashboard docker images
Install geneva mds and mdsd and fluentd docker images and tag them as latest, tagging latest will help create container always with the latest version
Install azure-storage-blob and azure-identity, this will help do etcd backup and restore.
Install kubernetes python client packages, this will help read worker and container state, we can send these metric to Geneva.
Remove mdm debian package, will replace it with the mdm docker image
Add k8s master entrance script, this script will be called by rc-local service when system startup. we have some master systemd services in compute-move repo, when VMM service create master VM, VMM will copy all master service files inside VM, the entrance script will setup all services according to the service files.
When the entrance script content changed, the PR build will set include_kubernetes_master=y to help do validation for k8s master related code change. The default value of include_kubernetes_master should be always n for public master branch. We will generate master image from internal master branch
How to verify it
Build with INCLUDE_KUBERNETES_MASTER = y
Why I did it
For route registry service, in order to block hijacked routes, IBGP session needs to be set up from BGP sentinel service to SONiC, and BGP sentinel service advertise the same route with higher local-preference and no export community. So that SONiC takes the route from BGP sentinel as the best path and does not advertise the route to EBGP peers.
In order to do that, new route-maps are needed. So this change adds a new set of templates, keeping BGPSentinel peers out of the other templates.
Work item tracking
Microsoft ADO (number only): 24451346
How I did it
Add sentinel_community in constants.yml, route from BGPSentinel do not match this community will be denied.
Add support to convert BGPSentinel related configuration in the BGPPeerPassive element of the minigraph to a new BGP_SENTINELS table in CONFIG_DB
Add a new set of "sentinels" templates to docker-fpm-frr
Add a new BGP peer manager to bgpcfgd, to add neighbors from the BGP_SENTINELS table using the "sentinels" templates
Add a test case for minigraph.py, making sure the BGPSentinel and BGPSentinelV6 elements create BGP_SENTINELS DB entry.
Add a set of test cases for the new sentinels templates in sonic-bgpcfgd tests.
Add sonic-bgp-sentinel.yang and a set of testcases for the yang file.
How to verify it
Testcases and UT newly added would pass.
Setup IPv4 and IPv6 BGPSentinel services in minigraph, and load minigraph, show CONFIG_DB and "show runningconfig bgp", configuration would be loaded successfully.
Using t1-lag topo and setup IBGP session from BGPSentinel to SONiC loopback address, IBGP session would up.
Advertise route from BGPSentinel to T1 with sentinel_community, higher local-preference and no-export communiyt. In T1, show bgp route, the result is "Not advertise to any EBGP peer".
Withdraw the route in BGPSentinel, in T1, route would advertise to EBGP peers.
Advertise route from T1 that does not match sentinel_community, in T1, would not see the route in show bgp route.
Why I did it
For some devices whose log folder size is larger than 200M, for example, 256M, the LOG_FILE_ROTATE_SIZE_KB should be 16M. and
THRESHOLD_KB=$((USABLE_SPACE_KB - (NUM_LOGS_TO_ROTATE * LOG_FILE_ROTATE_SIZE_KB * 2)))
= $(( (VAR_LOG_SIZE_KB * 90 / 100) - RESERVED_SPACE_KB)) - (NUM_LOGS_TO_ROTATE * LOG_FILE_ROTATE_SIZE_KB * 2)))
= $(( (256M * 90 / 100) - 4096)) - (8 * 16M * 2)))
the result would be a negative value
Work item tracking
Microsoft ADO (number only):
24524827
How I did it
Add a case for 400M, if the log folder size is between 200M and 400M, set the log file size to 2M
How to verify it
Do cmd "sudo logrotate -f /etc/logrotate.conf" on DUT which val/log folder size is 256M, and check the syslog.
#### Why I did it
Support reset factory in Sonic OS
[Reset Factory HLD](https://github.com/sonic-net/SONiC/pull/1231)
[Sonic-mgmt tests](https://github.com/sonic-net/sonic-mgmt/pull/7652)
#### How I did it
- Added new script "/usr/bin/reset-factory"
* It generates a new config_db.json files with factory configurations
* It clears system files and logs
* It removes all docker containers on system except database
* It clears non-default users and restores default users password
- Dump the default users info to a new file during build "/etc/sonic/default_users.json"
- Supported new type "Keep-basic" in "config-setup factory"
- Add new conf file for config-setup "/etc/config-setup/config-setup.conf
#### How to verify it
- Run reset-factory script with all types: < none | keep-all-config | only-config | keep-basic >
- Run config-setup factory with parameters < none | keep-basic >
#### Description for the changelog
Support reset factory in Sonic OS
#### Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.
* Add an ability to configure remote syslog servers
* Add an initial configuration for remote syslog
* Extend YANG module and add unit tests
#### Why I did it
Adding the following functionality to rsyslog feature:
- Configure remote syslog servers: protocol, filter, severity level
- Update global syslog configuration: severity level, message format
#### How I did it
added parameters to syslog server and global configuration.
#### How to verify it
create syslog server using CLI/adding to Redis-DB
verify server is added to file /etc/rsyslog.conf and server is functional.
#### Description for the changelog
extend rsyslog capabilities, added server and global configuration parameters.
#### Link to config_db schema for YANG module changes
https://github.com/iavraham/sonic-buildimage/blob/master/src/sonic-yang-models/yang-models/sonic-syslog.yang
- Why I did it
Add support for static DNS configuration. According to sonic-net/SONiC#1262 HLD.
- How I did it
Add a new resolv-config.service that is responsible for transferring configuration from Config DB into /etc/resolv.conf file that is consumed by various subsystems in Linux to resolve domain names into IP addresses.
- How to verify it
Run the image compilation. Each component related to the static DNS feature is covered with the unit tests.
Run sonic-mgmt tests. Static DNS feature will be covered with the system tests.
Install the image and run manual tests.
This reverts commit 02b17839c3.
Reverts #14933
The earlier commit caused a race condition that particularly broke cross branch warm upgrade.
Issue happens when db_migrator is still migrating the DB and finalizer is checking DB for list of components to reconcile.
If migration is not complete, finalizer get an empty list to wait for. Due to this, finalizer concludes warmboot (deletes system wide warmboot flag) and cause all the services to do cold restart.
ADO: 24274591
* Re-add 127.0.0.1/8 when bringing down the interfaces
With #5353, 127.0.0.1/16 was added to the lo interface, and then
127.0.0.1/8 was removed. However, when bringing down the lo interface,
like during a config reload, 127.0.0.1/16 gets removed, but 127.0.0.1/8
isn't added back to the interface. This means that there's a period of
time where 127.0.0.1 is not available at all, and services that need to
connect to 127.0.01 (such as for redis DB) will fail.
To fix this, when going down, add 127.0.0.1/8. Add this address before
the existing configuration gets removed, so that 127.0.0.1 is available
at all times.
Note that running `ifdown lo` doesn't actually bring down the loopback
interface; the interface always stays "physically" up.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Why I did it
Fix the issue where db_migrator is called before DB is loaded w/ config. This leads to db_migrator:
Not finding anything, and resumes to incorrectly migrate every missing config
This is not expected. migration should happen after the old config is loaded and only new schema changes need migration.
Since DB does not have anything when migrator is called, db_migrator fails when some APIs return None.
The reason for incorrect call is that:
database service starts db_migrator as part of startup sequence.
config-setup service loads data from old-config/minigraph. However, since it has Requires=database.service.
Hence, config-setup starts only when database service is started. And database service is started when db_migrator is completed.
Fixed by:
Check if this is first time boot by checking pending_config_migration flag.
If pending_config_migration is enabled, then do not call db_migrator as part of database service startup.
Let database service start which triggers config-setup service to start.
Now call db_migrator after when config-setup service loads old-config/minigraph
Why I did it
Support for SONIC chassis isolation using TSA and un-isolation using TSB from supervisor module
Work item tracking
Microsoft ADO (number only): 17826134
How I did it
When TSA is run on the supervisor, it triggers TSA on each of the linecards using the secure rexec infrastructure introduced in sonic-net/sonic-utilities#2701. User password is requested to allow secure login to linecards through ssh, before execution of TSA/TSB on the linecards
TSA of the chassis withdraws routes from all the external BGP neighbors on each linecard, in order to isolate the entire chassis. No route withdrawal is done from the internal BGP sessions between the linecards to prevent transient drops during internal route deletion. With these changes, complete isolation of a single linecard using TSA will not be possible (a separate CLI/script option will be introduced at a later time to achieve this)
Changes also include no-stats option with TSC for quick retrieval of the current system isolation state
This PR also reverts changes in #11403
How to verify it
These changes have a dependency on sonic-net/sonic-utilities#2701 for testing
Run TSA from supervisor module and ensure transition to Maintenance mode on each linecard
Verify that all routes are withdrawn from eBGP neighbors on all linecards
Run TSB from supervisor module and ensure transition to Normal mode on each linecard
Verify that all routes are re-advertised from eBGP neighbors on all linecards
Run TSC no-stats from supervisor and verify that just the system maintenance state is returned from all linecards
Part of sonic-net/sonic-utilities#2760
Similar to #14295
- Why I did it
To clear teamd timer when fast-reboot is finalized to prevent any further affect.
- How I did it
Deleted teamd timer from config-db in fast-reboot finalizer.
config save call is moved to after clearing teamd-timer so it won't have any further affect as well.
- How to verify it
Verified manually that entry was deleted after fast-reboot was finailized.