[write_standby]: Ignore non-auto interfaces
* In the event that `write_standby.py` is used to automatically switchover interfaces when linkmgrd or bgp crashes, ignore any interfaces that are not configured to auto-switch
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
[mux] Update Service Install With SONiC Target
Recent PR grouped all SONiC service into sonic.taget. The install section
of mux.service was not update and this causes delays when using config
reload as the service failed state is not being reset.
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
[mux] Start Mux on Only Dual-ToR Platform
mux docker depends on the presence of mux cable hardware and is
supposed to run only Gemini ToRs. This PR change the mux feature
config in order to enable mux docker based on device configuration.
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
Linkmgrd monitors link status, mux status, and link state. Has
the link becomes unhealthy, linkmgrd will trigger mux switchover
on a standby ToR ensuring uninterrupted service to servers/blades.
This PR is initial implementation of linkmgrd.
Also, docker-mux container hold packages related to maintaining and managing
mux cable. It currently runs linkmgrd binary that monitor and switches
the mux if needed.
This PR also introduces mux-container and starts linkmgrd as startup when
build is configured with INCLUDE_MUX=y
Edit: linkmgrd PR will follow.
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
Related work items: #2315, #3146150
This is to pick up BRCM SAI 4.3.5.1-6 fixes which contains the following fixes:
1. CS00012213351 SONIC-50679: [TH, TH2] Warm-reboot from 3.5 to 4.3 fails due to null objects discovered
2. CS00012182162: SONIC-49805 TD3 MMU config profile optimization changes
3. CS00012210826:SONIC-50205/760c60fc: Should read MMU_INTFI_MMU_PORT_TO_MMU_QUEUES_FC_BKP for TH3
Preliminary tests looks fine. BGP neighbors were all up with proper routes programmed
interfaces are all up
Manually ran the following test cases on 7050CX3 (TD3) T0 DUT and all passed:
```
fib/test_fib.py
vxlan/test_vxlan_decap.py
fdb/test_fdb.py
decap/test_decap.py
ipfwd/test_dip_sip.py
ipfwd/test_dir_bcast.py
acl/test_acl.py
vlan/test_vlan.py
platform_tests/test_reboot.py
```
copp-config service needs to be started after sonic.target so that it could
render the copp-config with the latest information.
It also needs to be restarted when config reload or load_minigraph is invoked.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
As a part of warmboot, redis database is dumped:
c97fe546e5/scripts/fast-reboot (L269)
However, this dump file is deleted, after it is loaded back into db post reboot.
The DB dump can be useful for debugging purpose, hence taking a backup of it can be useful.
Instead of deleting the dump, rename and keep the dump.
Added logrotate file for wtmp and btmp to override default conf and set size cap as 100K as done in
PR: #865. For buster this is control by separate file wtmp and btmp.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
*Removed execute permissions from the systemd copp-config.service file.
Without this we will get a warning: "Configuration file /lib/systemd/system/copp-config.service is marked executable. Please remove executable permission bits. Proceeding anyway."
Why I did it
fstrim has dependency on pmon docker.
How I did it
start fstrim timer after sonic.target.
How to verify it
local test and PR test.
Signed-off-by: Ying Xie ying.xie@microsoft.com
The Lodoga platform also matched crow which was hardcoding the flash
size to 3700. This change enables autodetect on Clearlake which in turns
allows autodetect for Lodoga.
The threshold was bumped from 3700 to 4000 because size computation can
differ slightly and report slightly above 3700.
Lodoga actually has a 8GB storage device.
LodogaSsd variant has a 30GB SSD drive.
However, in boot0 both were mishandled and assigned 4GB for legacy reasons.
Remove the hardcoding of the flash size and let boot0 autodetect the available space.
This commit adds support for changing the default console baud rate configured
within the U-Boot bootloader. That default baud rate is exposed via the value
of the U-Boot 'baudrate' environment variable. This commit removes logic that
hardcoded the console baud rate to 115200 and instead ensures that the U-Boot
'baudrate' variable is always used when constructing the Linux kernel boot
arguments used when booting Sonic.
A change is also made to rc.local to ensure that the specified baud rate is set
correctly in the serial getty service.
*To run VNET route consistency check periodically.
*For any failure, the monit will raise alert based on return code.
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
#### Why I did it
Use a predefined variable to get vendor information when the swss docker container is created
#### How I did it
Use `{{ sonic_asic_platform }}` instead of `$SONIC_CFGGEN -y /etc/sonic/sonic_version.yml -v asic_type`
#### How to verify it
Manually test.
#### Why I did it
Create a target for delayed service timers. Few services in sonic have delayed to speed up the bring up of the system and essential services. However there is no way to track when they start. This will be a problem when executing config reload as config reload expects all services to be up. Hence grouped all the timers that trigger the delayed services under one target so that they could be tracked in 'config reload' command
#### How I did it
Created delay.target service and add created dependency on the delayed targets.
Master/202012 image size grew quite a bit. 3.7G harddrive can no longer hold one image and safely upgrade to another image. Every bit of harddrive space is precious to save now.
Also sh syntax seemingly changed, [ condition ] && action was a legit syntax in 201911 branch but it is an error when condition not met with 202012 or later images. Change the syntax to if statement to avoid the issue.
Signed-off-by: Ying Xie ying.xie@microsoft.com
Fix#7968
Issue is detected on SONiC.20201231.11
In test_static_route.py::test_static_route_ecmp static routes are configured, but neighbors are not resolved after config reload even after 10 minutes.
It looks like the arp_update script is starting to ping when Vlan1000 is not fully configured.
When issue is reproduced, stuck ping6 process is observed in swss container :
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 180 0.1 0.0 6296 1272 pts/0 S 17:03 0:03 ping6 -I Vlan1000 -n -q -i 0 -c 1 -W 0 ff02::1
And when arp_update script successfully resolves neighbors, we observe sleep 300 instead of ping process
#### Why I did it
* `arp_update` fails to ping those neighbors over vlan sub interfaces.
#### How I did it
* modify `arp_update_vars.j2` to get vlan sub interfaces with ipv6 addresses assigned.
* modify `arp_update` to send ipv6 pings over those retrieved vlan sub interfaces.
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>