Why I did it
The current lazy installer relies on a filename sort for both unpack and configuration steps. When systemd services are configured [started] by multiple packages the order is by filename not by the declared package dependencies. This can cause the start order of services to differ between first-boot and subsequent boots. Declared systemd service dependencies further exacerbate the issue (e.g. blocking the first-boot script).
The current installer leaves packages un-configured if the package dependency order does not match the filename order.
This also fixes a trivial bug in [Build]: Support to use symbol links for lazy installation targets to reduce the image size #10923 where externally downloaded dependencies are duplicated across lazy package device directories.
How I did it
Changed the staging and first-boot scripts to use apt-get:
dpkg -i /host/image-$SONIC_VERSION/platform/$platform/*.deb
becomes
apt-get -y install /host/image-$SONIC_VERSION/platform/$platform/*.deb
when dependencies are detected during image staging.
How to verify it
Apt-get critical rules
Add a Depends= to the control information of a package. Grep the syslog for rc.local between images and observe the configuration order of packages change.
Why I did it
nameserver and domain entries from build system fsroot gets into sonic image.
How I did it
Clear /etc/resolv.conf before building image
How to verify it
Built image with it and verified with install that /etc/resolv.conf is empty
* Fix to improve hostname handling
If config_db.json is missing hostname entry, hostname-config.sh ends
up deleting existing entry too and hostname changes to default 'localhost'
* default hostname to 'sonic` if missing in config file
Why I did it
Change the path of sonic submodules that point to "Azure" to point to "sonic-net"
How I did it
Replace "Azure" with "sonic-net" on all relevant paths of sonic submodules
Pick up fix for CS00012263713 (mirrored packet with extra VLAN Tag) BRCM SAI 4.3.7.1-1
Preliminary tests look fine. BGP neighbors were all up with proper routes programmed
interfaces are all up
Manually ran the following test cases on 7050CX3 (TD3) T0 DUT and all passed:
fib/test_fib.py
acl/test_acl.py
arp/test_neighbor_mac_noptf.py
fdb/test_fdb.py
decap/test_decap.py
pc/test_lag_2.py
pc/test_po_cleanup.py
pc/test_po_update.py
everflow/test_everflow_ipv6.py
everflow/test_everflow_testbed.py
route/test_default_route.py
ipfwd/test_dip_sip.py
copp/test_copp.py
crm/test_crm.py
* [mux] skip mux operations during warm shutdown
- Enhance write_standby.py script to skip actions during warm shutdown.
- Expand the support to BGP service.
- MuX support was added by a previous PR.
- don't skip action during warm recovery
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* Make client indentity by AME cert
* Join k8s cluster by ipv6
* Change join test cases
* Test case bug fix
* Improve read node label func
* Configure kubelet and change test cases
* For kubernetes version 1.22.2
* Fix undefine issue
Signed-off-by: Yun Li <yunli1@microsoft.com>
After pinging any failed IPv6 neighbor entries, set the remaining failed/incomplete entries to a permanent INCOMPLETE state. This manual setting to INCOMPLETE prevents these entries from automatically transitioning to FAILED state, and since they are now incomplete any subsequent NA messages for these neighbors is able to resolve the entry in the cache.
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
Why I did it
The initial value has to be present for the state machines to work. In active-standby dual-tor scenario, or any hardware mux scenario, the value will be updtaed eventually with a delay.
However, in active-active dual-tor scenario, there is no other mechanism to initialize the value and get state machines started.
So this script will have to write something at start up time.
For active-active dualtor, 'active' is a more preferred initial value, the state machine will switch the state to standby soon if
link prober found link not in good state.
How I did it
Update the script to always provide initial values.
How to verify it
Tested on active-active dual-tor testbed.
Signed-off-by: Ying Xie ying.xie@microsoft.com
Avoid write_standby in warm restart context.
sign-off: Jing Zhang zhangjing@microsoft.com
Why I did it
In warm restart context, we should avoid mux state change.
How I did it
Check warm restart flag before applying changes to app db.
How to verify it
Ran write_standby in table missing, key missing, field missing scenarios.
Did a warm restart, app db changes were skipped. Saw this in syslog:
WARNING write_standby: Taking no action due to ongoing warmrestart.
When using trap on SIGTERM the script will not react to the SIGTERM signal sent while a child is executing.
I.e, the following script does not react on SIGTERM sent to it if it is
waiting for sleep to finish:
```
trap "echo Handled SIGTERM" 0 2 3 15
echo "Before sleep"
sleep inf
echo "After sleep"
```
Instead, trap only on EXIT which covers also a scenario with exit on
SIGINT, SIGTERM.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Pick upfollowing fixes and update BRCM SAI to 4.3.7.0:
CS00012208537: Add back previous commit 54c5bc4848eb748
CS00012253061,SONIC-63280: WB from 3.5 to 4.3, followed by WB to 4.3
CS00012207978: SDK-296517, time spent for SAI operations
CS00012245601,SONIC-62898: Egress ACL Counted ad Interface TX drops
Update pcbb with Fixes for CS00012243699
Upgrade on pcbb with Fixes for KB0025353, CS00012221689, CS00012221688, KB0025391, CS00012230519
commit of "CS00012221688:PFC frames egressing, PFC storm happens simultaneously on 2 ports" is purposely skipped to be picked up later due to SWSS dependency not ready.
Why I did it
How I did it
How to verify it
Tested build target, successful
Manually run these tests after installing sai binary within image 20201231.73 on 7050CX3 (TD3) T0 DUT, all passed.
vxlan/test_vxlan_decap.py
fdb/test_fdb.py
pfcwd/test_pfcwd_all_port_storm.py
acl/null_route/test_null_route_helper.py
acl/test_acl.py
vlan/test_vlan.py
platform_tests/test_reboot.py
Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
Spanning from sonic-net/sonic-linkmgrd#76, this PR is to update warm restart finalizer to wait for linkmgrd to be reconciled.
sign-off: Jing Zhang zhangjing@microsoft.com
Why I did it
To make sure finalizer save config after linkmgrd's reconciliation.
How I did it
Add linkmgrd to the reconciliation wait list of warmboot finalizer.
How to verify it
Verified on lab device, linkmgrd reconciled as expected.
In arp_update, check for FAILED or INCOMPLETE kernel neighbor entries and manually ping them to try and resolve the neighbor
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
Why I did it
This PR is to backport #11569 into 202012 branch.
This PR is to apply different DSCP_TO_TC_MAP to downlink and uplink ports on T1 in dualtor deployment.
For T1 downlink ports (To T0)
The DSCP_TO_TC_MAP is not changed. DSCP2 and DSCP6 are mapped to TC2 and TC6 respectively.
For T1 uplink ports (To T1)
A new DSCP_TO_TC_MAP|AZURE_UPLINK is defined and applied. DSCP2 and DSCP6 are mapped to TC1 to avoid mixing up lossy and lossless traffic from T2.
The extra lossy PG2 and PG6 added in PR #11157 is reverted as well because no traffic from T2 is mapped to PG2 or PG6 now.
How I did it
Define a new map DSCP_TO_TC_MAP|AZURE_UPLINK for 7260 T1.
How to verify it
Verified by test case in test_j2files.py.
What I did:
Added bgp as a dependent of swss
Why I did it:
bgp container was not restarting on swss crash. When swss crashes, linkmgrd
doesn't initate a switchover because it cannot access the default route from
orchagent. Bringing down bgp with swss will isolate the ToR, causing linkmgrd
to initiate a switchover to the peer ToR avoiding significant packet loss.
Signed-off-by: Nikola Dancejic <ndancejic@microsoft.com>
Fix in Monit memory_checker plugin. Skip fetching running containers if docker engine is down (can happen in deinit).
This PR fixes issue #11472.
Signed-off-by: liora liora@nvidia.com
Why I did it
In the case where Monit runs during deinit flow, memory_checker plugin is fetching the running containers without checking if Docker service is still running. I added this check.
How I did it
Use systemctl is-active to check if Docker engine is still running.
How to verify it
Use systemctl to stop docker engine and reload Monit, no errors in log and relevant print appears in log.
Which release branch to backport (provide reason below if selected)
The fix is required in 202205 and 202012 since the PR that introduced the issue was cherry picked to those branches (#11129).
Why I did it
This PR is to cherry-pick #11448 to 202012 branch after resolving conflicts.
There are conflicts in
files/build_templates/qos_config.j2
src/sonic-config-engine/tests/test_j2files.py
Why I did it
Storage backend has all vlan members tagged. If untagged packets are received on those links, they are accounted as RX_DROPS which can lead to false alarms in monitoring tools. Using this acl to hide these drops.
How I did it
Created a acl template which will be loaded during minigraph load for backend. This template will allow tagged vlan packets and dropped untagged
How to verify it
Unit tests
Signed-off-by: Neetha John <nejo@microsoft.com>
Why I did it
Support to use symbol links in platform folder to reduce the image size.
The current solution is to copy each lazy installation targets (xxx.deb files) to each of the folders in the platform folder. The size will keep growing when more and more packages added in the platform folder. For cisco-8000 as an example, the size will be up to 2G, while most of them are duplicate packages in the platform folder.
How I did it
Create a new folder in platform/common, all the deb packages are copied to the folder, any other folders where use the packages are the symbol links to the common folder.
Why platform.tar?
We have implemented a patch for it, see #10775, but the problem is the the onie use really old unzip version, cannot support the symbol links.
The current solution is similar to the PR 10775, but make the platform folder into a tar package, which can be supported by onie. During the installation, the package.tar will be extracted to the original folder and removed.
Signed-off-by: Yong Zhao yozhao@microsoft.com
Why I did it
This PR aims to fix an issue (#10088) by enhancing the script memory_checker.
Specifically, if container is not created successfully during device is booted/rebooted, then memory_checker do not need check its memory usage.
How I did it
In the script memory_checker, a function is added to get names of running containers. If the specified container name is not in current running container list, then this script will exit without checking its memory usage.
How to verify it
I tested on a lab device by following the steps:
Stops telemetry container with command sudo systemctl stop telemetry.service
Removes telemetry container with command docker rm telemetry
Checks whether the script memory_checker ran by Monit will generate the syslog message saying it will exit without checking memory usage of telemetry.
Issue fixed: when performing a warm-reboot or fast-reboot from 201811 or 201911 to 202012 the kernel command line contains duplicate information. This issue is related to a change that was made to make 202012 boot0 file more futureproof.
A cold reboot brings everything back into a clean slate though not always desirable.
Changes done:
Added some logic to properly detect the end of the Aboot cmdline when cmdline-aboot-end delimiter is not set (clean case)
Added some logic to regenerate the Aboot cmdline when cmdline-aboot-end is set but duplicate parameters exists before (dirty case). Reorganized some code to handle duplicate parameter handling in the allowlist.
- Why I did it
Configure different DSCP_TO_TC_MAP between uplink and downlink on T1 switch in dual ToR scenario
On T1 uplink, both DSCP 2/6 will be mapped to TC 1 for the purpose of avoiding such traffic occupying lossless buffers.
On T1 downlink, they will be mapped to TC 2/6 respectively. (unchanged)
- How I did it
For vendors who want to configure different DSCP_TO_TC_MAP between uplinks and downlinks on T1, they should
Define generate_dscp_to_tc_map macro in SKU's qos.json.j2 file
Define map AZURE for downlink and AZURE_UPLINK for uplink
Define jinja2 variable different_dscp_to_tc_map as True
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Why I did it
Support Mellanox-SN4600C-C64 as T1 switch in dual-ToR scenario
1. Support additional queue and PG in buffer templates, including both traditional and dynamic model
2. Support mapping DSCP 2/6 to lossless traffic in the QoS template.
3. Add macros to generate additional lossless PG in the dynamic model
4. Adjust the order in which the generic/dedicated (with additional lossless queues) macros are checked and called to generate buffer tables in common template buffers_config.j2
- Buffer tables are rendered via using macros.
- Both generic and dedicated macros are defined on our platform. Currently, the generic one is called as long as it is defined, which causes the generic one always being called on our platform. To avoid it, the dedicated macrio is checked and called first and then the generic ones.
5. Support MAP_PFC_PRIORITY_TO_PRIORITY_GROUP on ports with additional lossless queues.
On Mellanox-SN4600C-C64, buffer configuration for t1 is calculated as:
40 * 100G downlink ports with 4 lossless PGs/queues, 1 lossy PG, and 3 lossy queues
16 * 100G uplink ports with 2 lossless PGs/queues, 1 lossy PG, and 5 lossy queues
Signed-off-by: Stephen Sun stephens@nvidia.com
How to verify it
Run regression test.
* Remove SSH host keys after installing the custom version of sshd
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
* Use an override for for sshd instead of overwriting the service file
Don't overwrite upstream's .service file, and instead use an override
file for making sure the host key(s) are generated.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
- Why I did it
Recent change to delay PMON service in case of fast/warm reboot introduce an issue when restarting only SWSS service after fast/warm reboot for Nvidia platform.
Since the timer is triggered only when the system boot, in a scenario when the system is after a fast/warm reboot and the user restart SWSS service, as part of syncd.sh script, PMON service will stop but the timer will not start again.
- How I did it
On syncd.sh script, in case of fast/warm indication, check if pmon.timer is running.
If it is running it means we are at the first boot and continue normally.
If it is not running, meaning the service was restarted, start the timer to keep the system behavior consistent.
- How to verify it
Run fast/warm reboot.
service swss restart.
Observe PMON service starting.
Support Tunnel PFC/pcbb feature on Broadcom platform.
How to verify it
Tested build target, successful
make target/docker-syncd-brcm.gz
manual run those tests after installing sai binary within image 20201231.67 on 7050CX3 (TD3) T0 DUT, all passed
fib/test_fib.py
vxlan/test_vxlan_decap.py
fdb/test_fdb.py
decap/test_decap.py
pfcwd/test_pfcwd_all_port_storm.py
acl/null_route/test_null_route_helper.py
acl/test_acl.py
vlan/test_vlan.py
platform_tests/test_reboot.py
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
* Use buster-backports version
* Use dget dsc file instead source repo
* Update make files
* Upgrade openssh-client to 8.4 in base image
* Remove useless installation
* Install openssh-server from buster-backports in build_debian
* Update dev buster package version list
Signed-off-by: Jing Kan jika@microsoft.com
This reverts commit 15cf9b0d70.
Why I did it
Revert the PR #10775, for it has impact on onie installation.
It is caused by the symbol links not supported in some of the onie unzip.
We will enable after fixing the issue, see #10914
#### Why I did it
Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time.
This parallel execution consume CPU time and the duration of create_switch is longer than it should be.
Following this finding, and the motivation to ensure these services will not interfere in the future, PMON is delayed in 90 seconds until the system finish the init flow after fastboot.
#### How I did it
Add a timer for PMON service.
Exclude for MLNX platform the start trigger of PMON when SYNCD starts in case of fastboot.
Copy the timer file to the host bin image.
#### How to verify it
Run fast-reboot on MLNX platform and observe faster create_switch execution time.
This PR is to backport a fix#10568
This PR is dependent on PR: #10745
- Why I did it
Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time.
This parallel execution consume CPU time and the duration of create_switch is longer than it should be.
Following this finding, and the motivation to ensure these services will not interfere in the future, LLDP is delayed in 90 seconds until the system finish the init flow after fastboot.
- How I did it
Add a timer for LLDP service.
Copy the timer file to the host bin image.
- How to verify it
Run fast-reboot on MLNX platform and observe faster create_switch execution time.
Backport https://github.com/Azure/sonic-buildimage/pull/10573 to 202012.
#### Why I did it
The PR is aimed to fix a bug that mgmt port eth0 may loss IP even if user configured static IP of eth0. This is not a always reproduceable issue, the reproducing flow is like:
1. Systemd starts networking service, which runs a dhcp based configuration and assigned an ip from dhcp.
2. Systemd starts interface-config service who depends on networking service
3. Interface-config service runs command “ifdown –force eth0”, check [line](16717d2dc5/files/image_config/interfaces/interfaces-config.sh (L4)). but networking service is still running so that this [line](ac32bec0e2/ifupdown2/ifupdown/main.py (L74)) failed with error: “error: Another instance of this program is already running.”. This error is printed by ifupdown2 lib who is the main process of networking service. So, ifdown actually does not work here, the ip of eth0 is not down.
4. Interface-config service updates /etc/networking/interface to static configuration.
5. Interface-config service runs command “systemctl restart networking”. This command kills the previous networking related processes (log: networking.service: Main process exited, code=killed, status=15/TERM), and try to reconfigure the ip address with static configuration. But it detects that the configured IP and the existing IP are the same, and it does not really configure the ip to kernel. Hence, the ip is still getting from dhcp. (this could be a bug of ifupdown2: previous ip is from dhcp, new ip is a static ip, it treats them as same instead of re-configuring the IP)
6. When the lease of the ip expires, the ip of eth0 is removed by kernel and the issue reproduces.
The issue is not always reproduceable because networking service usually runs fast so that it won't hit step#3.
#### How I did it
Check networking service state before running "ifdown –force eth0", wait for it done if it is activating.
#### How to verify it
Manual test.
Why I did it
The image size is too large, when there are multiple lazy packages and multiple platforms. It is not necessary to keep the lazy installation packages in multiple copies.
For cisco image, the image size will reduce from 3.5G to 1.7G.
How I did it
Use symbol links to only keep one package for each of the lazy package.
Make a new folder fsroot/platform/common
Copy the lazy packages into the folder.
When using a package in each of the platform, such as x86_64-grub, x86_64-8800_rp-r0, x86_64-8201_on-r0, etc, only make a symbol link to the package in the common folder.
Why I did it
In parallel of this change Arista added a custom logrotate configuration as part of its driver library.
Having 2 logrotate configuration for the same log file triggers an issue.
Fixesaristanetworks/sonic#38
How I did it
Arista merged a few changes in sonic-buildimage which added a logrotate configuration aristanetworks/sonic@e43c797
It is therefore the right path to remove the arista.log line from the logrotate.d/rsyslog configuration.
How to verify it
Logrotate works without any error message, arista log rotation happens and arista daemons still append logs once file was truncated.
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Why I did it
This PR aims to fix the Monit issue which shows Monit can't reset its counter when monitoring memory usage of telemetry container.
Specifically the Monit configuration file related to monitoring memory usage of telemetry container is as following:
check program container_memory_telemetry with path "/usr/bin/memory_checker telemetry 419430400"
if status == 3 for 10 times within 20 cycles then exec "/usr/bin/restart_service telemetry"
If memory usage of telemetry container is larger than 400MB for 10 times within 20 cycles (minutes), then it will be restarted.
Recently we observed, after telemetry container was restarted, its memory usage continuously increased from 400MB to 11GB within 1 hour, but it was not restarted anymore during this 1 hour sliding window.
The reason is Monit can't reset its counter to count again and Monit can reset its counter if and only if the status of monitored service was changed from Status failed to Status ok. However, during this 1 hour sliding window, the status of monitored service was not changed from Status failed to Status ok.
Currently for each service monitored by Monit, there will be an entry showing the monitoring status, monitoring mode etc. For example, the following output from command sudo monit status shows the status of monitored service to monitor memory usage of telemetry:
Program 'container_memory_telemetry'
status Status ok
monitoring status Monitored
monitoring mode active
on reboot start
last exit value 0
last output -
data collected Sat, 19 Mar 2022 19:56:26
Every 1 minute, Monit will run the script to check the memory usage of telemetry and update the counter if memory usage is larger than 400MB. If Monit checked the counter and found memory usage of telemetry is larger than 400MB for 10 times
within 20 minutes, then telemetry container was restarted. Following is an example status of monitored service:
Program 'container_memory_telemetry'
status Status failed
monitoring status Monitored
monitoring mode active
on reboot start
last exit value 0
last output -
data collected Tue, 01 Feb 2022 22:52:55
After telemetry container was restarted. we found memory usage of telemetry increased rapidly from around 100MB to more than 400MB during 1 minute and status of monitored service did not have a chance to be changed from Status failed to Status ok.
How I did it
In order to provide a workaround for this issue, Monit recently introduced another syntax format repeat every <n> cycles related to exec. This new syntax format will enable Monit repeat executing the background script if the error persists for a given number of cycles.
How to verify it
I verified this change on lab device str-s6000-acs-12. Another pytest PR (Azure/sonic-mgmt#5492) is submitted in sonic-mgmt repo for review.
The interface renaming logic fails if one interface is missing.
Because of the `set -e` the whole initramfs hook would abort early on
error.
This change fixes the current behavior to make sure missing interfaces
are properly skipped and ensure existing interface are renamed.
To optimize stop on warm boot, added kill for containers
Use service "kill" in the shutdown path for fast and warm reboot. For all other reload methods, service "stop" is used.
This is done to save time in shutdown path, and to overall improve the time spent in warm and fast reload.
How - Use service_mgmt.sh to trigger common logic to initiate kill (fast/warm) or stop (cold) for database.sh, radv.sh, snmp.sh, telemetry.sh, mgmt-framework.sh
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>, Vaibhav H D <vaibhav.dixit@microsoft.com>
If it is run during image install, it's not guaranteed that the
installation environment will have tune2fs available. Therefore, run it
during initramfs instead.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>