Commit Graph

1184 Commits

Author SHA1 Message Date
vganesan-nokia
9ffa92cc61
[swss] Chassis db clean up optimization and bug fixes (#16454) (#16540)
* [swss] Chassis db clean up optimization and bug fixes

This commit includes the following changes:
    - Fix for regression failure due to error in finding CHASSIS_APP_DB in
    pizzabox (#PR 16451)
    - After attempting to delete the system neighbor entries from
    chassis db, before starting clearing the system interface entries,
    wait for sometime only if some system neighbors were deleted.
    If there are no system neighbors entries deleted for the asic coming up,
    no need to wait.
    - Similar changes for system lag delete. Before deleting the
    system lag, wait for some time only if some system lag memebers were
    deleted. If there are no system lag members deleted no need to wait.
    - Flush the SYSTEM_NEIGH_TABLE from the local STATE_DB. While asic
    is coming up, when system neigh entries are deleted from chassis ap
    db (as part of chassis db clean up), there is no orchs/process running to
    process the delete messages from chassis redis. Because of this, stale system
    neigh are entries present in the local STATE_DB. The stale entries result in
    creation of orphan (no corresponding data path/asic db entry) kernel neigh
    entries during STATE_DB:SYSTEM_NEIGH_TABLE entries processing by nbrmgr (after
    the swss serive came up). This is avoided by flushing the SYSTEM_NEIGH_TABLE from
    the local STATE_DB when sevice comes up.

Signed-off-by: vedganes <veda.ganesan@nokia.com>

* [swss] Chassis db clean up bug fixes review comment fix - 1

Debug logs added for deletion of other tables (SYSTEM_INTERFACE and SYSTEM_LAG_TABLE)

Signed-off-by: vedganes <veda.ganesan@nokia.com>

---------

Signed-off-by: vedganes <veda.ganesan@nokia.com>
(cherry picked from commit b13b41fc22)
2023-09-14 14:06:52 -07:00
mssonicbld
26e1d59867
[chassis] Chassis DB cleanup when asic comes up (#16213) (#16379) 2023-09-02 07:18:17 +08:00
mssonicbld
fea5bd34bc
chassis-packet: Update arp_update script for FAILED and STALE check (#16311) (#16384) 2023-09-02 07:11:54 +08:00
Liping Xu
ddb4ce1040 update rsyslog log size conf (#15821)
Why I did it
For some devices whose log folder size is larger than 200M, for example, 256M, the LOG_FILE_ROTATE_SIZE_KB should be 16M. and
THRESHOLD_KB=$((USABLE_SPACE_KB - (NUM_LOGS_TO_ROTATE * LOG_FILE_ROTATE_SIZE_KB * 2)))
= $(( (VAR_LOG_SIZE_KB * 90 / 100) - RESERVED_SPACE_KB)) - (NUM_LOGS_TO_ROTATE * LOG_FILE_ROTATE_SIZE_KB * 2)))
= $(( (256M * 90 / 100) - 4096)) - (8 * 16M * 2)))
the result would be a negative value

Work item tracking
Microsoft ADO (number only):
24524827
How I did it
Add a case for 400M, if the log folder size is between 200M and 400M, set the log file size to 2M

How to verify it
Do cmd "sudo logrotate -f /etc/logrotate.conf" on DUT which val/log folder size is 256M, and check the syslog.
2023-09-01 14:33:30 +08:00
Vivek
181eb02865 Run db_migrator for non first-time reboots (#16116)
- Why I did it
The recent change #15685 (comment) removed the db migration for non first reboots.
This is problematic for many deployments which doesn't rely on ZTP and push a custom config_db.json
Port to older branches after #15685 is ported back

- How I did it
Re-introduce the logic to run the db_migrator on non-first boots

- How to verify it
Verified reboot and warm-reboot cases

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
2023-08-31 14:33:00 +08:00
Vaibhav Hemant Dixit
e7ce179b73 Fix CONFIG_DB_INITIALIZED flag check logic and set/reset flag for warmboot (#15685)
* Fix CONFIG_DB_INITIALIZED flag check logic and set/reset flag for warm-reboot
* Fix db-cli usage
* Handle same image warm-reboot and generalize handling of INIT flag
* Cover boot from ONIE case: set config init flag when minigraph, config_db are missing
* Handle case: first boot of SONiC
* Check for config init flag
* Simplify logic, and do not call db_migrator for same image reboot
2023-08-25 02:32:24 +08:00
Kebo Liu
4f403c9079
[202211] [Mellanox] Update SDK/FW/SAI to 4.6.1020/2012.1020/SAIBuild2211.25.1.0 (#16096) (#16095)
This is to backport #16096

Why I did it
SONiC changes:

Support Spectrum4 ASIC FW binary building.
Support new SDK sx-obj-desc lib building since new SAI need it.
Remove SX_SCEW debian package from Mellanox SDK build since we are no longer using it (we use libxml2 instead).
Update SAI, SDK, FW to version 4.6.1020/2012.1020/SAIBuild2211.25.1.0
SDK/FW bug fixes

In SPC-1 platforms: Fastboot mode is not operational for Split port with Force mode in 50G speed
SFP modules are kept in disabled state after set LPM (low power mode) on/off for at least 3 minutes.
When preforming fast boot from an old SDK version (currently installed) to a newer one (target version), and the system was initially loaded with a new SDK version (past version), and the system has not been wiped, under specific conditions, the fast boot would use the past version's data and may fail.
SDK/FW Features

On SN2700 all ports can support y cable by credo
SAI bug Fixes

When creating an ACL rule with SAI_ACL_ENTRY_ATTR_FIELD_SRC_IP/SAI_ACL_ENTRY_ATTR_FIELD_DST_IP enabled, and then disabling the field by setting enable=false, a match on L3_type=IPv4 will remain programmed for the rule Issue resolved after the fix
Allow the max scale of virtual routers to be configure for SPC-1, SPC-2, SPC-3 when fastboot enable
Remove default hash key of SRC_MAC, DST_MAC and ETH_TYPE
SAI features

Port init profile
Dual ToR Active-Standby | Additional MAC support
Work item tracking
Microsoft ADO (number only):
How I did it
Update SDK/FW/SAI make files

How to verify it
Run full sonic-mgmt regression on Mellanox platform
2023-08-20 19:24:32 +08:00
mssonicbld
fb8f6265c0
Fix issue: set delayed attribute to true for platform monitor service (#15816) (#16042) 2023-08-15 23:26:26 +08:00
Zain Budhwani
c3b25660ed
Fix 202211 test_events to unblock PR checker (#16151)
* [yang] Change swss-event, dhcp-relay-event leafref to string (#13326)

Why I did it
Do not require leafref as part of yang. Only need string to compare whether string received from event matches what is possible for ifname.

How I did it
How to verify it
Run UT

* Add fix to monit_regex.json for catching mem_usage and cpu_usage (#14954)

Why I did it
Current regex not able to capture logs, modify regex to capture syslog messages

Work item tracking
Microsoft ADO (number only): 13366345
How I did it
Code change

How to verify it
sonic-mgmt test case

* Update usage leaf in sonic-events-host yang models (#15805)

#### Why I did it

event yang models for usage currently use int as type for usage leaf, needs to be of type decimal64

##### Work item tracking
- Microsoft ADO **(number only)**:17747466

#### How I did it

Update yang models and UT

#### How to verify it

UT
2023-08-15 14:16:06 +08:00
Vadym Hlushko
73999dddca
[syncd.sh] Clear semaphore before updating firmware (#15819)
Signed-off-by: Vadym Hlushko <vadymh@nvidia.com>
2023-08-06 22:31:05 -07:00
lixiaoyuner
87a8de4816
Move k8s script to docker-config-engine (#14788) (#15767)
Why I did it
To reduce the container's dependency from host system

Work item tracking
Microsoft ADO (number only):
17713469
How I did it
Move the k8s container startup script to config engine container, other than mount it from host.

How to verify it
Check file path(/usr/share/sonic/scripts/container_startup.py) inside config engine container.

Signed-off-by: Yun Li <yunli1@microsoft.com>
Co-authored-by: Qi Luo <qiluo-msft@users.noreply.github.com>
2023-07-10 08:55:45 -07:00
mssonicbld
77aeb9bf73
Revert "Revert "Fix for fast/cold-boot: call db_migrator only after old config is loaded (#14933)" (#15464)" (#15684) (#15745) 2023-07-08 06:54:44 +08:00
mssonicbld
37224030dc
[arp_update]: Fix IPv6 neighbor race condition (#15583) (#15693) 2023-07-01 09:13:06 +08:00
mssonicbld
3d99bf71b6
[mlnx-ffb.sh] Update issu-version location (#14925) (#15674) 2023-06-30 08:10:20 +08:00
Junchao-Mellanox
143814c00f Fix issue: systemctl daemon-reload would sporadically cause udev handler fail (#15253)
#### Why I did it

A workaround to back port the fix for a systemd issue.

The systemd issue: https://github.com/systemd/systemd/issues/24668
The systemd PR to fix the issue: https://github.com/systemd/systemd/pull/24673/files

The formal solution should upgrade systemd to a version that contains the fix. But, systemd is a very basic service, upgrading systemd requires heavy test. 

#### How I did it
Copy the correct systemd-udevd.service file in build time 

#### Tested branch (Please provide the tested image version)

- [x] 202211
- [ ] <!-- image version 2 -->

```
SONiC Software Version: SONiC.fix-udev.3-b65c7bdec_Internal
SONiC OS Version: 11
Distribution: Debian 11.7
Kernel: 5.10.0-18-2-amd64
Build commit: b65c7bdec
Build date: Mon Jun 19 10:54:50 UTC 2023
Built by: sw-r2d2-bot@r-build-sonic-ci02-241

Platform: x86_64-mlnx_msn4700-r0
HwSKU: ACS-MSN4700
ASIC: mellanox
ASIC Count: 1
Serial Number: MT2022X08597
Model Number: MSN4700-WS2FO
Hardware Revision: A1
Uptime: 08:10:11 up 1 min,  1 user,  load average: 1.81, 0.67, 0.24
Date: Sun 25 Jun 2023 08:10:11

Docker images:
REPOSITORY                    TAG                             IMAGE ID       SIZE
docker-fpm-frr                fix-udev.3-b65c7bdec_Internal   a7b911e7cb6f   346MB
docker-fpm-frr                latest                          a7b911e7cb6f   346MB
docker-platform-monitor       fix-udev.3-b65c7bdec_Internal   94c5178cf80b   731MB
docker-platform-monitor       latest                          94c5178cf80b   731MB
docker-orchagent              fix-udev.3-b65c7bdec_Internal   46b393e0ace8   328MB
docker-orchagent              latest                          46b393e0ace8   328MB
docker-syncd-mlnx             fix-udev.3-b65c7bdec_Internal   1f5c6c23e33a   734MB
docker-syncd-mlnx             latest                          1f5c6c23e33a   734MB
docker-sflow                  fix-udev.3-b65c7bdec_Internal   7e45992c8c59   317MB
docker-sflow                  latest                          7e45992c8c59   317MB
docker-teamd                  fix-udev.3-b65c7bdec_Internal   e4d905592cda   316MB
docker-teamd                  latest                          e4d905592cda   316MB
docker-nat                    fix-udev.3-b65c7bdec_Internal   7fe799367580   319MB
docker-nat                    latest                          7fe799367580   319MB
docker-macsec                 latest                          d702a5554171   318MB
docker-snmp                   fix-udev.3-b65c7bdec_Internal   3bce8fcf71cd   338MB
docker-snmp                   latest                          3bce8fcf71cd   338MB
docker-sonic-telemetry        fix-udev.3-b65c7bdec_Internal   f13949cbc817   597MB
docker-sonic-telemetry        latest                          f13949cbc817   597MB
docker-dhcp-relay             latest                          153d9072805d   306MB
docker-router-advertiser      fix-udev.3-b65c7bdec_Internal   aed642b9a6bc   299MB
docker-router-advertiser      latest                          aed642b9a6bc   299MB
docker-sonic-p4rt             fix-udev.3-b65c7bdec_Internal   a3cae5ca65a7   870MB
docker-sonic-p4rt             latest                          a3cae5ca65a7   870MB
docker-mux                    fix-udev.3-b65c7bdec_Internal   b81f0401b9a8   347MB
docker-mux                    latest                          b81f0401b9a8   347MB
docker-eventd                 fix-udev.3-b65c7bdec_Internal   c5917d0e801f   298MB
docker-eventd                 latest                          c5917d0e801f   298MB
docker-lldp                   fix-udev.3-b65c7bdec_Internal   fd5dc14a7976   341MB
docker-lldp                   latest                          fd5dc14a7976   341MB
docker-database               fix-udev.3-b65c7bdec_Internal   438c2715a1dd   299MB
docker-database               latest                          438c2715a1dd   299MB
docker-sonic-mgmt-framework   fix-udev.3-b65c7bdec_Internal   5c50b115fbcd   414MB
docker-sonic-mgmt-framework   latest  
```
2023-06-30 02:37:38 +08:00
Vaibhav Hemant Dixit
b62231566b Revert "Fix for fast/cold-boot: call db_migrator only after old config is loaded (#14933)" (#15464)
This reverts commit 02b17839c3.

Reverts #14933

The earlier commit caused a race condition that particularly broke cross branch warm upgrade.

Issue happens when db_migrator is still migrating the DB and finalizer is checking DB for list of components to reconcile.

If migration is not complete, finalizer get an empty list to wait for. Due to this, finalizer concludes warmboot (deletes system wide warmboot flag) and cause all the services to do cold restart.

ADO: 24274591
2023-06-17 14:32:23 +08:00
Saikrishna Arcot
8195e33120 Re-add 127.0.0.1/8 when bringing down the interfaces (#15080)
* Re-add 127.0.0.1/8 when bringing down the interfaces

With #5353, 127.0.0.1/16 was added to the lo interface, and then
127.0.0.1/8 was removed. However, when bringing down the lo interface,
like during a config reload, 127.0.0.1/16 gets removed, but 127.0.0.1/8
isn't added back to the interface. This means that there's a period of
time where 127.0.0.1 is not available at all, and services that need to
connect to 127.0.01 (such as for redis DB) will fail.

To fix this, when going down, add 127.0.0.1/8. Add this address before
the existing configuration gets removed, so that 127.0.0.1 is available
at all times.

Note that running `ifdown lo` doesn't actually bring down the loopback
interface; the interface always stays "physically" up.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-06-16 14:30:34 +08:00
siqbal1986
baa5175819 Added VNET_MONITOR_TABLE,BFD_SESSION_TABLE,VNET_ROUTE_TUNNEL_TABLE to the list (#14992)
* The 3 tables in state DB need to be cleaned up after SWSS restart for have consistant state.
2023-06-16 09:54:58 +08:00
Liping Xu
deb94af61b allow docker_inram to kernel cmd list (#15374)
Why I did it
After docker_inram is enabled, the docker folder's default max size is 1.5G.
It's not big enough for some tests which need to install additional docker images or install extra packages.

Work item tracking
Microsoft ADO 24199761:
How I did it
add docker_inram into cmdline_allowlist

How to verify it
sudo sh -c 'echo "docker_inram_size=3000M" >> kernel-cmdline-append'
sudo reboot and check the docker folder size
2023-06-15 14:33:58 +08:00
Sudharsan Dhamal Gopalarathnam
78977ddbce
[202211][config reload]Config Reload Enhancement (#15334)
Backporting #13969

Why I did it
Implementing code changes for sonic-net/SONiC#1203

Work item tracking
Microsoft ADO (number only):
How I did it
Removed the timers and delayed target since the delayed services would start based on event driven approach.
Cleared port table during config reload and cold reboot scenario.
Modified yang model, init_cfg.json to change has_timer to delayed

How to verify it
Added UT to verify
2023-06-12 13:22:16 +08:00
mssonicbld
5f4b54a9cd
[ci/build]: Upgrade SONiC package versions (#15361) 2023-06-06 19:46:12 +08:00
mssonicbld
e4d8355976
[ci/build]: Upgrade SONiC package versions (#15329) 2023-06-04 18:12:12 +08:00
mssonicbld
4e9569ee3b
[ci/build]: Upgrade SONiC package versions (#15165) 2023-06-03 17:22:05 +08:00
mssonicbld
084564bdde
Fix for fast/cold-boot: call db_migrator only after old config is loaded (#14933) (#15317) 2023-06-03 09:16:42 +08:00
Anish Narsian
71ecd727ac [arp_update] Resolve neighbors from config_db (#15006)
* To resolve NEIGH table entries present in CONFIG_DB. Without this change arp/ndp entries which we wish to resolve, and configured via CONFIG_DB are not resolved.
2023-05-18 09:46:56 +08:00
Nazarii Hnydyn
ba54e1e1ae
Revert "[swss/syncd] remove dependency on interfaces-config.service (#13084) (#14341)" (#15094)
This reverts commit 499f57a7f7.
2023-05-17 15:59:55 +08:00
DavidZagury
4fd2a6297f
[Secure Boot] Add Secure Boot Support (#12692) (#14963)
- Why I did it
Add Secure Boot support to SONiC OS.
Secure Boot (SB) is a verification mechanism for ensuring that code launched by a computer's UEFI firmware is trusted. It is designed to protect a system against malicious code being loaded and executed early in the boot process before the operating system has been loaded.

- How I did it
Added a signing process to sign the following components:
shim, grub, Linux kernel, and kernel modules when doing the build, and when feature is enabled in build time according to the HLD explanations (the feature is disabled by default).

- How to verify it
There are self-verifications of each boot component when building the image, in addition, there is an existing end-to-end test in sonic-mgmt repo that checks that the boot succeeds when loading a secure system (details below).

How to build a sonic image with secure boot feature: (more description in HLD)

Required to use the following build flags from rules/config:
SECURE_UPGRADE_MODE="dev"
SECURE_UPGRADE_DEV_SIGNING_KEY="/path/to/private/key.pem"
SECURE_UPGRADE_DEV_SIGNING_CERT="/path/to/cert/key.pem"
After setting those flags should build the sonic-buildimage.
Before installing the image, should prepared the setup (switch device) with the follow:
check that the device support UEFI
stored pub keys in UEFI DB

enabled Secure Boot flag in UEFI
How to run a test that verify the Secure Boot flow:
The existing test "test_upgrade_path" under "sonic-mgmt/tests/upgrade_path/test_upgrade_path", is enough to validate proper boot
You need to specify the following arguments:
Base_image_list your_secure_image
Taget_image_list your_second_secure_image
Upgrade_type cold
And run the test, basically the test will install the base image given in the parameter and then upgrade to target image by doing cold reboot and validates all the services are up and working correctly

Co-authored-by: davidpil2002 <91657985+davidpil2002@users.noreply.github.com>
2023-05-15 10:13:26 +08:00
Ying Xie
f6216435c8
Revert "Clear DNS configuration received from DHCP during networking reconfiguration in Linux. (#13516)" (#14901)
This reverts commit 5ef488f808.
2023-05-01 16:49:08 -07:00
mssonicbld
776937c48f
[ci/build]: Upgrade SONiC package versions (#14894) 2023-04-30 18:37:57 +08:00
mssonicbld
4479245996
[ci/build]: Upgrade SONiC package versions (#14889) 2023-04-29 18:57:14 +08:00
mssonicbld
99d6003717
Changes to support TSA from supervisor (#14691) (#14878) 2023-04-28 21:11:55 +08:00
mssonicbld
5ac1051f8f
Temporary WA for the issue that asic_table.json can not be rendered (#13888) (#14857) 2023-04-27 02:57:10 +08:00
mssonicbld
e0ef5b9808
[write standby] force DB connections to use unix socket to connect (#14524) (#14773) 2023-04-24 01:49:42 +08:00
mssonicbld
f53c7b66cd
[Fast-boot] Clear teamd-timer when finalizing fast-reboot (#14583) (#14774) 2023-04-23 21:07:26 +08:00
mssonicbld
70cfef252f
Delay mux/sflow/snmp timer after interface-config service (#14506) (#14771) 2023-04-23 20:52:06 +08:00
mssonicbld
72776df8ba [ci/build]: Upgrade SONiC package versions 2023-04-23 20:46:40 +08:00
Stephen Sun
1d3fa0b03c Enhance the error message output mechanism (#14384)
#### Why I did it

Enhance the error message output mechanism during swss docker creating

#### How I did it

Capture the output to stderr of `sonic-cfggen` and output it using `echo` to make sure the error message will be logged in syslog.

#### How to verify it

Manually test
2023-04-23 18:32:40 +08:00
mssonicbld
e9daace147
[ci/build]: Upgrade SONiC package versions (#14800) 2023-04-22 18:35:55 +08:00
mssonicbld
8e1bbab07d
[image_config] add rasdaemon.timer (#14300) (#14762) 2023-04-22 00:18:05 +08:00
Hua Liu
bee30fdfb9 Improve sudo cat command for RO user. (#14428)
Improve sudo cat command for RO user.

#### Why I did it
RO user can use sudo command show none syslog files.

#### How I did it
Improve sudo cat command for RO user.

#### How to verify it
Pass all UT.
Manually check fixed code work correctly.

#### Description for the changelog
Improve sudo cat command for RO user.
2023-04-21 06:32:24 +08:00
mssonicbld
aea1980b14
[ci/build]: Upgrade SONiC package versions (#14720) 2023-04-19 19:30:56 +08:00
mssonicbld
cc22d69fd3
[ci/build]: Upgrade SONiC package versions (#14680) 2023-04-16 18:59:28 +08:00
mssonicbld
b4dafae65d
[ci/build]: Upgrade SONiC package versions (#14673) 2023-04-15 20:37:33 +08:00
xumia
5dbf512cda
Support to add SONiC OS Version in device info (#14601) (#14623)
Why I did it
Cherry-pick #14601, for code conflict.
Support to add SONiC OS Version in device info.
It will be used to display the version info in the SONiC command "show version". The version is used to do the FIPS certification. We do not do the FIPS certification on a specific release, but on the SONiC OS Version.

SONiC Software Version: SONiC.master-13812.218661-7d94c0c28
SONiC OS Version: 11
Distribution: Debian 11.6
Kernel: 5.10.0-18-2-amd64
Work item tracking
Microsoft ADO (number only): 17894593
How I did it
How to verify it
2023-04-13 19:28:03 +08:00
mssonicbld
46af37f77d
[ci/build]: Upgrade SONiC package versions (#14629) 2023-04-12 19:19:12 +08:00
anamehra
e107549942 chassis-packet: resolve the missing static routes (#14593)
Why I did it
Fixes #14179
chassis-packet: missing arp entries for static routes causing high orchagent cpu usage

It is observed that some sonic-mgmt test case calls sonic-clear arp, which clears the static arp entries as well. Orchagent or arp_update process does not try to resolve the missing arp entries after clear.

How I did it
arp_update should resolve the missing arp/ndp static route
entries. Added code to check for missing entries and try ping if any
found to resolve it.

How to verify it
After boot or config reload, check ipv4 and ipv4 neigh entries to make sure all static route entries are present
manual validation:
Use sonic-clear arp and sonic-clear ndp to clear all neighbor entries
run arp_update
Check for neigh entries. All entries should be present.
Testing on T0 setup route/for test_static_route.py

The test set the STATIC_ROUTE entry in conifg db without ifname:
sonic-db-cli CONFIG_DB hmset 'STATIC_ROUTE|2.2.2.0/24' nexthop 192.168.0.18,192.168.0.25,192.168.0.23

"STATIC_ROUTE": {
    "2.2.2.0/24": {
        "nexthop": "192.168.0.18,192.168.0.25,192.168.0.23"
    }
},
Validate that the arp_update gets the proper ARP_UPDATE_VARDS using arp_update_vars.j2 template from config db and does not crash:

{ "switch_type": "", "interface": "", "pc_interface" : "PortChannel101 PortChannel102 PortChannel103 PortChannel104 ", "vlan_sub_interface": "", "vlan" : "Vlan1000", "static_route_nexthops": "192.168.0.18 192.168.0.25 192.168.0.23 ", "static_route_ifnames": "" }

validate route/test_static_route.py testcase pass.
2023-04-12 18:32:47 +08:00
mssonicbld
73766c2fa1
Finalize fast-reboot in warmboot finalizer (#14238) (#14608) 2023-04-11 22:54:56 +08:00
mssonicbld
4d0f1c1972
[ci/build]: Upgrade SONiC package versions (#14578) 2023-04-09 19:17:25 +08:00
mssonicbld
05a9ce9628
[ci/build]: Upgrade SONiC package versions (#14572) 2023-04-08 19:08:35 +08:00
mssonicbld
a3951c2041
Increase wait_for_tunnel() timeout to 90s (#14279) (#14563) 2023-04-07 16:02:01 +08:00