Commit Graph

1104 Commits

Author SHA1 Message Date
Vivek
0dad269234 Run db_migrator for non first-time reboots (#16116)
- Why I did it
The recent change #15685 (comment) removed the DB migration for non-first-time reboots.
This is problematic for many deployments that don't rely on ZTP and instead push a custom config_db.json.
Port to older branches after #15685 is ported back

- How I did it
Re-introduce the logic to run the db_migrator on non-first boots

- How to verify it
Verified reboot and warm-reboot cases

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
2023-09-07 14:33:12 +08:00
Vaibhav Hemant Dixit
908933bd8c
[202012] Fix CONFIG_DB_INITIALIZED flag check logic and set/reset flag for warmboot (#16225)
Cherry pick of #15685

MSFT ADO: 24274591

#### Why I did it

Two changes:
### 1 Fix a day-1 issue where the check that waits for `CONFIG_DB_INITIALIZED` is incorrect.
There are multiple places where the same incorrect logic is used.

The current logic (`until [[ $($SONIC_DB_CLI CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]];`) passes for both `0` and `1`, irrespective of the result of the GET operation, because `[[ ... ]]` only tests that the command output is a non-empty string.
```
root@str2-7060cx-32s-29:~# sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"
1
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]]; do echo "entered here"; done
root@str2-7060cx-32s-29:~# 

root@str2-7060cx-32s-29:~# 
root@str2-7060cx-32s-29:~# sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"                                             
0
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]]; do echo "entered here"; done
root@str2-7060cx-32s-29:~# 
```

Fix this logic by checking that the value of the flag is "1".
```
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") -eq 1 ]]; do echo "entered here"; done
entered here
entered here
entered here
```

This gap in logic was highlighted when another fix was merged: https://github.com/sonic-net/sonic-buildimage/pull/14933
The issue being fixed here caused warmboot-finalizer to not wait until config-db is initialized.
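
The difference between the two checks can be reproduced without a switch. The following is a minimal sketch, assuming a stand-in function `fake_db_get` (hypothetical, not part of SONiC) in place of `sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"`:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the sonic-db-cli GET call; it simply echoes
# whatever value is currently stored in FLAG_VALUE.
fake_db_get() { echo "$FLAG_VALUE"; }

# Old check: true for any non-empty output, so "0" passes too.
old_check() { [[ $(fake_db_get) ]]; }

# Fixed check: true only when the flag is numerically equal to 1.
new_check() { [[ $(fake_db_get) -eq 1 ]]; }

for FLAG_VALUE in 0 1; do
    old_check && old=pass || old=wait
    new_check && new=pass || new=wait
    echo "flag=$FLAG_VALUE old=$old new=$new"
done
```

With the flag at 0 the old check reports `pass` while the fixed check reports `wait`; with the flag at 1 both report `pass`.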

### 2 Set and unset CONFIG_DB_INITIALIZED for warm-reboot case

Currently, during warm shutdown, the value of `CONFIG_DB_INITIALIZED` is stored in the redis DB backup. It is restored when the dump is loaded during warm recovery.
So the value of `CONFIG_DB_INITIALIZED` does not reflect the config DB's state; it simply remains what it was before the reboot.

Fix this by setting `CONFIG_DB_INITIALIZED` to 0 when the DB is loaded, and setting it to 1 after db_migrator is done.
2023-08-24 11:48:56 -07:00
mssonicbld
5d5727f6b9
Revert "Revert "Fix for fast/cold-boot: call db_migrator only after old config is loaded (#14933)" (#15464)" (#15684) (#16223) 2023-08-22 05:14:25 +08:00
Hua Liu
4df2bc9b44
[202012] [TACACS+] Add audisp-tacplus for per-command accounting. (#8750) (#15788)
This pull request integrates audisp-tacplus into SONiC for per-command accounting.

##### Work item tracking
- Microsoft ADO **(number only)**: 24433713

#### Why I did it
To support TACACS per-command accounting, we integrate the audisp-tacplus project into SONiC.

#### How I did it
1. Add auditd service to SONiC
2. Port and patch audisp-tacplus to SONiC

#### How to verify it
UT with CUnit to cover all new code in usersecret-filter.c
Also pass all current UT.

#### Tested branch (Please provide the tested image version)
Extract the TACACS support functions into a library, so that the TACACS config file parsing code can be shared with other projects.
Also fix a memory leak in the config parsing code.

- [ ]  SONiC.202012-15723.312602-e230e2d3e

#### Description for the changelog
Add audisp-tacplus for per-command accounting.
2023-07-12 18:46:47 -07:00
Stepan Blyshchak
bc58e2d841
[202012][mlnx-ffb.sh] Update issu-version location (#14927)
BACKPORT OF https://github.com/sonic-net/sonic-buildimage/pull/14925

#### Why I did it

ISSU version check fails due to inability to mount squashfs from 202211 on 201911

#### How I did it

Put ISSU version file under platform directory

#### How to verify it

202012 (with [202012][mlnx-ffb.sh] Update issu-version location  #14927) to master
2023-07-01 23:43:51 -07:00
Vaibhav Hemant Dixit
406852cf30 Revert "Fix for fast/cold-boot: call db_migrator only after old config is loaded (#14933)" (#15464)
This reverts commit 02b17839c3.

Reverts #14933

The earlier commit caused a race condition that particularly broke cross-branch warm upgrade.

The issue happens when db_migrator is still migrating the DB and the finalizer is checking the DB for the list of components to reconcile.

If the migration is not complete, the finalizer gets an empty list to wait for. Due to this, the finalizer concludes warmboot (deletes the system-wide warmboot flag) and causes all the services to do a cold restart.

ADO: 24274591
2023-06-29 12:37:24 +08:00
siqbal1986
fe449db131
Added VNET_MONITOR_TABLE,BFD_SESSION_TABLE to the list of tables to be cleaned up after swss restart. (#15398)
* [cleanup] tables VNET_MONITOR, BFD_SESSION, ROUTE_TUNNEL
2023-06-15 16:16:20 -07:00
Vaibhav Hemant Dixit
15021cf12a
[202012][BGP] starting BGP service after swss (#15365)
Cherrypick #12381 into 202012

Reverts #15312

Work item tracking
Microsoft ADO (number only): 24163872
2023-06-07 19:19:37 -07:00
Vaibhav Hemant Dixit
2a8d6912ea
Start BGP after interfaces-config.service (#15312)
Why I did it
Cherry-pick of #11827

This is to fix issue: [201811->202012] During warm recovery, TOR did not announce Loopback, VLAN route after upgrade

Suspected cause: 202012 does not have system dependency for bgp service to start after interfaces-config.service.

This opens a window for race condition: bgp service completing before interfaces are initialized.
BGP will miss announcing some routes if the interfaces are not ready.
2023-06-05 14:15:03 -07:00
Vaibhav Hemant Dixit
6e705dddb0 Fix for fast/cold-boot: call db_migrator only after old config is loaded (#14933)
Why I did it
Fix the issue where db_migrator is called before the DB is loaded with config. This leads to db_migrator:

1. Not finding anything, and proceeding to incorrectly migrate every missing config. This is not expected: migration should happen after the old config is loaded, and only new schema changes need migration.
2. Failing when some APIs return None, since the DB does not have anything when the migrator is called.

The reason for the incorrect call is that:

1. The database service starts db_migrator as part of its startup sequence.
2. The config-setup service loads data from old-config/minigraph. However, it has Requires=database.service.
3. Hence, config-setup starts only when the database service is started, and the database service is started only when db_migrator is completed.

Fixed by:

1. Check if this is a first-time boot by checking the pending_config_migration flag.
2. If pending_config_migration is enabled, do not call db_migrator as part of database service startup.
3. Let the database service start, which triggers the config-setup service to start.
4. Call db_migrator after the config-setup service loads old-config/minigraph.
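
The decision made at database service startup can be sketched as below. This is only an illustration under stated assumptions: the function name `should_migrate_at_db_start` is hypothetical, and the flag is modelled as a file whose path is passed in, since the commit message does not say where pending_config_migration lives.

```shell
#!/usr/bin/env bash
# Returns 0 (run db_migrator now) when no migration is pending, and 1
# (defer to config-setup) when the flag file exists.
# The flag location is an argument because its real path is an assumption.
should_migrate_at_db_start() {
    local flag_file="$1"
    [[ ! -e "$flag_file" ]]
}

demo_dir=$(mktemp -d)
flag="$demo_dir/pending_config_migration"   # hypothetical location

touch "$flag"   # first boot: config-setup has not loaded the old config yet
if should_migrate_at_db_start "$flag"; then
    echo "database startup: run db_migrator"
else
    echo "database startup: skip db_migrator, config-setup will run it"
fi

rm -f "$flag"   # later boots: nothing pending
should_migrate_at_db_start "$flag" && echo "database startup: run db_migrator"
rm -rf "$demo_dir"
```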
2023-06-02 18:25:16 +00:00
Anish Narsian
d14a094f0a
Resolve neighbors from config_db (#14990)
* Resolve NEIGH table entries present in CONFIG_DB. Without this change, ARP/NDP entries that we wish to resolve and that are configured via CONFIG_DB are not resolved.
2023-05-17 16:13:54 -07:00
xumia
a6644b2b99
[Build] Upgrade the python docker version (#15031)
#### Why I did it
[Build] Upgrade the python docker version to fix bgp not up issue

##### Work item tracking
- Microsoft ADO **(number only)**: 22236397
2023-05-12 11:37:00 -07:00
mssonicbld
e34c17813f [ci/build]: Upgrade SONiC package versions 2023-05-10 20:50:40 +08:00
mssonicbld
01a9c13af0
[ci/build]: Upgrade SONiC package versions (#14975) 2023-05-07 19:54:08 +08:00
mssonicbld
894a919733
[ci/build]: Upgrade SONiC package versions (#14973) 2023-05-06 21:40:58 +08:00
mssonicbld
99a8ad7d0d
[ci/build]: Upgrade SONiC package versions (#14893) 2023-04-30 18:46:57 +08:00
mssonicbld
4a74a02be9 [ci/build]: Upgrade SONiC package versions 2023-04-29 18:32:32 +08:00
Samuel Angebault
8c740555ae [Arista] Disable SSD NCQ on Lodoga (#13964)
Why I did it
Fix an issue similar to the one seen in #13739, but only for DCS-7050CX3-32S

How I did it
Add a kernel parameter to tell libata to disable NCQ

How to verify it
The message ata2.00: FORCE: horkage modified (noncq) should appear in the dmesg output.

Test results using: fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4

with NCQ

   READ: bw=26.1MiB/s (27.4MB/s), 26.1MiB/s-26.1MiB/s (27.4MB/s-27.4MB/s), io=3136MiB (3288MB), run=120053-120053msec
  WRITE: bw=26.3MiB/s (27.6MB/s), 26.3MiB/s-26.3MiB/s (27.6MB/s-27.6MB/s), io=3161MiB (3315MB), run=120053-120053msec
without NCQ

   READ: bw=22.0MiB/s (23.1MB/s), 22.0MiB/s-22.0MiB/s (23.1MB/s-23.1MB/s), io=2647MiB (2775MB), run=120069-120069msec
  WRITE: bw=22.2MiB/s (23.3MB/s), 22.2MiB/s-22.2MiB/s (23.3MB/s-23.3MB/s), io=2665MiB (2795MB), run=120069-120069msec
2023-04-27 12:33:38 +08:00
xumia
ae0a47dc6e
[Build][202012] Support Debian snapshot mirror to improve build stability (#14558)
#### Why I did it
Cherry-pick commits from master to support the snapshot-based mirror, and fix the code conflicts. Also add a final commit to fix the build breakage caused by the mirror change.

ad162ae0e [Build] Optimize the version control for Debian packages (https://github.com/sonic-net/sonic-buildimage/pull/14557)
38c5d7fce [Build] Support j2 template for debian sources for docker ptf (https://github.com/sonic-net/sonic-buildimage/pull/13198)
5e4826ebf  [Ci] Support to use the same snapshot for all platform builds (#13913)
820692563 [Build] Change the default mirror version config file (#13786)
5e4a866e3 [Build] Support Debian snapshot mirror to improve build stability (#13097)
ac5d89c6a  [Build] Support j2 template for debian sources (#12557)
2023-04-20 22:45:33 -07:00
mssonicbld
a595a02d68
[ci/build]: Upgrade SONiC package versions (#14719) 2023-04-19 22:35:17 +08:00
mssonicbld
19b212c6a0
[ci/build]: Upgrade SONiC package versions (#14679) 2023-04-16 21:12:13 +08:00
mssonicbld
fcf2ae78de
[ci/build]: Upgrade SONiC package versions (#14671) 2023-04-15 20:34:05 +08:00
mssonicbld
65a2a970d8
[ci/build]: Upgrade SONiC package versions (#14622) 2023-04-12 21:39:43 +08:00
mssonicbld
6bef84bf39
[ci/build]: Upgrade SONiC package versions (#14607) 2023-04-12 00:38:39 +08:00
Dev Ojha
8a4f42d883
[202012][Buffer] Added cable length config to buffer config template for EdgeZoneAggregator (#14539)
#### Why I did it
SONiC currently does not identify 'EdgeZoneAggregator' neighbor. As a result, the buffer profile attached to those interfaces uses the default cable length which could cause ingress packet drops due to insufficient headroom. Hence, there is a need to update the buffer templates to identify such neighbors and assign the same cable length as used by the T1.

Original PR for master: #14280 

#### How I did it
Modified the buffer template to identify EdgeZoneAggregator as a neighbor device type and assign it the same cable length as a T1/leaf router. 

#### How to verify it
Unit tests pass, and manually checked on a 7260 to see the changes take effect. 

#### Description for the changelog
[Buffer] Added cable length config to buffer config template for EdgeZoneAggregator
2023-04-10 11:58:34 -07:00
mssonicbld
916633cf1d
[ci/build]: Upgrade SONiC package versions (#14570) 2023-04-08 20:20:43 +08:00
mssonicbld
bb2cec56f0 [ci/build]: Upgrade SONiC package versions 2023-04-07 09:40:28 +08:00
mssonicbld
df34b8ea50
[ci/build]: Upgrade SONiC package versions (#14527) 2023-04-05 21:02:20 +08:00
Hua Liu
4033d6c929 Improve sudo cat command for RO user. (#14428)
Improve sudo cat command for RO user.

#### Why I did it
An RO user can use the sudo command to show non-syslog files.

#### How I did it
Improve sudo cat command for RO user.

#### How to verify it
Pass all UT.
Manually checked that the fixed code works correctly.

#### Description for the changelog
Improve sudo cat command for RO user.
2023-03-30 00:10:07 +00:00
Neetha John
6c7e24381e [storage_backend] Add backend acl service (#14229)
Why I did it
This PR addresses the issue mentioned above by loading the acl config as a service on a storage backend device

How I did it
The new acl service is a oneshot service which will start after swss and does some retries to ensure that the SWITCH_CAPABILITY info is present before attempting to load the acl rules. The service is also bound to sonic targets which ensures that it gets restarted during minigraph reload and config reload
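
The retry behaviour described here can be sketched as a small helper. This is a minimal sketch, not the service's actual script: `retry_until` and `capability_ready` are hypothetical names, and the probe only simulates the SWITCH_CAPABILITY info becoming available on the third check.

```shell
#!/usr/bin/env bash
# retry_until MAX_TRIES DELAY_SECONDS CMD [ARGS...]
# Runs CMD until it succeeds or MAX_TRIES attempts have been made.
retry_until() {
    local max_tries="$1" delay="$2" attempt
    shift 2
    for (( attempt = 1; attempt <= max_tries; attempt++ )); do
        if "$@"; then
            return 0
        fi
        sleep "$delay"
    done
    return 1
}

# Hypothetical probe: succeeds from the third call onward, standing in
# for "SWITCH_CAPABILITY info is present".
calls=0
capability_ready() { calls=$((calls + 1)); (( calls >= 3 )); }

if retry_until 5 0 capability_ready; then
    echo "capability info present after $calls checks, loading ACL rules"
else
    echo "capability info missing, giving up"
fi
```

A real service would use a non-zero delay and a probe that queries the database; both are left out here because the commit message does not specify them.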

How to verify it
Built an image with these changes and ran the following tests:

Verified that acl is loaded successfully on a storage backend device after a switch boot up
Verified that acl is loaded successfully on a storage backend ToR after minigraph load and config reload
Verified that acl is not loaded if the device is not a storage backend ToR or the device does not have a DATAACL table

Signed-off-by: Neetha John <nejo@microsoft.com>
2023-03-20 20:25:21 +00:00
mssonicbld
fd33a01796 [ci/build]: Upgrade SONiC package versions 2023-03-19 20:51:09 +08:00
mssonicbld
36cc9ae5d6
[ci/build]: Upgrade SONiC package versions (#14310) 2023-03-18 19:01:08 +08:00
mssonicbld
b791970c1c
[ci/build]: Upgrade SONiC package versions (#14306) 2023-03-18 09:39:48 +08:00
xumia
18d049082e
[ci/build]: Upgrade SONiC package versions (#14205)
Why I did it
[ci/build]: Upgrade SONiC package versions

How I did it
How to verify it
2023-03-14 08:00:29 +08:00
mssonicbld
06be00525a
[ci/build]: Upgrade SONiC package versions (#14080) 2023-03-05 04:31:07 +08:00
Saikrishna Arcot
26b0e7f709 Use tmpfs for /var/log on Arista 7050CX3-32S (#13805)
This is to reduce writes to the SSD on the device.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-02-28 18:23:40 +00:00
mssonicbld
cc17c7ac11
[ci/build]: Upgrade SONiC package versions (#13992) 2023-02-26 22:57:45 +08:00
mssonicbld
7455c56024
[ci/build]: Upgrade SONiC package versions (#13985) 2023-02-25 14:57:37 +08:00
Stepan Blyshchak
73c7ced753
[202012][Mellanox] Place FW binaries under platform directory instead of squashfs (#13890)
Upgrade from old image always requires squashfs mount to get the next image FW binary. This can be avoided if we put FW binary under platform directory which is easily accessible after installation:

admin@r-spider-05:~$ ls /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa
/host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa
admin@r-spider-05:~$ ls -al /tmp/image-fw-new-loc.0-dirty-20230208.193534-fs/etc/mlnx/fw-SPC.mfa
lrwxrwxrwx 1 root root 66 Feb  8 17:57 /tmp/image-fw-new-loc.0-dirty-20230208.193534-fs/etc/mlnx/fw-SPC.mfa -> /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa

- Why I did it
202211 and above use a different squashfs compression type that the 201911 kernel cannot handle. Therefore, we avoid mounting squashfs altogether with this change.

- How I did it
Place the FW binary under /host/image-/platform/mlnx/; soft links in /etc/mlnx are created to avoid breaking existing scripts/automation.
/etc/mlnx/fw-SPCX.mfa is a soft link that always points to the FW that should be used in the current image.
mlnx-fw-upgrade.sh is updated to prefer the /host/image-/platform/mlnx location and fall back to /etc/mlnx in squashfs in case the new location does not exist. This is necessary for image downgrade.
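
The prefer-then-fall-back lookup can be illustrated as follows. This is a hedged sketch rather than the code in mlnx-fw-upgrade.sh: `pick_fw_path` is a hypothetical helper, and temporary directories stand in for the platform directory and /etc/mlnx.

```shell
#!/usr/bin/env bash
# Print the first existing candidate path, preferring earlier arguments.
# Returns 1 if none of the candidates exist.
pick_fw_path() {
    local candidate
    for candidate in "$@"; do
        if [[ -e "$candidate" ]]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Demo with temporary stand-ins for the new and old firmware locations.
demo=$(mktemp -d)
mkdir -p "$demo/platform" "$demo/etc"
touch "$demo/etc/fw-SPC.mfa"
pick_fw_path "$demo/platform/fw-SPC.mfa" "$demo/etc/fw-SPC.mfa"   # only old location exists
touch "$demo/platform/fw-SPC.mfa"
pick_fw_path "$demo/platform/fw-SPC.mfa" "$demo/etc/fw-SPC.mfa"   # new location wins
rm -rf "$demo"
```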

- How to verify it
Upgrade from 201911 to 202012
202012 to 201911 downgrade
202012 -> 202012 reboot
ONIE -> 202012 boot (First FW burn)

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2023-02-22 17:38:54 +02:00
mssonicbld
6230ced2b1
[ci/build]: Upgrade SONiC package versions (#13897) 2023-02-21 22:49:29 +08:00
Samuel Angebault
e01e1860d4 [Arista] Disable ATA NCQ for a few products (#13739)
Why I did it
Some products might experience an occasional IO failure in the communication between CPU and SSD.
Based on some research, it could be attributable to some devices not handling ATA NCQ (Native Command Queuing) correctly.

This issue currently affects 4 products:

DCS-7170-32C*
DCS-7170-64C
DCS-7060DX4-32
DCS-7260CX3-64

How I did it
This change disables NCQ on the affected drive for a small set of products.

How to verify it
When the fix is applied, these 2 patterns can be found in the dmesg.
ata1.00: FORCE: horkage modified (noncq)
NCQ (not used)
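
Checking for both patterns can be scripted. The sketch below assumes a hypothetical helper `ncq_disabled` that reads log text on stdin; the sample lines are taken from the messages quoted in this commit.

```shell
#!/usr/bin/env bash
# Succeeds when the log text on stdin contains both patterns named above.
ncq_disabled() {
    local log
    log=$(cat)
    grep -qF 'FORCE: horkage modified (noncq)' <<<"$log" &&
        grep -qF 'NCQ (not used)' <<<"$log"
}

# Sample lines built from the messages quoted in this commit.
sample='ata1.00: FORCE: horkage modified (noncq)
ata1.00: 61865984 sectors, multi 1: LBA48 NCQ (not used)'

if ncq_disabled <<<"$sample"; then
    echo "NCQ is disabled"
fi
```

On a device, the same function could be fed from the kernel log, for example `dmesg | ncq_disabled`.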

Test results using: fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4

with NCQ (ata1.00: 61865984 sectors, multi 1: LBA48 NCQ (depth 32), AA)

   READ: bw=33.9MiB/s (35.6MB/s), 33.9MiB/s-33.9MiB/s (35.6MB/s-35.6MB/s), io=4073MiB (4270MB), run=120078-120078msec
  WRITE: bw=34.1MiB/s (35.8MB/s), 34.1MiB/s-34.1MiB/s (35.8MB/s-35.8MB/s), io=4100MiB (4300MB), run=120078-120078msec
without NCQ (ata1.00: 61865984 sectors, multi 1: LBA48 NCQ (not used))

   READ: bw=31.7MiB/s (33.3MB/s), 31.7MiB/s-31.7MiB/s (33.3MB/s-33.3MB/s), io=3808MiB (3993MB), run=120083-120083msec
  WRITE: bw=31.9MiB/s (33.4MB/s), 31.9MiB/s-31.9MiB/s (33.4MB/s-33.4MB/s), io=3830MiB (4016MB), run=120083-120083msec
2023-02-16 17:54:16 +00:00
Saikrishna Arcot
30fbc609c8 Use tmpfs for /var/log for Arista 7260 (#13587)
This is to reduce writes to disk, which can cause the SSD to wear out
faster.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-02-02 22:53:50 +00:00
Nazarii Hnydyn
83b6518ae2
[202012][mellanox]: Add BIOS upgrade infra (#13571)
- Why I did it
Added BIOS upgrade infra

- How I did it
Added new make target

- How to verify it
Copy msn3800_bios.tar.gz to platform/mellanox/bios
make configure PLATFORM=mellanox
make target/files/buster/msn3800_bios.tar.gz

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2023-02-02 10:07:03 +02:00
xumia
d3a83cf8c7 [Bug] Fix SONiC installation failure caused by pip/pip3 not found (#13284)
The main issue is that the pip/pip3 command cannot be found when the package is installed by apt-get.
When installing with dpkg, the search path is PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
When installing with apt-get, the search path is PATH=/usr/sbin:/usr/bin:/sbin:/bin
But pip/pip3 is installed under /usr/local/bin by default, so dpkg works but apt-get does not.

How I did it
Export the path /usr/local/bin for pip/pip3,
so that the deb packages can be installed by apt-get.
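
The effect of exporting the extra directory can be sketched like this. It is an illustration only: `ensure_in_path` is a hypothetical helper, not the code added by this commit.

```shell
#!/usr/bin/env bash
# Print PATH_VALUE with DIR prepended, unless DIR is already one of its entries.
ensure_in_path() {
    local dir="$1" path_value="$2"
    case ":$path_value:" in
        *":$dir:"*) echo "$path_value" ;;
        *)          echo "$dir:$path_value" ;;
    esac
}

apt_path="/usr/sbin:/usr/bin:/sbin:/bin"   # search path quoted above for apt-get
ensure_in_path /usr/local/bin "$apt_path"
```

A maintainer script could then run `export PATH="$(ensure_in_path /usr/local/bin "$PATH")"` before calling pip/pip3.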
2023-01-12 23:31:08 +00:00
mssonicbld
0e63a94fb6
[ci/build]: Upgrade SONiC package versions (#13249) 2023-01-04 19:49:04 +08:00
mssonicbld
352dd7ea7f
[ci/build]: Upgrade SONiC package versions (#13188) 2022-12-28 19:56:49 +08:00
mssonicbld
be918d5332
[ci/build]: Upgrade SONiC package versions (#13168) 2022-12-25 20:46:59 +08:00
mssonicbld
f4e005ad37
[ci/build]: Upgrade SONiC package versions (#13146) 2022-12-23 07:52:32 +08:00
mssonicbld
56d364994b
[ci/build]: Upgrade SONiC package versions (#13092) 2022-12-17 18:45:53 +08:00
mssonicbld
d9839a8bf2
[ci/build]: Upgrade SONiC package versions (#13014) 2022-12-10 18:55:14 +08:00