Commit Graph

5791 Commits

Author SHA1 Message Date
Vaibhav Hemant Dixit
6e705dddb0 Fix for fast/cold-boot: call db_migrator only after old config is loaded (#14933)
Why I did it
Fix the issue where db_migrator is called before DB is loaded w/ config. This leads to db_migrator:

Not finding anything, and then proceeding to incorrectly migrate every missing config
This is not expected. migration should happen after the old config is loaded and only new schema changes need migration.
Since DB does not have anything when migrator is called, db_migrator fails when some APIs return None.
The reason for incorrect call is that:

database service starts db_migrator as part of startup sequence.
config-setup service loads data from old-config/minigraph. However, it has Requires=database.service.
Hence, config-setup starts only after the database service has started, and the database service counts as started only once db_migrator has completed.
Fixed by:

Check if this is first time boot by checking pending_config_migration flag.
If pending_config_migration is enabled, then do not call db_migrator as part of database service startup.
Let database service start which triggers config-setup service to start.
Now call db_migrator after the config-setup service loads old-config/minigraph
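The startup decision described above can be sketched as a small shell function. This is a minimal sketch, not the actual service script: the function name and the choice to pass the flag path as an argument are assumptions made for illustration, and the commit message does not say where the pending_config_migration flag is stored.

```bash
#!/bin/sh
# Sketch only: decide whether the database service should invoke db_migrator
# during its own startup. The flag path is passed in as $1 (hypothetical
# interface; the real flag location is not given in the commit message).
migrator_action_at_db_start() {
    flag_file="$1"
    if [ -e "$flag_file" ]; then
        # Pending config migration: the old config is not loaded yet, so
        # leave the migration until config-setup has loaded it.
        echo "defer"
    else
        # No pending migration: keep the existing behavior and migrate now.
        echo "run"
    fi
}
```

With such a helper, the database startup path would skip db_migrator on "defer", and config-setup would call it once the old config or minigraph is loaded.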
2023-06-02 18:25:16 +00:00
jhli-cisco
0b6ee2de46
Update cisco-8000.ini (#15300)
Fix for SR 695336631. x86 FPGA FPD needs to be upgraded to ver 1.88
2023-06-02 08:23:14 -07:00
Ye Jianquan
897caef621
[CI/CD] Refine pr test definition, remove old test jobs and testbedv2 flags (#15303) 2023-06-02 16:33:34 +08:00
Ye Jianquan
934dc224d0
[CI/CD] Migrate to SONiC Elastictest (#15275) 2023-06-02 10:39:00 +08:00
Ye Jianquan
04f921c52d
[CI/CD] Refine PR test templates and test_plan.py to be ready to migrate to Elastictest (#15257) 2023-05-31 06:36:14 +00:00
James An
83b226d8e9
Update cisco-8000.ini (#15200)
Why I did it

Release Notes for Cisco 8102-64H:
Updated mtd-utils.mk and pyudev.mk for addressing build failures

How I did it
Update platform version to 202012.3.0.1
2023-05-26 10:53:14 -07:00
Liu Shilong
57ea1e89f9
[ci] Enable kvm test when upgrading package versions. (#15018)
Why I did it
Run kvmtest when updating package versions to avoid breaking tests.

Work item tracking
Microsoft ADO (number only): 22335854
How I did it
How to verify it
2023-05-25 17:31:16 +08:00
Liu Shilong
87e1a0a645
Fix error handling when failing to install a deb package (#11846) (#15087)
#### Why I did it
Fix endless build log issue.
Cherry pick [PR#11846](https://github.com/sonic-net/sonic-buildimage/pull/11846)
##### Work item tracking
- Microsoft ADO **(number only)**: 19299131
#### How I did it
The error handling code for when a deb package fails to be installed currently has a chain of commands linked together by && and ends with `exit 1`. The assumption is that the commands would succeed, and the last `exit 1` would end it with a non-zero return code, thus fully failing the target and causing the build to stop because of bash's -e flag.

However, if one of the commands prior to `exit 1` returns a non-zero return code, then bash won't actually treat it as a terminating error. From bash's man page:

```
-e      Exit immediately if a pipeline (which may consist of a single simple
	command), a list, or a compound command (see SHELL GRAMMAR above),
        exits with a non-zero status.  The shell does not exit if the
        command that fails is part of the  command  list  immediately
        following a while or until keyword, part of the test following the
        if or elif reserved words, part of any command executed in a && or
        || list except the command following the final && or ||, any
        command in a pipeline but the last, or if the command's return
        value is being inverted with !.  If a compound command other than a
        subshell returns a non-zero status because a command failed while
        -e was being ignored, the shell does not exit.
```

The part `part of any command executed in a && or || list except the command following the final && or ||` says that if the failing command is not the `exit 1` that we have at the end, then bash doesn't treat it as an error and doesn't exit immediately. Additionally, since this is a compound command, but isn't in a subshell (subshells are marked by `(` and `)`, whereas `{` and `}` just tell bash to run the commands in the current environment), bash doesn't exit. The result of this is that in the deb-install target, if a package installation fails, it may be infinitely stuck in that while-loop.

There are two fixes for this: change to using a subshell, or use `;` instead of `&&`. Using a subshell would, I think, require exporting any shell variables used in the subshell, so I chose to change the `&&` to `;`. In addition, at the start of the command group, `set +e` is added in, which removes the exit-on-error handling of bash. This makes sure that all commands are run (the output of which may help for debugging) and that it still exits with 1, which will then fully fail the target.
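The difference between the two forms can be reproduced in isolation. The following is a minimal sketch, assuming a POSIX `sh` is available; `old_handler` and `new_handler` are hypothetical names, and `false` stands in for a handler command that fails before `exit 1` is reached.

```bash
#!/bin/sh
# Old pattern: the error handler is an && chain ending in `exit 1`.
# The failing `false` sits inside the chain, so `exit 1` is never reached
# and -e does not terminate the shell.
old_handler() {
    sh -ec '
        false || { echo "handler ran" && false && exit 1; }
        echo "shell kept going"
    '
}

# Fixed pattern: disable -e inside the group and separate commands with `;`
# so every command runs and the final `exit 1` is always reached.
new_handler() {
    sh -ec '
        false || { set +e; echo "handler ran"; false; exit 1; }
        echo "shell kept going"
    '
}
```

Here `old_handler` prints both lines and returns 0, while `new_handler` prints only `handler ran` and returns 1, which is the status that makes the build target fail.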
#### How to verify it
2023-05-25 00:04:02 -07:00
kellyyeh
7abddb42b0
Advance sonic-utilities submodule (#15009)
Why I did it
Advance sonic-utilities submodule head
Added below commits:
878be48e kellyyeh Wed May 10 15:21:52 2023 -0700 Revert "[warm-reboot] Use kexec_file_load instead of kexec_load when available
094513f8 Vaibhav Hemant Dixit Tue May 9 13:03:52 2023 -0700 [202012] LAG keepalive script to reduce lacp session wait during warm-reboot

Work item tracking
Microsoft ADO (number only): 23687678
2023-05-24 16:09:38 -07:00
Ye Jianquan
52e33258dc
Refine testbedv2 template output (#14460) 2023-05-24 10:27:14 +08:00
Ye Jianquan
978db8e9ba
Refine test job definition and assert logic (#14959)
Why I did it
Remove the 'kvmtest-t0' and 'kvmtest-t1-lag' test jobs. All the test jobs are already required (continueOnError: false), and only one of the classical and testbedV2 tests will be enabled, so there is no need for an extra test job that computes an 'or' of the two.
Change agent pool to reduce cost and avoid congestion
2023-05-24 10:27:04 +08:00
Liu Shilong
59b89ee7c3
[ci] Enable reproducible build for arm64 and armhf. (#15190)
Fix armhf and arm64 build issue.
Revert #7517, because the armhf and arm64 agents are ready.

Microsoft ADO (number only): 23765181
2023-05-23 10:27:43 -07:00
siqbal1986
750263401b
[202012] submodule update sonic-swss. (#15023)
Updated for commit:

f141880 - 2023-04-26 : [bugfix] vnet ping missing with secondary endpoints empty in priority routes. (#2736) (#2747) [siqbal1986]
2023-05-18 15:23:15 -07:00
Cédric Ollivier
dc92a9f906 [build]: Force xz as compression type when building sonic-build-hooks debs (#12823)
Ubuntu 22.04 leverages Zstandard compression to dpkg by default.
Debian doesn't support it yet
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=892664

Fix #12822

Signed-off-by: Cédric Ollivier <cedric.ollivier@orange.com>
2023-05-18 14:33:29 +08:00
Anish Narsian
d14a094f0a
Resolve neighbors from config_db (#14990)
* Resolve NEIGH table entries present in CONFIG_DB. Without this change, ARP/NDP entries that are configured via CONFIG_DB and that we wish to resolve are not resolved.
2023-05-17 16:13:54 -07:00
mssonicbld
1e0412a3bb
[submodule] Update submodule linkmgrd to the latest HEAD automatically (#14862)
#### Why I did it
src/linkmgrd
```
* 35f7d1c - (HEAD -> 202012, origin/202012) [202012][active-standby][bsl] fix no mux probe issue #201 (#206) (2 weeks ago) [Jing Zhang]
* be701fb - [202012] pick codeql fixes (#207) (2 weeks ago) [Jing Zhang]
* 0b42cec - Enable debug symbols (#199) (3 weeks ago) [Longxiang Lyu]
```
2023-05-17 10:59:28 -07:00
mssonicbld
24b0ed30a5
[submodule] Update submodule sonic-telemetry to the latest HEAD automatically (#14944)
#### Why I did it
src/sonic-telemetry
```
* 6dc4bb9 - (HEAD -> 202012, origin/202012) Merge pull request #107 from zbud-msft/backport-202012 (33 hours ago) [Tomek Madejski]
* cbb7b1e - Update azp yml to generalize branch names (#106) (7 days ago) [Zain Budhwani]
```
2023-05-17 00:33:26 -07:00
mssonicbld
4b90929918
[submodule] Update submodule sonic-py-swsssdk to the latest HEAD automatically (#15035)
Why I did it
src/sonic-py-swsssdk

* d44e0d8 - (HEAD -> 202012, origin/202012) [Security] Fix the redis security issue CVE-2023-28858 and  CVE-2023-28859 (#135) (3 days ago) [xumia]
2023-05-15 00:39:12 -07:00
Zain Budhwani
a55c8d7444
[202012] Update sonic-telemetry submodule head (#15048)
#### Why I did it

Update 202012 sonic-telemetry submodule head

##### Work item tracking
- Microsoft ADO **(number only)**:16208453
2023-05-12 16:57:38 -07:00
xumia
a6644b2b99
[Build] Upgrade the python docker version (#15031)
#### Why I did it
[Build] Upgrade the python docker version to fix bgp not up issue

##### Work item tracking
- Microsoft ADO **(number only)**: 22236397
2023-05-12 11:37:00 -07:00
Jon Goldberg
2b21cd5e22 [armhf][Nokia-7215] changes fstrim.timer to daily (#14723)
Using timer-override.conf, we modify the fstrim.timer service.

For armhf, Nokia-7215 platform, we modify fstrim.timer to run daily
instead of weekly.  This is required because the size of the SSD on
this platform is 16GB, which on average is nearly 10 times smaller than
most other sonic platforms.  With smaller disk and the ever increasing
level of logging done by sonic, this change is required to prevent
the SSD from entering a read-only state due to inadequate free blocks.
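
As an illustration of what such an override might contain, here is a minimal sketch that writes a systemd drop-in for fstrim.timer. The file contents are an assumption, since the commit message does not show timer-override.conf. The empty `OnCalendar=` line is the usual systemd way to clear the inherited weekly schedule before setting a new one. `write_fstrim_override` is a hypothetical helper name.

```bash
#!/bin/sh
# Sketch only: write a drop-in that switches fstrim.timer to a daily schedule.
# $1 is the drop-in directory, e.g. /etc/systemd/system/fstrim.timer.d
write_fstrim_override() {
    dropin_dir="$1"
    mkdir -p "$dropin_dir"
    cat > "$dropin_dir/timer-override.conf" <<'EOF'
[Timer]
OnCalendar=
OnCalendar=daily
EOF
}
```

After installing a drop-in like this, `systemctl daemon-reload` followed by `systemctl list-timers fstrim.timer` would show the new schedule.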
2023-05-11 16:32:30 +08:00
mssonicbld
e34c17813f [ci/build]: Upgrade SONiC package versions 2023-05-10 20:50:40 +08:00
Dev Ojha
5a6735c004
[submodule] sonic-utilities submodule update (#14937)
#### Why I did it
sonic-utilities submodule update for 202012

```
* d20fc3c8 2023-04-07 | [202012][DBMigrator] Update db_migrator to support EdgeZoneAggregator Buffer Config for T0s (#2768) (HEAD, origin/202012) [Dev Ojha]
* 322a74dd 2023-03-27 | Resolved rc!=0 problem by replacing fgrep with awk. Added ipv4 filtering to get only v4 peers in case of show ip bgp neighbors (#2743) [saurabhab]
```

##### Work item tracking
- Microsoft ADO **(number only)**: 20782336
2023-05-08 11:56:23 -07:00
mssonicbld
01a9c13af0
[ci/build]: Upgrade SONiC package versions (#14975) 2023-05-07 19:54:08 +08:00
mssonicbld
894a919733
[ci/build]: Upgrade SONiC package versions (#14973) 2023-05-06 21:40:58 +08:00
mssonicbld
99a8ad7d0d
[ci/build]: Upgrade SONiC package versions (#14893) 2023-04-30 18:46:57 +08:00
mssonicbld
4a74a02be9 [ci/build]: Upgrade SONiC package versions 2023-04-29 18:32:32 +08:00
Samuel Angebault
8c740555ae [Arista] Disable SSD NCQ on Lodoga (#13964)
Why I did it
Fix similar issue seen on #13739 but only for DCS-7050CX3-32S

How I did it
Add a kernel parameter to tell libata to disable NCQ

How to verify it
The message ata2.00: FORCE: horkage modified (noncq) should appear in dmesg.
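
The check can be scripted. This is a minimal sketch in which `ncq_forced_off` is a hypothetical helper that reads kernel log text on stdin. The commit message does not spell out the exact kernel parameter, which is presumably a `libata.force=...noncq` setting.

```bash
#!/bin/sh
# Sketch only: succeed if the kernel log on stdin shows libata forcing NCQ off.
ncq_forced_off() {
    grep -q 'FORCE: horkage modified (noncq)'
}
```

Usage on a device: `dmesg | ncq_forced_off && echo "NCQ disabled"`.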

Test results using: fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4

with NCQ

   READ: bw=26.1MiB/s (27.4MB/s), 26.1MiB/s-26.1MiB/s (27.4MB/s-27.4MB/s), io=3136MiB (3288MB), run=120053-120053msec
  WRITE: bw=26.3MiB/s (27.6MB/s), 26.3MiB/s-26.3MiB/s (27.6MB/s-27.6MB/s), io=3161MiB (3315MB), run=120053-120053msec
without NCQ

   READ: bw=22.0MiB/s (23.1MB/s), 22.0MiB/s-22.0MiB/s (23.1MB/s-23.1MB/s), io=2647MiB (2775MB), run=120069-120069msec
  WRITE: bw=22.2MiB/s (23.3MB/s), 22.2MiB/s-22.2MiB/s (23.3MB/s-23.3MB/s), io=2665MiB (2795MB), run=120069-120069msec
2023-04-27 12:33:38 +08:00
Liu Shilong
a46c615260
[ci] Remove innovium in upgrate version pipeline. (#14842)
Why I did it
Innovium platform has build issue.
Remove it from upgrade version pipeline.

Work item tracking
Microsoft ADO (number only):
How I did it
How to verify it
2023-04-26 17:46:45 +08:00
Hua Liu
1f3da955b9
[S6100] Improve S6100 serial-getty monitor, wait and re-check when getty not running to avoid false alert. (#14402) (#14755)

This is cherry-pick PR for: https://github.com/sonic-net/sonic-buildimage/pull/14402

#### Why I did it
On S6100, systemd sometimes fails to auto-restart the serial-getty service, so there is a monit unit that checks the serial-getty service status and restarts it.

However, this monit check reports false alerts, because in most cases when serial-getty is not running, systemd restarts it successfully.

To avoid the false alerts, improve the monitor to wait and re-check.

Steps to reproduce this issue:
1. User logs in to the device via console and keeps the connection open.
2. User logs in to the device via SSH and checks the serial-getty@ttyS1.service service; it is running.
3. Run 'monit reload' from the SSH connection.
4. Check syslog 1 minute later; there will be a false alert: ''serial-getty' process is not running'

##### Work item tracking
- Microsoft ADO :17424426

#### How I did it
Add a check-getty.sh script that re-checks later when the getty service is not running.
Update the monit unit to check the serial-getty service status with this script to avoid false alerts.
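
The wait-and-re-check idea can be sketched as follows. This is not the contents of check-getty.sh, which the commit message does not show; `check_with_recheck` is a hypothetical helper, and the delay value in the usage line is an assumption.

```bash
#!/bin/sh
# Sketch only: run a status check; if it fails, wait and try once more.
# $1 is the delay in seconds, the remaining arguments are the check command.
# Returns 0 if either attempt succeeds, otherwise the second attempt's status.
check_with_recheck() {
    delay="$1"
    shift
    "$@" && return 0
    sleep "$delay"
    "$@"
}
```

A monit check could then call something like `check_with_recheck 30 systemctl is-active --quiet serial-getty@ttyS1.service`, so a getty that systemd restarts on its own within the delay does not raise an alert.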

#### How to verify it
Pass all UT.
Manually checked that the fixed code works correctly:


```
admin@***:~$ sudo systemctl stop  serial-getty@ttyS1.service
admin@***:~$ sudo /usr/local/bin/check-getty.sh 
admin@***:~$ echo $?
1
admin@***:~$ sudo systemctl status serial-getty@ttyS1.service
serial-getty@ttyS1.service - Serial Getty on ttyS1
     Loaded: loaded (/lib/systemd/system/serial-getty@.service; enabled-runtime; vendor preset: enabled)
     Active: inactive (dead) since Tue 2023-03-28 07:15:21 UTC; 1min 13s ago

admin@***:~$ sudo /usr/local/bin/check-getty.sh 
admin@***:~$ echo $?
0
admin@***:~$ sudo systemctl status serial-getty@ttyS1.service
serial-getty@ttyS1.service - Serial Getty on ttyS1
     Loaded: loaded (/lib/systemd/system/serial-getty@.service; enabled-runtime; vendor preset: enabled)
```

syslog:
```
Mar 28 07:10:37.597458 *** INFO systemd[1]: serial-getty@ttyS1.service: Succeeded.
Mar 28 07:12:43.010550 *** ERR monit[593]: 'serial-getty' status failed (1) -- no output
Mar 28 07:12:43.010744 *** INFO monit[593]: 'serial-getty' trying to restart
Mar 28 07:12:43.010846 *** INFO monit[593]: 'serial-getty' stop: '/bin/systemctl stop serial-getty@ttyS1.service'
Mar 28 07:12:43.132172 *** INFO monit[593]: 'serial-getty' start: '/bin/systemctl start serial-getty@ttyS1.service'
Mar 28 07:13:43.286276 *** INFO monit[593]: 'serial-getty' status succeeded (0) -- no output
```

#### Tested branch (Please provide the tested image version)

- [x] 20201231.77

#### Description for the changelog
[S6100] Improve S6100 serial-getty monitor.
2023-04-20 23:10:01 -07:00
xumia
ae0a47dc6e
[Build][202012] Support Debian snapshot mirror to improve build stability (#14558)
#### Why I did it
Cherry-pick commits from master to support the snapshot-based mirror, and fix the code conflicts. Also add a final commit to fix the build break caused by the mirror change.

ad162ae0e [Build] Optimize the version control for Debian packages (https://github.com/sonic-net/sonic-buildimage/pull/14557)
38c5d7fce [Build] Support j2 template for debian sources for docker ptf (https://github.com/sonic-net/sonic-buildimage/pull/13198)
5e4826ebf  [Ci] Support to use the same snapshot for all platform builds (#13913)
820692563 [Build] Change the default mirror version config file (#13786)
5e4a866e3 [Build] Support Debian snapshot mirror to improve build stability (#13097)
ac5d89c6a  [Build] Support j2 template for debian sources (#12557)
2023-04-20 22:45:33 -07:00
Feng-msft
7c4b8bc813 Update golang version for telemetry build in sonic-slave-buster to fix (#14636)
Update golang version for telemetry build in sonic-slave-jessie to fix CVE-2021-33195, this PR will be merged into 201911 branch finally.

#### Why I did it
Go before 1.15.13 and 1.16.x before 1.16.5 have functions for DNS lookups that do not validate replies from DNS servers, and thus a return value may contain an unsafe injection (e.g., XSS) that does not conform to the RFC1035 format. The 201911 and 202012 branches currently use 1.14.2.

##### Work item tracking
- Microsoft ADO **(number only)**:17727291

#### How I did it
Bump golang version to 1.15.15, which contains the corresponding fix.
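
The affected version ranges quoted above can be written as a small check. This is a minimal sketch, assuming a `sort` that supports `-V` (version sort, a GNU extension); `version_lt` and `go_affected` are hypothetical helper names.

```bash
#!/bin/sh
# Sketch only: succeed if version $1 sorts strictly before version $2.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Succeed if the given Go version is in the range described for CVE-2021-33195:
# before 1.15.13, or 1.16.x before 1.16.5.
go_affected() {
    case "$1" in
        1.16|1.16.*) version_lt "$1" 1.16.5 ;;
        *)           version_lt "$1" 1.15.13 ;;
    esac
}
```

Under this check, the previously used 1.14.2 is reported as affected and the new 1.15.15 is not.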

#### How to verify it
unit test to do sanity check.
2023-04-20 16:34:15 +08:00
xumia
69951f368b [Ci] Fix the wrong SONIC_BUILD_JOBS build variable used issue in Azp (#14071)
Why I did it
[Ci] Fix the issue of no parallel jobs on some platforms
We observed some pipelines running longer than expected. The cause is that SONIC_BUILD_JOBS ends up with the wrong value, 1. This comes from how the runtime variable is passed: an additional single quotation mark is added around it in the make command line.

make 'SONIC_BUILD_JOBS=$(nproc)' target/xxxx
needs to change to

make SONIC_BUILD_JOBS=$(nproc) target/xxxx
This improves build performance for the platforms that were building with SONIC_BUILD_JOBS=1.
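
The effect of the extra quotes can be seen at the shell level. The following is a minimal sketch: `show_args` is a hypothetical helper that prints the arguments a command would receive, and `$(echo 8)` stands in for `$(nproc)` so the result does not depend on the machine. It only illustrates shell quoting, not how Azure Pipelines expands its own variables.

```bash
#!/bin/sh
# Print each argument on its own line, exactly as the callee receives it.
show_args() {
    printf '%s\n' "$@"
}

# Single quotes suppress command substitution: the literal text is passed on.
quoted=$(show_args 'SONIC_BUILD_JOBS=$(echo 8)')

# Without quotes the shell runs the substitution and passes the number.
unquoted=$(show_args SONIC_BUILD_JOBS=$(echo 8))
```

Here `quoted` holds the literal string `SONIC_BUILD_JOBS=$(echo 8)`, while `unquoted` holds `SONIC_BUILD_JOBS=8`.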
Good one vs: https://dev.azure.com/mssonic/build/_build/results?buildId=227986&view=logs&j=cef3d8a9-152e-5193-620b-567dc18af272&t=cf595088-5c84-5cf1-9d7e-03331f31d795

"SONIC_BUILD_JOBS"                : "8"
Bad one barefoot: https://dev.azure.com/mssonic/build/_build/results?buildId=227379&view=logs&j=993d6e22-aeec-5c03-fa19-35ecba587dd9&t=7be0d2ec-661f-5569-462c-2d9b7ca4ca5d

"SONIC_BUILD_JOBS"                : "1"
How I did it
Expand the BUILD_OPTIONS variable for all platforms.
2023-04-20 14:34:43 +08:00
Jing Zhang
c45c109d75
update submodule (#14690)
[sonic-linkmgrd][202012] update submodule

0179207 (HEAD -> 202012, origin/202012) [202012][active-standby] Enforce switchover based on heartbeats when mux probe keeps failing #184 (#197)

sign-off: Jing Zhang zhangjing@microsoft.com
2023-04-19 11:43:56 -07:00
mssonicbld
a595a02d68
[ci/build]: Upgrade SONiC package versions (#14719) 2023-04-19 22:35:17 +08:00
mssonicbld
19b212c6a0
[ci/build]: Upgrade SONiC package versions (#14679) 2023-04-16 21:12:13 +08:00
mssonicbld
fcf2ae78de
[ci/build]: Upgrade SONiC package versions (#14671) 2023-04-15 20:34:05 +08:00
Liu Shilong
0c3e395ace
[build] Check if patches are applied before applying patches. (#13566) (#14662)
Why I did it
If make fails, we can't rerun the make process, because the already-applied patches can't be applied again.
#13386 missed some changes.

Work item tracking
Microsoft ADO (number only):
How I did it
Check if patches are applied. If yes, don't apply them again.
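
One common way to make patch application rerunnable is a reverse dry run. This is a minimal sketch of that technique, assuming GNU `patch`; it is not necessarily how this PR implements the check, and `apply_patch_once` is a hypothetical helper name.

```bash
#!/bin/sh
# Sketch only: apply a patch to a source tree unless it is already applied.
# $1 is the tree directory, $2 is the patch file (absolute path, because the
# function changes directory before calling patch).
apply_patch_once() {
    dir="$1"
    patch_file="$2"
    # A reverse dry run succeeds only if the patch is already in the tree.
    if (cd "$dir" && patch -p1 -R -f --dry-run -i "$patch_file") >/dev/null 2>&1; then
        echo "already applied"
    else
        (cd "$dir" && patch -p1 -i "$patch_file") >/dev/null && echo "applied"
    fi
}
```

Calling the function twice on the same tree applies the patch the first time and reports `already applied` the second time, so a failed make can be rerun.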

How to verify it
2023-04-14 12:16:05 +00:00
xumia
7b302d4002
[Submodule][202012] Advance sonic-restapi pointer (#14627)
Why I did it
[Submodule][202012] Advance sonic-restapi pointer

4f6f979 [Security] Fix the redis security issue CVE-2023-28858 and CVE-2023-28859 (#139)

Work item tracking
Microsoft ADO (number only): 17894593
How I did it
How to verify it
2023-04-13 15:25:46 +08:00
mssonicbld
65a2a970d8
[ci/build]: Upgrade SONiC package versions (#14622) 2023-04-12 21:39:43 +08:00
mssonicbld
6bef84bf39
[ci/build]: Upgrade SONiC package versions (#14607) 2023-04-12 00:38:39 +08:00
Dev Ojha
8a4f42d883
[202012][Buffer] Added cable length config to buffer config template for EdgeZoneAggregator (#14539)
#### Why I did it
SONiC currently does not identify 'EdgeZoneAggregator' neighbor. As a result, the buffer profile attached to those interfaces uses the default cable length which could cause ingress packet drops due to insufficient headroom. Hence, there is a need to update the buffer templates to identify such neighbors and assign the same cable length as used by the T1.

Original PR for master: #14280 

#### How I did it
Modified the buffer template to identify EdgeZoneAggregator as a neighbor device type and assign it the same cable length as a T1/leaf router. 

#### How to verify it
Unit tests pass, and manually checked on a 7260 to see the changes take effect. 

#### Description for the changelog
[Buffer] Added cable length config to buffer config template for EdgeZoneAggregator
2023-04-10 11:58:34 -07:00
mssonicbld
916633cf1d
[ci/build]: Upgrade SONiC package versions (#14570) 2023-04-08 20:20:43 +08:00
Prince Sunny
b4c0309716
[Submodule] Update sonic-swss (#14567)
Update swss commits:
c161027 - 2023-04-07 : [202012] overlay_dmac change in Vnet configuration. (#2724) [siqbal1986]
50be4e3 - 2023-04-05 : [202012][mux]: Implement rollback for failed mux switchovers (#2716) [Lawrence Lee]
637e4c7 - 2023-03-30 : [202012] Fix orchagent missing request when logrotate happens (#2718) [Prince Sunny]
2023-04-07 17:15:53 -07:00
mssonicbld
bb2cec56f0 [ci/build]: Upgrade SONiC package versions 2023-04-07 09:40:28 +08:00
mssonicbld
df34b8ea50
[ci/build]: Upgrade SONiC package versions (#14527) 2023-04-05 21:02:20 +08:00
Jing Zhang
99c724434e
[202012][sonic-linkmgrd] submodule update (#14480)
Include commit: 
```
6ea1f03 Jing Zhang      Tue Mar 28 08:42:44 2023 -0700  [202012] remove chatty log message for peer link event (#192)
198292d Jing Zhang      Tue Mar 21 17:53:11 2023 -0700  [active-standby] avoid unnecessary mux state probe after configuring to `auto` (#183)
47de88e Jing Zhang      Mon Mar 20 18:14:25 2023 -0700  [202012] Avoid unnecessary error logs from `handleGetServerMacAddressNotification` #96 (#185)
8a33319 Jing Zhang      Mon Mar 6 11:53:27 2023 -0800   loose link down swithcover condition (#178)
c2bf08d Jing Zhang      Thu Mar 16 18:59:10 2023 -0700  fix ActiveStandbyStateMachine referrence (#186)
99d26af Jing Zhang      Thu Mar 16 18:58:48 2023 -0700  [ci] Fix apt-get install unable locate package issue. (#177) (#187)
d893be9 Longxiang Lyu   Wed Feb 22 12:55:44 2023 +0800  [active-standby] Toggle to standby if link down and config auto (#173)
```
2023-04-04 10:40:16 -07:00
jhli-cisco
fc0cca2fb6
[cisco-8000] update platform module to 0.2.7 (#14172)
#### Why I did it
Fix for link down issue seen with AOI 100G-PSM4 optics on 8102-64H-O [JIRA ID# MIGSMSFT-23]

#### How I did it
update platform module to 0.2.7
2023-04-03 20:36:11 -07:00
jcaiMR
9c5138b60e
change static route expiry time from 1800 to 172800 (#14497)
* [Bgpcfgd] change static route expiry time from 1800 to 172800
2023-04-03 11:42:28 -07:00
Liu Shilong
5db6b6131c Pin mmh3 package version in sonic-slave-stretch docker (#14463)
Why I did it
mmh3's new version 3.1.0 breaks the pipeline build.
bullseye/buster/jessie pinned the version to 2.5.1

How I did it
Pin mmh3's version as in the other dists.

How to verify it
2023-04-03 16:34:04 +08:00